Abstract
Backfat thickness (BF) is closely related to the service life and reproductive performance of sows, and the dynamic monitoring of sows' BF is a critical part of the production process in large-scale pig farms. Traditional contact measurement of sows' BF imposes a heavy workload, and existing non-contact measurement models are inefficient. To address both problems, this study proposed a hybrid CNN-ViT (Vision Transformer, ViT) model for measuring sows' BF. The CNN-ViT introduced depthwise separable convolution and lightweight self-attention, and mainly consisted of a Pre-local Unit (PLU), a Lightweight ViT (LViT) and an Inverted Residual Unit (IRU). The model could extract both local and global image features, making it more suitable for small datasets. It was tested on 106 pregnant sows with seven randomly divided datasets. The results showed that the CNN-ViT had a Mean Absolute Error (MAE) of 0.83 mm, a Root Mean Square Error (RMSE) of 1.05 mm, a Mean Absolute Percentage Error (MAPE) of 4.87% and a coefficient of determination (R²) of 0.74. Compared to LViT-IRU, PLU-IRU and PLU-LViT, the CNN-ViT's MAE decreased by more than 12%, RMSE decreased by more than 15%, MAPE decreased by more than 15% and R² improved by more than 17%. Compared to ResNet50 and ViT, the CNN-ViT's MAE decreased by more than 7%, RMSE decreased by more than 13%, MAPE decreased by more than 7% and R² improved by more than 15%. The method could better meet the demand for non-contact automatic measurement of pregnant sows' BF in actual production and provide technical support for the intelligent management of pregnant sows.
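For readers who want to reproduce the four reported metrics on their own BF measurements, the following is a minimal sketch using the standard textbook definitions of MAE, RMSE, MAPE and R². The abstract does not state the exact formulas used, so these definitions are an assumption; the function name `regression_metrics` and the sample values in the usage note are hypothetical and are not data from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, MAPE (in percent) and R^2 for paired values.

    Assumes the standard definitions; the study's exact formulas are
    not given in the abstract. y_true must contain no zeros (MAPE) and
    must not be constant (R^2), otherwise a ZeroDivisionError is raised.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]

    # Mean Absolute Error: average magnitude of the error (same unit as BF).
    mae = sum(abs(e) for e in errors) / n

    # Root Mean Square Error: penalises large errors more strongly.
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # Mean Absolute Percentage Error: error relative to the true value.
    mape = 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n

    # Coefficient of determination: 1 - residual SS / total SS.
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}
```

Calling `regression_metrics([10.0, 20.0, 30.0, 40.0], [11.0, 19.0, 32.0, 38.0])` with these made-up values returns a dictionary keyed by `"MAE"`, `"RMSE"`, `"MAPE"` and `"R2"`. With real data, `y_true` would hold the contact-measured BF in mm and `y_pred` the model's estimates.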