Welcome to Smart Agriculture
30 September 2024, Volume 6 Issue 5
Overview Article
Research Progress and Prospects of Key Navigation Technologies for Facility Agricultural Robots | Open Access
HE Yong, HUANG Zhenyu, YANG Ningyuan, LI Xiyao, WANG Yuwei, FENG Xuping
2024, 6(5):  1-19.  doi:10.12133/j.smartag.SA202404006

[Significance] With the rapid development of robotics technology and the persistent rise in labor costs, the application of robots in facility agriculture is becoming increasingly widespread. These robots can enhance operational efficiency, reduce labor costs, and minimize human errors. However, the complexity and diversity of facility environments, including varying crop layouts and lighting conditions, impose higher demands on robot navigation. Therefore, achieving stable, accurate, and rapid navigation for robots has become a key issue. Advanced sensor technologies and algorithms have been proposed to enhance robots' adaptability and decision-making capabilities in dynamic environments. This not only elevates the automation level of agricultural production but also contributes to more intelligent agricultural management. [Progress] This paper reviews the key technologies of automatic navigation for facility agricultural robots. It details beacon localization, inertial positioning, simultaneous localization and mapping (SLAM) techniques, and sensor fusion methods used in autonomous localization and mapping. Depending on the type of sensors employed, SLAM technology can be subdivided into vision-based, laser-based, and fusion-based systems. Fusion localization is further categorized into data-level, feature-level, and decision-level fusion based on the types and stages of the fused information. The application of SLAM technology and fusion localization in facility agriculture has become increasingly common. Global path planning plays a crucial role in enhancing the operational efficiency and safety of facility agricultural robots. This paper discusses global path planning, classifying it into point-to-point local path planning and global traversal path planning. Furthermore, based on the number of optimization objectives, it is divided into single-objective and multi-objective path planning. With regard to automatic obstacle avoidance, the paper discusses several obstacle avoidance control algorithms commonly used in facility agriculture, including the artificial potential field, the dynamic window approach, and deep learning methods. Among them, deep learning methods are often employed for perception and decision-making in obstacle avoidance scenarios. [Conclusions and Prospects] Currently, the challenges for facility agricultural robot navigation include complex scenarios with significant occlusions, cost constraints, low operational efficiency, and the lack of standardized platforms and public datasets. These issues not only affect the practical application effectiveness of robots but also constrain the further advancement of the industry. To address these challenges, future research can focus on developing multi-sensor fusion technologies, applying and optimizing advanced algorithms, investigating and implementing multi-robot collaborative operations, and establishing standardized and shared data platforms.
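As a concrete illustration of the artificial potential field method named above, the following minimal Python sketch performs gradient-descent steps on a textbook attractive/repulsive potential. The gains, cutoff distance, and geometry are illustrative assumptions, not parameters from any system reviewed.

```python
import numpy as np

def apf_step(pos, goal, obstacles, k_att=1.0, k_rep=100.0, d0=2.0, step=0.05):
    """One gradient-descent step of a textbook artificial potential field.

    pos, goal: (2,) arrays; obstacles: list of (2,) arrays.
    k_att/k_rep: attractive/repulsive gains; d0: repulsion cutoff distance.
    """
    # Attractive force pulls the robot straight toward the goal.
    force = k_att * (goal - pos)
    for obs in obstacles:
        diff = pos - obs
        d = np.linalg.norm(diff)
        if 1e-6 < d < d0:
            # Repulsive force grows rapidly as the robot nears an obstacle.
            force += k_rep * (1.0 / d - 1.0 / d0) / d**3 * diff
    return pos + step * force

pos = np.array([0.0, 0.0])
goal = np.array([5.0, 5.0])
obstacles = [np.array([2.5, 2.4])]
for _ in range(300):
    pos = apf_step(pos, goal, obstacles)
print("final position:", np.round(pos, 2))
```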

Orchard-Wide Visual Perception and Autonomous Operation of Fruit Picking Robots: A Review | Open Access
CHEN Mingyou, LUO Lufeng, LIU Wei, WEI Huiling, WANG Jinhai, LU Qinghua, LUO Shaoming
2024, 6(5):  20-39.  doi:10.12133/j.smartag.SA202405022

[Significance] Fruit-picking robots stand as a crucial solution for achieving intelligent fruit harvesting. Although significant progress has been made in developing foundational methods for picking robots, such as fruit recognition, orchard navigation, path planning for picking, and robotic arm control, the practical implementation of a seamless picking system that integrates sensing, movement, and picking capabilities still encounters substantial technical hurdles. In contrast to current picking systems, the next generation of fruit-picking robots aims to replicate the autonomous skills exhibited by human fruit pickers, effectively performing ongoing tasks of perception, movement, and picking without human intervention. To tackle this challenge, this review delves into the latest research methodologies and real-world applications in this field, critically assesses the strengths and limitations of existing methods, and categorizes the essential components of continuous operation into three sub-modules: local target recognition, global mapping, and operation planning. [Progress] Initially, the review explores methods for recognizing nearby fruit and obstacle targets. These methods encompass four main approaches: low-level feature fusion, high-level feature learning, RGB-D information fusion, and multi-view information fusion. Each of these approaches incorporates advanced algorithms and sensor technologies for cluttered orchard environments. For example, low-level feature fusion utilizes basic attributes such as color, shape, and texture to distinguish fruits from backgrounds, while high-level feature learning employs more complex models, such as convolutional neural networks, to interpret the contextual relationships within the data. RGB-D information fusion brings depth perception into the mix, allowing robots to gauge the distance to each fruit accurately. Multi-view information fusion tackles the issue of occlusions by combining data from multiple cameras and sensors around the robot, providing a more comprehensive view of the environment and enabling more reliable sensing. Subsequently, the review shifts focus to orchard mapping and scene comprehension on a broader scale. It points out that current mapping methods, while effective, still struggle with dynamic changes in the orchard, such as variations in fruit appearance and lighting conditions. Improved adaptation techniques, possibly through machine learning models that can learn and adjust to different environmental conditions, are suggested as a way forward. Building upon the foundation of local and global perception, the review investigates strategies for planning and controlling autonomous behaviors. This includes not only the latest advancements in devising movement paths for robot mobility but also adaptive strategies that allow robots to react to unexpected obstacles or changes within the whole environment. Enhanced strategies for effective fruit picking using the Eye-in-Hand system involve the development of more dexterous robotic hands and improved algorithms for precisely predicting the optimal picking point of each fruit. The review also identifies a crucial need for further advancements in the dynamic behavior and autonomy of these technologies, emphasizing the importance of continuous learning and adaptive control systems to improve operational efficiency in diverse orchard environments.
[Conclusions and Prospects] The review underscores the critical importance of coordinating perception, movement, and picking modules to facilitate the transition from a basic functional prototype to a practical machine. Moreover, it emphasizes the necessity of enhancing the robustness and stability of the core algorithms governing perception, planning, and control, while ensuring their seamless coordination, which emerges as a central challenge. Additionally, the review raises unresolved questions regarding the application of picking robots and outlines future trends, including deeper integration of stereo vision and deep learning, enhanced global vision sampling, and the establishment of standardized evaluation criteria for overall operational performance. This review can serve as a reference for the eventual development of robust, autonomous, and commercially viable picking robots.
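The RGB-D information fusion described above can be illustrated with a short sketch: given a fruit detector's bounding box and a depth map aligned to the RGB frame, the distance to the fruit is estimated with a robust statistic over the box. The array shapes, the box, and the choice of the median are illustrative assumptions.

```python
import numpy as np

def fruit_distance(depth_m, box):
    """Estimate distance to a detected fruit from an aligned depth map.

    depth_m: (H, W) depth image in meters, aligned to the RGB frame.
    box: (x1, y1, x2, y2) bounding box from the fruit detector.
    """
    x1, y1, x2, y2 = box
    patch = depth_m[y1:y2, x1:x2]
    valid = patch[patch > 0]           # drop missing-depth pixels
    if valid.size == 0:
        return None
    return float(np.median(valid))    # median is robust to leaves/background

# Hypothetical 480x640 depth frame with a fruit region about 1.2 m away.
depth = np.zeros((480, 640))
depth[200:260, 300:360] = 1.2
print(fruit_distance(depth, (300, 200, 360, 260)))  # -> 1.2
```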

Technology and Method
Reconstruction of U.S. Regional-Scale Soybean SIF Based on MODIS Data and BP Neural Network | Open Access
YAO Jianen, LIU Haiqiu, YANG Man, FENG Jinying, CHEN Xiu, ZHANG Peipei
2024, 6(5):  40-50.  doi:10.12133/j.smartag.SA202309006

[Objective] Solar-induced chlorophyll fluorescence (SIF) data obtained from satellites suffer from low spatial and temporal resolution and discrete footprints because of the limitations imposed by satellite orbits. To address these problems and obtain higher-resolution SIF data, most reconstruction studies are based on low-resolution satellite SIF. Moreover, the spatial resolution of most SIF reconstruction products is still not sufficient for direct use in studying crop photosynthetic rates at the regional scale. Although some SIF products offer higher resolutions, they are not reconstructed from original satellite SIF data but are instead secondary reconstructions based on preexisting SIF reconstruction products. The Orbiting Carbon Observatory-2 (OCO-2) is equipped with a high-resolution spectrometer, so OCO-2 SIF has a higher spatial resolution (1.29 km × 2.25 km) than other original SIF products, making it suitable for advancing high-resolution SIF data reconstruction, particularly in the context of regional-scale crop studies. [Methods] This research primarily explored SIF reconstruction at the regional scale, focusing on selected soybean planting regions in the United States. The selection of raw MODIS data was based on careful consideration of environmental conditions, the distinctive physiological attributes of soybeans, and an exhaustive evaluation of factors closely linked to OCO-2 SIF within these soybean planting regions. The primary tasks of this research were to reconstruct high-resolution soybean SIF and to rigorously assess the quality of the reconstructed SIF. During dataset construction, SIF data from multiple soybean planting regions traversed by the OCO-2 footprint were merged to retain as many of the available original SIF samples as possible, providing the subsequent SIF reconstruction model with a rich source of SIF data. SIF data obtained beneath the satellite's trajectory were matched with various MODIS datasets, including the enhanced vegetation index (EVI), the fraction of photosynthetically active radiation (FPAR), and land surface temperature (LST), resulting in a multisource remote sensing dataset ultimately used for model training. This dataset encompassed the explanatory variables most relevant to soybean physiological structure and environmental conditions within each SIF footprint. Through its activation functions, the backpropagation (BP) neural network captured the complex nonlinear relationships between the original SIF data and these MODIS products. Leveraging these nonlinear relationships, the effects of different combinations of explanatory variables on SIF reconstruction were compared and analyzed using three indicators: the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE). The best SIF reconstruction model was then selected to generate a regional-scale, spatially continuous, high-temporal-resolution (500 m, 8 d) soybean SIF reconstruction dataset (BPSIF). [Results and Discussions] The research findings confirmed the strong performance of the SIF reconstruction model in predicting soybean SIF. When EVI, FPAR, and LST were simultaneously incorporated as explanatory variables, the model achieved an R² of 0.84. This metric validated the model's capability to predict SIF data and indicated that the reconstructed SIF, with its 8 d temporal resolution and 500 m × 500 m spatial scale, is reliable for small-scale studies of crop photosynthesis. Based on this optimal model, the reconstructed SIF product (BPSIF) was generated. The Pearson correlation coefficient between the original OCO-2 SIF data and MODIS gross primary productivity (GPP) stood at a modest 0.53. In stark contrast, the correlation coefficient between BPSIF and MODIS GPP rose significantly to 0.80. The increased correlation suggests that BPSIF more accurately reflects the dynamic changes in GPP during the soybean growing season, making it more reliable than the original SIF data. Selecting soybean planting areas in the United States with relatively homogeneous crop cultivation as the research area, based on the high-spatial-resolution (1.29 km × 2.25 km) OCO-2 SIF data, greatly reduced vegetation heterogeneity within a single SIF footprint. [Conclusions] The proposed BPSIF significantly enhances the regional and temporal continuity of OCO-2 SIF while preserving the temporal and spatial attributes of the original SIF dataset. Within the study area, BPSIF exhibits a significantly improved correlation with MODIS GPP compared to the original OCO-2 SIF. The OCO-2 SIF data reconstruction method proposed in this study holds the potential to provide a more reliable SIF dataset, which can drive further understanding of soybean SIF at finer spatial and temporal scales, as well as of its relationship with soybean GPP.
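A minimal sketch of the kind of BP-network regression described above (EVI, FPAR, and LST as inputs, SIF as target), using scikit-learn's MLPRegressor on synthetic stand-in data; the architecture, hyperparameters, and data are assumptions, not the authors' configuration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

rng = np.random.default_rng(0)
# Synthetic stand-ins for per-footprint EVI, FPAR, LST (columns) and SIF (target).
X = rng.uniform([0.1, 0.1, 280.0], [0.9, 0.9, 320.0], size=(1000, 3))
y = 1.5 * X[:, 0] + 0.8 * X[:, 1] + 0.01 * (X[:, 2] - 300) + rng.normal(0, 0.05, 1000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = MLPRegressor(hidden_layer_sizes=(32, 16), activation="relu",
                     max_iter=2000, random_state=0).fit(X_tr, y_tr)

pred = model.predict(X_te)
print(f"R2   = {r2_score(y_te, pred):.3f}")
print(f"RMSE = {mean_squared_error(y_te, pred) ** 0.5:.3f}")
print(f"MAE  = {mean_absolute_error(y_te, pred):.3f}")
```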

Suitable Sowing Date Method of Winter Wheat at the County Level Based on ECMWF Long-Term Reanalysis Data | Open Access
LIU Ruixuan, ZHANG Fangzhao, ZHANG Jibo, LI Zhenhai, YANG Juntao
2024, 6(5):  51-60.  doi:10.12133/j.smartag.SA202309019

[Objective] Accurately determining the suitable sowing date for winter wheat is of great significance for improving wheat yield and ensuring national food security. The traditional visual interpretation method is not only time-consuming and labor-intensive but also covers a relatively small area, while remote sensing monitoring, as a form of post-event monitoring, exhibits a time lag. The aim of this research is to use the temperature threshold method and the accumulated thermal time requirement for wheat leaf appearance method to analyze the suitable sowing date for winter wheat in townships at the county level under a long-term trend of climate warming. [Methods] The research area comprised the townships of Qihe county, Shandong province. Based on European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis data from 1997 to 2022, 16 meteorological grid points in Qihe county were selected. Firstly, bilinear interpolation was used to interpolate the grid-point temperature data to the approximate center of each township in Qihe county, yielding daily average temperatures for each township. Then, the temperature threshold method was used to determine the final dates of stable passage through 18, 16, 14, and 0 ℃. Key sowing date indicators, such as the suitable sowing temperature for different wheat varieties, growing degree days (GDD) ≥ 0 ℃ from different sowing dates until overwintering, and daily average temperatures over the years, were used for statistical analysis of the suitable sowing date for winter wheat. Secondly, the accumulated thermal time requirement for wheat leaf appearance method was used to calculate the sowing date that supplies the GDD needed for strong seedlings before winter, counting backward from the date of stable passage below 0 ℃. Daily average temperatures above 0 ℃ were accumulated backward until the GDD required for the formation of strong seedlings was reached, and a range of ±3 days around this calculated date was taken as the theoretical suitable sowing date. Finally, combined with actual production practices, the appropriate sowing date of winter wheat in the townships of Qihe county was determined under the trend of climate warming. [Results and Discussions] The results showed that, from November 1997 to early December 2022, winter and annual average temperatures in Qihe county all showed an upward trend, confirming a clear trend of climate warming in the townships of Qihe county. Judging from the daily average temperatures over the years, the temperature fluctuation range in November was the largest in the year, with a maximum standard deviation of 2.61 ℃. This suggested a higher likelihood of extreme weather in November, so corresponding disaster prevention and mitigation measures should be taken in advance to avoid affecting the growth and development of wheat. Under extreme weather conditions, determining the sowing date by temperature or GDD alone was of limited value; in cold winter years, considering only GDD was too one-sided, and it was necessary to widen the range of GDD required before overwintering according to temperature changes to ensure the normal growth and development of winter wheat. The suitable sowing date for semi-winter wheat obtained by the temperature threshold method was from October 4th to October 16th, and the suitable sowing date for winter wheat was from September 27th to October 4th. Taking into account the GDD required for the formation of strong seedlings before winter, the suitable sowing date for winter wheat was from October 3rd to October 13th, and that for semi-winter wheat was from October 15th to October 24th, which was consistent with the suitable sowing date for winter wheat determined by the accumulated thermal time requirement for wheat leaf appearance method. Considering the winter wheat varieties planted in Qihe county, the suitable sowing date for winter wheat in Qihe county was from October 3rd to October 16th, and the optimal sowing date was from October 5th to October 13th. With the gradual warming of the climate, the suitable sowing date for wheat in the townships of Qihe county in 2022 was later than that in 2002. However, the sowing date for winter wheat is still influenced by factors such as soil moisture, topography, and seeding quality, and the suitable sowing date for a specific year still needs to be adjusted to local conditions and applied flexibly based on the circumstances of that year. [Conclusions] The experimental results proved the feasibility of the temperature threshold method and the accumulated thermal time requirement for wheat leaf appearance method for determining the suitable sowing date of winter wheat. Temperature trends can be used to identify cold or warm winters, and the sowing date can be adjusted in a timely manner to enhance wheat yield and reduce the impact of excessively high or low temperatures on winter wheat. The research results can not only provide decision-making references for winter wheat yield assessment in Qihe county but also provide an important theoretical basis for the scientific arrangement of agricultural production.
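The backward GDD accumulation described above can be sketched as follows: starting from the date of stable passage below 0 ℃, daily mean temperatures above 0 ℃ are summed backward until a seedling GDD requirement is met, and a ±3 d window around that date is returned. The temperature series and the 600 ℃·d requirement below are illustrative assumptions, not values from the study.

```python
from datetime import date, timedelta

def sowing_window(daily_tmean, freeze_date, gdd_required=600.0):
    """Count backward from the stable-freeze date, accumulating daily mean
    temperatures above 0 degC, until the assumed GDD for strong seedlings is met.

    daily_tmean: dict mapping date -> daily mean temperature (degC).
    Returns (theoretical date - 3 d, theoretical date + 3 d), per the +/-3 d rule.
    """
    gdd, day = 0.0, freeze_date
    while gdd < gdd_required:
        day -= timedelta(days=1)
        gdd += max(daily_tmean[day], 0.0)  # only temperatures above 0 degC count
    return day - timedelta(days=3), day + timedelta(days=3)

# Illustrative series: ~14 degC daily means ahead of a 1 December freeze date.
freeze = date(2022, 12, 1)
temps = {freeze - timedelta(days=i): 14.0 for i in range(1, 120)}
print(sowing_window(temps, freeze, gdd_required=600.0))
```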

Prediction and Mapping of Soil Total Nitrogen Using GF-5 Image Based on Machine Learning Optimization Modeling | Open Access
LIU Liqi, WEI Guangyuan, ZHOU Ping
2024, 6(5):  61-73.  doi:10.12133/j.smartag.SA202405011

[Objective] Nitrogen in soil is a crucial element for plant growth. Insufficient nitrogen supply can severely affect crop yield and quality, while excessive use of nitrogen fertilizers can lead to significant environmental issues such as water eutrophication and groundwater pollution. Therefore, large-scale, rapid detection of soil nitrogen content and precise fertilization are of great importance for smart agriculture. In this study, hyperspectral data from the GF-5 satellite were employed, and various machine learning algorithms were introduced to establish a prediction model for soil total nitrogen (TN) content and to generate a distribution map of soil TN content in the study area, aiming to provide scientific evidence for intelligent monitoring in smart agriculture. [Method] The study area was the Jian Sanjiang Reclamation Area in Fujin city, Heilongjiang province. Fieldwork involved the careful collection of 171 soil samples, from which soil spectral data, chemical analysis data of soil TN content, and GF-5 hyperspectral data were obtained. Among these samples, 140 were randomly selected as the modeling set for calibration, and the remaining 31 samples were used as the test set. Three machine learning algorithms were introduced: partial least squares regression (PLSR), backpropagation neural network (BPNN), and support vector machine (SVM) driven by a polynomial kernel function (Poly). Three distinct soil TN inversion models were constructed using these algorithms. To optimize model performance, ten-fold cross-validation was employed to determine the optimal parameters for each model. Additionally, multiplicative scatter correction (MSC) was applied to obtain band characteristic values, enhancing the models' prediction capability. Model performance was evaluated using three indicators, the coefficient of determination (R²), root mean square error (RMSE), and relative prediction deviation (RPD), to assess the prediction accuracy of the different models. [Results and Discussions] The MSC-Poly-SVM model exhibited the best prediction performance on the test set, with an R² of 0.863, an RMSE of 0.203, and an RPD of 2.147. This model was used to perform soil TN content inversion mapping from the GF-5 satellite hyperspectral data. In accordance with the stringent requirements of land quality geochemical evaluation, the GF-5 hyperspectral soil TN parameter distribution map was drawn following the specification for the determination of land quality geochemical evaluation. The results revealed that 86.1% of the land in the Jian Sanjiang study area had a total nitrogen content of more than 2.0 g/kg, primarily concentrated in first- and second-grade plots, while third- and fourth-grade plots accounted for only 11.83% of the total area. The study area exhibited sufficient soil nitrogen reserves, with high TN background values mainly concentrated along the riverbanks in the central part and distributed in a northeast-east direction. In terms of soil spectral preprocessing, the median filtering method performed best at smoothing while maintaining spectral characteristics. The spectra extracted from GF-5 imagery were generally quite similar to ground-measured spectral data, despite some noise, which had minimal overall impact. [Conclusions] This study demonstrates the feasibility of using GF-5 satellite hyperspectral remote sensing data and machine learning algorithms for large-scale quantitative detection and visualization of soil TN content. The soil TN content distribution map generated from GF-5 hyperspectral remote sensing data is detailed and consistent with results from other methods, providing technical support for future large-scale quantitative detection of soil nutrient status and rational fertilization.
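A minimal sketch of the MSC-plus-polynomial-kernel-SVM pipeline described above, run on synthetic spectra; the band count, kernel hyperparameters, and data generation are assumptions rather than the study's settings.

```python
import numpy as np
from sklearn.svm import SVR

def msc(spectra, reference=None):
    """Multiplicative scatter correction: regress each spectrum on the mean
    spectrum and remove the fitted offset and slope."""
    ref = spectra.mean(axis=0) if reference is None else reference
    out = np.empty_like(spectra)
    for i, s in enumerate(spectra):
        b, a = np.polyfit(ref, s, 1)      # s ~= a + b * ref
        out[i] = (s - a) / b
    return out

rng = np.random.default_rng(1)
bands = 150                                    # assumed number of usable bands
tn = rng.uniform(1.0, 3.0, 120)                # synthetic TN content, g/kg
base = np.sin(np.linspace(0, 3, bands)) + 2.0
absorb = np.exp(-0.5 * ((np.arange(bands) - 75) / 6.0) ** 2)
signal = base - 0.15 * tn[:, None] * absorb    # TN deepens a local absorption band
spectra = (rng.uniform(0.9, 1.1, (120, 1)) * signal          # multiplicative scatter
           + rng.uniform(-0.1, 0.1, (120, 1))                # additive offset
           + rng.normal(0, 0.005, (120, bands)))             # noise

X = msc(spectra)
model = SVR(kernel="poly", degree=2, C=10.0).fit(X[:90], tn[:90])
pred = model.predict(X[90:])
rmse = float(np.sqrt(np.mean((pred - tn[90:]) ** 2)))
print(f"RMSE = {rmse:.3f}, RPD = {tn[90:].std() / rmse:.2f}")
```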

ReluformerN: Lightweight High-Low Frequency Enhanced Network for Hyperspectral Agricultural Land Cover Classification | Open Access
LIU Yi, ZHANG Yanjun
2024, 6(5):  74-87.  doi:10.12133/j.smartag.SA202406008

[Objective] To intelligently monitor the distribution of agricultural land cover types, hyperspectral cameras are usually mounted on drones to collect hyperspectral data, which are then classified to automatically draw crop distribution maps. Different crops have similar shapes, and the same crop differs markedly across growth stages, so network models for agricultural land cover classification require a high degree of accuracy. However, network models with high classification accuracy are often complex and difficult to deploy on resource-constrained hardware. In view of this problem, a lightweight high-low frequency enhanced Reluformer network (ReluformerN) was proposed in this research. [Methods] Firstly, an adaptive octave convolution was proposed, which utilized the softmax function to automatically adjust the spectral dimensions of high-frequency and low-frequency features, effectively alleviating the influence of manually set spectral dimensions and benefiting the subsequent extraction of spatial- and spectral-domain features of hyperspectral images. Secondly, a Reluformer was proposed to extract global features, taking advantage of the fact that low-frequency information captures global structure. Reluformer replaced the softmax function in self-attention to reduce computational complexity. Through theoretical and graphical analyses comparing the ReLU, LeakyReLU, and GELU functions, it was found that the ReLU function, like softmax, is non-negative and can therefore be used for feature-relevance analysis, while its piecewise-linear form makes it especially suitable for self-relevance analysis. Therefore, a ReLU self-attention mechanism was proposed, which used the ReLU function to perform feature self-attention analysis. To extract deep global features, multi-scale feature fusion was used, and the ReLU self-attention mechanism served as the core for constructing a multi-head ReLU self-attention mechanism. Similar to the transformer architecture, the Reluformer structure was built by combining the multi-head ReLU self-attention mechanism, feedforward layers, and normalization layers. With Reluformer as the core, the Reluformer network (ReluformerN) was proposed. Considering that high-frequency information represents local image features, the network used depthwise separable convolution to design a lightweight branch for fine-grained extraction of high-frequency features, while Reluformer extracted global features from the low-frequency information that represents the global characteristics of the image. ReluformerN was evaluated on three public hyperspectral datasets (Indian Pines, WHU-Hi-LongKou, and Salinas) for fine-grained crop classification and was compared with five popular classification networks (2D-CNN, HybridSN, ViT, CTN, and LSGA-VIT). [Results and Discussion] ReluformerN performed best in overall accuracy (OA), average accuracy (AA), and other accuracy metrics. In terms of model parameters, computation (FLOPs), and complexity, ReluformerN had the smallest parameter count (less than 0.3 M) and the lowest computation. In the visual comparison, the classification maps produced with ReluformerN had clearer edges and more complete morphological structures, with fewer classification errors. The validity of the adaptive octave convolution was verified by comparison with the traditional octave convolution: its classification accuracy was 0.1% higher, and when the manual parameters were set to different values, the maximum and minimum classification accuracies of the traditional octave convolution differed by about 0.3%, while those of the adaptive octave convolution differed by only 0.05%. This showed that the adaptive octave convolution not only achieved the highest classification accuracy but was also less sensitive to manual parameter settings, effectively overcoming their influence on the classification result. To validate the Reluformer module, it was compared with the transformer, LeakyReluformer, and Linformer in terms of accuracy metrics such as OA and AA; Reluformer achieved the highest classification accuracy and the lowest parameter count among these models, indicating that it not only effectively extracted global features but also reduced computational complexity. Finally, the effectiveness of the high-frequency and low-frequency feature extraction branches was verified: the feature distributions after high-frequency extraction, after high-low frequency extraction, and after the classifier were visualized with 2D t-SNE and compared with the original feature distribution. After high-frequency feature extraction, similar features generally clustered together, but the spacing between different classes was small and some classes overlapped; after low-frequency feature extraction, similar features clustered more tightly; and after high-low frequency feature fusion and the classifier, similar features were clearly clustered and different classes were clearly separated, indicating that high-low frequency feature extraction enhanced the classification effect. [Conclusion] This network achieves a good balance between crop classification accuracy and model complexity, and is expected to be deployed on resource-limited hardware in the future to achieve real-time classification.
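A schematic PyTorch sketch of the ReLU self-attention idea: the softmax in scaled dot-product attention is replaced with ReLU, and rows are L1-normalized so the non-negative weights still sum to one. The dimensions and the exact normalization are assumptions, not the paper's implementation.

```python
import torch

def relu_self_attention(x, wq, wk, wv, eps=1e-6):
    """Self-attention with ReLU in place of softmax.

    Like softmax, ReLU keeps attention weights non-negative; rows are
    L1-normalized so the weights still sum to 1. x: (N, d) token features.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = torch.relu(q @ k.T) / (q.shape[-1] ** 0.5)
    attn = scores / (scores.sum(dim=-1, keepdim=True) + eps)
    return attn @ v

torch.manual_seed(0)
d = 16
x = torch.randn(10, d)                       # 10 tokens (e.g. spectral patches)
wq, wk, wv = (torch.randn(d, d) for _ in range(3))
print(relu_self_attention(x, wq, wk, wv).shape)  # torch.Size([10, 16])
```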

Dense Nursery Stock Detection and Counting Based on UAV Aerial Images and Improved LSC-CNN | Open Access
PENG Xiaodan, CHEN Fengjun, ZHU Xueyan, CAI Jiawei, GU Mengmeng
2024, 6(5):  88-97.  doi:10.12133/j.smartag.SA202404011

[Objective] The number, location, and crown spread of nursery stock are important foundational data for its scientific management. The traditional approach of conducting nursery stock inventories through on-site individual plant surveys is labor-intensive and time-consuming. Low-cost, convenient unmanned aerial vehicles (UAVs) are beginning to be used for on-site collection of nursery stock data, with statistical analysis of nursery stock information achieved through technical means such as image processing. During data collection, as the flight altitude of the UAV increases, the number of trees in a single image also increases. Although anchor boxes can capture more information about the trees, the annotation cost is enormous for large numbers of densely populated tree images. To tackle the challenges of tree adhesion and scale variance in images captured by UAVs over nursery stock, and to reduce annotation costs, an improved dense detection and counting model using point-labeled data as supervisory signals was proposed to accurately obtain the location, size, and quantity of the targets. [Method] To enhance the diversity of nursery stock samples, the publicly available spruce, Yosemite, and KCL-London tree datasets were selected to construct a dense nursery stock dataset. A total of 1 520 nursery stock images were acquired and divided into training and testing sets at a ratio of 7:3. To enhance the model's adaptability to tree data of different scales and to variations in lighting, data augmentation methods such as contrast adjustment and image resizing were applied to the training set. After augmentation, the training set consisted of 3 192 images and the testing set contained 456 images. Considering the large number of trees in each image, trees were labeled by their center points to reduce annotation cost. The LSC-CNN model was selected as the base model; it can detect the quantity, location, and size of trees through point-supervised training, thereby providing richer information about the trees. The LSC-CNN model was then improved to address the missed detections and false positives observed during testing. Firstly, to address missed detections caused by severe adhesion of densely packed trees, the last convolutional layer of the feature extraction network was replaced with dilated convolution. This enlarged the receptive field of the convolutional kernel while preserving detailed tree features, so the model could capture a broader range of contextual information and better understand the overall scene. Secondly, the convolutional block attention module (CBAM) was introduced at the beginning of each scale branch, allowing the model to focus on the key features of trees at different scales and spatial locations and improving its sensitivity to multi-scale information. Finally, the model was trained using a label-smoothing cross-entropy loss function and a grid winner-takes-all strategy, emphasizing the regions with the highest losses to boost tree feature recognition. [Results and Discussions] The mean counting accuracy (MCA), mean absolute error (MAE), and root mean square error (RMSE) were adopted as evaluation metrics. Ablation studies and comparative experiments were designed to demonstrate the performance of the improved LSC-CNN model. The ablation experiments showed that the improved model effectively resolved the missed detections and false positives of the original LSC-CNN caused by the density and large-scale variations present in the nursery stock dataset. IntegrateNet, PSGCNet, CANet, CSRNet, CLTR, and LSC-CNN were chosen as comparison models. The improved LSC-CNN model achieved an MCA, MAE, and RMSE of 91.23%, 14.24, and 22.22, respectively, representing increases in MCA of 6.67%, 2.33%, 6.81%, 5.31%, 2.09%, and 2.34%; reductions in MAE of 21.19, 11.54, 18.92, 13.28, 11.30, and 10.26; and decreases in RMSE of 28.22, 28.63, 26.63, 14.18, 24.38, and 12.15 compared to the IntegrateNet, PSGCNet, CANet, CSRNet, CLTR, and LSC-CNN models, respectively. These results indicate that the improved LSC-CNN model achieves high counting accuracy and exhibits strong generalization ability. [Conclusions] The improved LSC-CNN model integrates the advantages of point-supervised learning from density estimation methods and of target bounding box generation from detection methods. These improvements demonstrate the enhanced accuracy, precision, and reliability of the improved LSC-CNN model in detecting and counting trees. This study can serve as a practical reference for the inventory of other types of nursery stock.
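A compact PyTorch sketch of a standard CBAM block of the kind inserted at each scale branch; the channel count, reduction ratio, and kernel size are illustrative, not the values used in the study.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional block attention module: channel attention followed by
    spatial attention, each producing a sigmoid gate on the feature map."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.mlp = nn.Sequential(          # shared MLP for channel attention
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        # Channel attention from average- and max-pooled descriptors.
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        # Spatial attention from channel-wise average and max maps.
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

feat = torch.randn(1, 64, 32, 32)   # a hypothetical scale-branch feature map
print(CBAM(64)(feat).shape)         # torch.Size([1, 64, 32, 32])
```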

Detection Method of Effective Tillering of Rice in the Field Based on Lightweight Ghost-YOLOv8 and Smartphone | Open Access
CUI Jiale, ZENG Xiangfeng, REN Zhengwei, SUN Jian, TANG Chen, YANG Wanneng, SONG Peng
2024, 6(5):  98-107.  doi:10.12133/j.smartag.SA202407012

[Objective] The number of effective tillers per plant is one of the important agronomic traits affecting rice yield. To solve the problems of high cost and low accuracy in effective tiller detection caused by dense tillers, mutual occlusion, and ineffective tillers, a method for distinguishing effective from ineffective tillers in rice was proposed. Combined with a deep learning model, a high-throughput, low-cost mobile phone App for effective tiller detection was developed to address the practical problems of effective tiller investigation under field conditions. [Methods] Investigations of rice tillering showed that the number of effective tillers is often higher than that of ineffective tillers. Based on the difference in growth height between effective and ineffective tillers, a fixed height position on the rice plants was selected to separate effective from ineffective tillers, and the plants were cut at this position. After harvesting, cross-sectional images of the tillering stems were taken with a mobile phone, and the stems were detected and counted by a YOLOv8 model. Only stem cross-sections were identified during detection; panicle cross-sections were not. The number of effective tillers was determined by the number of detected stems. To meet the needs of field work, a mobile phone App for effective tiller detection was developed for real-time detection. GhostNet was used to lighten the YOLOv8 model: the Ghost bottleneck was integrated into C2f to replace the original bottleneck, forming the C2f-Ghost module, and ordinary convolutions in the network were replaced with Ghost convolutions to reduce model complexity. Based on the lightweight Ghost-YOLOv8 model, the mobile App was designed and built using the Android Studio development platform and intranet penetration technology. [Results and Discussions] Field experiments showed that there were differences in the growth height of effective and ineffective tillers of rice. The range of 52% to 55% of the total plant height was selected as the cutting position, and the number of stems counted there was taken as the number of effective tillers per plant; this range was used as the criterion for dividing effective from ineffective tillers. The precision and recall of effective tiller counting exceeded 99%, indicating that the criterion was accurate and comprehensive for guiding effective tiller counting. After lightweighting YOLOv8 with GhostNet, the parameter count of the Ghost-YOLOv8 model was reduced by 43% and the FPS increased by 3.9, with a precision of 0.988, a recall of 0.980, and an mAP of 0.994; the model maintained excellent performance despite the lightweight design. Based on the lightweight Ghost-YOLOv8 model, the mobile phone App for detecting effective tillers was developed and tested on 100 cross-sectional images of rice stems collected under the classification criterion established in this study. Compared with manual counts of effective tillers per plant, the App's predictions achieved a precision of 99.61%, a recall of 98.76%, and a coefficient of determination of 0.985 9, indicating the reliability of the App and of the established criterion for detecting effective tillers of rice. [Conclusions] Using the lightweight Ghost-YOLOv8 model, the number of stems in cross-sectional stem images collected under the proposed criterion was detected to obtain the effective tiller number of rice. An Android App for effective tiller detection was developed, which can satisfy field investigations of effective tillering, help breeders collect data efficiently, and provide a basis for field prediction of rice yield. Further research could supplement the dataset with cross-sectional images of multiple rice plants to enable simultaneous measurement of effective tillers across plants and improve work efficiency. Further optimization and enhancement of the App's functionality are also needed to provide more tiller-related traits, such as tiller angle.
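A minimal PyTorch sketch of the Ghost convolution idea behind the lightweighting: a few intrinsic feature maps from an ordinary convolution plus cheap depthwise "ghost" maps, concatenated. The channel sizes and kernel choices are illustrative, not the Ghost-YOLOv8 settings.

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Ghost convolution: a few 'intrinsic' feature maps from an ordinary
    convolution, plus cheap depthwise 'ghost' maps, concatenated."""
    def __init__(self, c_in, c_out, k=1, dw_k=5):
        super().__init__()
        c_half = c_out // 2
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_half, k, padding=k // 2, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())
        self.cheap = nn.Sequential(   # depthwise conv: cheap ghost features
            nn.Conv2d(c_half, c_half, dw_k, padding=dw_k // 2,
                      groups=c_half, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

x = torch.randn(1, 32, 80, 80)      # hypothetical stem-image feature map
print(GhostConv(32, 64)(x).shape)   # torch.Size([1, 64, 80, 80])
```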

Lightweight Daylily Grading and Detection Model Based on Improved YOLOv10 | Open Access
JIN Xuemeng, LIANG Xiyin, DENG Pengfei
2024, 6(5):  108-118.  doi:10.12133/j.smartag.SA202407022

[Objective] In agricultural production, accurately classifying dried daylily grades is a critical task with significant economic implications. However, current target detection models face challenges such as inadequate accuracy and excessive parameters when applied to dried daylily grading, limiting their practical application in real-world settings. To address these issues, an innovative lightweight YOLOv10-AD network model was proposed. The model aims to enhance detection accuracy by optimizing the network structure and loss functions while reducing parameters and computational costs, making it more suitable for deployment in resource-constrained agricultural production environments. [Methods] Dried daylilies from the Qingyang region of Gansu province were selected as the research subject. A large number of images of dried daylilies, categorized into three grades (superior, medium, and inferior), were collected with mobile phones under varying lighting conditions and backgrounds. The images were carefully annotated and augmented to build a comprehensive dataset for dried daylily grade classification. YOLOv10 was chosen as the base network, and a newly designed backbone network called AKVanillaNet was introduced. AKVanillaNet combines AKConv (adaptive kernel convolution) with VanillaNet's deep training and shallow inference mechanisms. The second convolutional layer in VanillaNet was replaced with AKConv, and AKConv was merged with standard convolution layers at the end of the training phase to optimize the model for capturing the unique shape characteristics of dried daylilies. This design not only improved detection accuracy but also significantly reduced parameters and computational costs. Additionally, the DySnakeConv module was integrated into the C2f structure, replacing the Bottleneck layer with a Bottleneck-DS layer to form the new C2f-DySnakeConv module. This module enhanced the model's sensitivity to target shapes and boundaries, allowing the network to better capture the shape information of irregular objects like dried daylilies and further improving feature extraction. The Powerful-IoU (PIoU) loss function was also employed, which introduces a target-size-adaptive penalty factor and a gradient adjustment function. This design guides the anchor box regression along a more direct path, helping the model fit the data better and improving overall performance. [Results and Discussions] Testing on the dried daylily grade classification dataset showed that the YOLOv10-AD model achieved a mean average precision (mAP) of 85.7%. The model's parameters, computational volume, and size were 2.45 M, 6.2 GFLOPs, and 5.0 MB, respectively, with a frame rate of 156 FPS. Compared to the benchmark model, YOLOv10-AD improved mAP by 5.7% and FPS by 25.8%, while reducing parameters, computational volume, and model size by 9.3%, 24.4%, and 9.1%, respectively. These results indicate that YOLOv10-AD not only improved detection accuracy but also reduced model complexity, making it easier to deploy in real-world production environments. Furthermore, YOLOv10-AD outperformed larger models in the same series, such as YOLOv10s and YOLOv10m. Specifically, the weight, parameters, and computational volume of YOLOv10-AD were only 31.6%, 30.5%, and 25.3% of those of YOLOv10s, and 15.7%, 14.8%, and 9.8% of those of YOLOv10m. Despite using fewer resources, YOLOv10-AD achieved a mAP increase of 2.4% over YOLOv10s and 1.9% over YOLOv10m. These findings confirm that YOLOv10-AD maintains high detection accuracy while requiring significantly fewer resources, making it more suitable for agricultural production environments where computational capacity may be limited. The study also examined the performance of YOLOv10-AD under different lighting conditions: it achieved an average accuracy of 92.3% in brighter environments and 78.6% in darker environments, compared with 88.9% and 71.0% for YOLOv10n under the same conditions, improvements of 3.4% and 7.6%, respectively. These results demonstrate that YOLOv10-AD has a distinct advantage in maintaining high accuracy and confidence when grading dried daylilies across varying lighting conditions. [Conclusions] The proposed YOLOv10-AD network model significantly reduces parameters and computational costs without compromising detection accuracy. The model presents a valuable technical reference for the intelligent classification of dried daylily grades in agricultural production environments, particularly where resources are constrained.
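A sketch in the spirit of the target-size-adaptive IoU penalty described above; the exact PIoU formulation in the paper may differ, so the penalty and gradient-adjusting factor here are assumptions for illustration only.

```python
import numpy as np

def piou_like_loss(pred, gt, lam=1.3):
    """IoU loss with a target-size-adaptive penalty, in the spirit of PIoU.

    pred, gt: boxes as (x1, y1, x2, y2). The penalty averages the four edge
    offsets, each normalized by the target's width or height, so small targets
    are not dominated by absolute pixel errors. Illustrative sketch only.
    """
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(ix2 - ix1, 0) * max(iy2 - iy1, 0)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_p + area_g - inter + 1e-9)

    w_gt, h_gt = gt[2] - gt[0], gt[3] - gt[1]
    p = (abs(pred[0] - gt[0]) / w_gt + abs(pred[2] - gt[2]) / w_gt +
         abs(pred[1] - gt[1]) / h_gt + abs(pred[3] - gt[3]) / h_gt) / 4
    f = 1.0 - np.exp(-lam * p ** 2)   # gradient-adjusting factor, saturates
    return 1.0 - iou + f

print(round(piou_like_loss((10, 10, 50, 50), (12, 12, 52, 48)), 3))
```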

Lightweight Tea Shoot Picking Point Recognition Model Based on Improved DeepLabV3+ | Open Access
HU Chengxi, TAN Lixin, WANG Wenyin, SONG Min
2024, 6(5):  119-127.  doi:10.12133/j.smartag.SA202403016

[Objective] The picking of famous, high-quality tea is a crucial link in the tea industry. Identifying and locating the tender shoots of such tea for picking is an important component of the modern tea-picking robot. Traditional neural network methods suffer from issues such as large model size, long training times, and difficulty handling complex scenes. In this study, based on the actual scenario of the Xiqing Tea Garden in Hunan province, a novel deep learning algorithm was proposed to solve the precise segmentation challenge of picking points for famous, high-quality tea. [Methods] The primary technical innovation resided in the amalgamation of a lightweight network architecture, MobileNetV2, with the efficient channel attention network (ECANet) attention mechanism, alongside optimization modules including atrous spatial pyramid pooling (ASPP). Initially, MobileNetV2 was employed as the feature extractor, substituting traditional convolution operations with depthwise separable convolutions. This led to a notable reduction in the model's parameter count and expedited model training. Subsequently, the ECANet and ASPP modules were fused into an ECA_ASPP module with the intention of bolstering the model's capacity for multi-scale feature fusion, which is especially pertinent to the intricate recognition of tea shoots. This fusion strategy facilitated the capture of more nuanced features of delicate shoots, thereby augmenting segmentation accuracy. The specific implementation entailed feeding image inputs through the improved network, whereupon MobileNetV2 extracted both shallow and deep features. Deep features were fused via the ECA_ASPP module for multi-scale feature integration, reinforcing the model's resilience to intricate backgrounds and variations in tea shoot morphology. Shallow features, conversely, proceeded directly to the decoding stage, undergoing channel reduction before being integrated with the upsampled deep features. This divide-and-conquer strategy effectively harnessed features at differing levels of abstraction and heightened recognition performance through meticulous feature fusion. Ultimately, through a sequence of convolutional operations and upsampling procedures, a prediction map matching the resolution of the original image was generated, enabling the precise demarcation of tea shoot picking points. [Results and Discussions] The experimental outcomes indicated that the enhanced DeepLabV3+ model achieved a mean intersection over union (IoU) of 93.71% and a mean pixel accuracy of 97.25% on the tea shoot dataset. Compared to the original model based on Xception, the parameter count decreased substantially from 54.714 million to a mere 5.818 million, accomplishing a significant lightweight redesign of the model. Further comparisons with other prevalent semantic segmentation networks revealed that the improved model exhibited remarkable advantages in pivotal metrics such as parameter count, training duration, and mean IoU, highlighting its efficacy and precision in tea shoot recognition. This considerable decrease in parameter count not only enabled more resource-economical deployment but also shortened training, rendering the model highly suitable for real-time implementation in tea garden ecosystems. The elevated mean IoU and pixel accuracy attested to the model's capacity for precise demarcation and identification of tea shoots amid intricate and varied data, demonstrating resilience and adaptability in practical contexts. [Conclusions] This study implements an efficient and accurate tea shoot recognition method through targeted model improvements and optimizations, furnishing crucial technical support for the practical application of intelligent tea-picking robots. The lightweight DeepLabV3+ not only substantially enhances recognition speed and segmentation accuracy but also reduces hardware requirements, thereby promoting the practical application of intelligent picking technology in the tea industry.
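A short PyTorch sketch of the efficient channel attention (ECA) idea fused into the model: channel attention via a 1-D convolution over the pooled channel descriptor, with no dimensionality reduction. The kernel size and feature shapes are illustrative.

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient channel attention: global average pooling followed by a
    1-D convolution across channels, then a sigmoid gate (no reduction)."""
    def __init__(self, k=3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, k, padding=k // 2, bias=False)

    def forward(self, x):                         # x: (B, C, H, W)
        y = x.mean(dim=(2, 3))                    # (B, C) channel descriptor
        y = self.conv(y.unsqueeze(1)).squeeze(1)  # 1-D conv over channels
        return x * torch.sigmoid(y)[:, :, None, None]

feat = torch.randn(1, 320, 32, 32)  # e.g. a MobileNetV2 deep feature map
print(ECA()(feat).shape)            # torch.Size([1, 320, 32, 32])
```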

Lightweight Apple Leaf Disease Detection Algorithm Based on Improved YOLOv8 | Open Access
LUO Youlu, PAN Yonghao, XIA Shunxing, TAO Youzhi
2024, 6(5):  128-138.  doi:10.12133/j.smartag.SA202406012

[Objective] As one of China's most important agricultural products, apples hold a significant position in cultivation area and yield. However, during growth, apples are prone to various diseases that not only affect fruit quality but also significantly reduce yield, impacting farmers' economic benefits and the stability of market supply. To reduce the incidence of apple diseases and increase fruit yield, developing efficient and fast apple leaf disease detection technology is of great significance. An improved YOLOv8 algorithm was proposed to identify the leaf diseases that occur during apple growth. [Methods] The YOLOv8n model was selected to detect leaf diseases such as brown rot, rust, apple scab, and sooty blotch that apples may encounter during growth. SPD-Conv was introduced to replace the original convolutional layers in order to retain fine-grained information and reduce model parameters and computational costs, thereby improving disease detection accuracy. The multi-scale dilated attention (MSDA) mechanism was added at appropriate positions in the Neck layer to enhance the model's feature representation, allowing it to learn the receptive field dynamically and adaptively focus on the most representative regions and features in the image, thus enhancing the extraction of disease-related features. Finally, inspired by the RepVGG architecture, the original detection head was optimized to separate the training and inference architectures, which both accelerated inference and enhanced feature learning. Additionally, a dataset of apple leaf diseases containing the aforementioned diseases was constructed, and experiments were conducted. [Results and Discussions] Compared to the original model, the improved model showed notable gains across performance metrics. The mAP50 and mAP50:95 reached 88.2% and 37.0%, respectively, 2.7% and 1.3% higher than the original model. Precision and recall increased to 83.1% and 80.2%, improvements of 0.9% and 1.1% over the original model. Additionally, the size of the improved model was only 7.8 MB, and the computational cost was reduced by 0.1 GFLOPs. The impact of the MSDA placement on model performance was analyzed by adding it at different positions in the Neck layer, with experiments designed to verify the effect. The results showed that adding MSDA at the small-target layer in the Neck achieved the best effect, improving performance while maintaining low computational cost and model size, providing a useful reference for optimizing the MSDA mechanism. To further verify the effectiveness of the improved model, mainstream models such as YOLOv7-tiny, YOLOv9-c, RetinaNet, and Faster R-CNN were compared with the proposed model. The improved model outperformed these models by 1.4%, 1.3%, 7.8%, and 11.6% in mAP50, and by 2.8%, 0.2%, 3.4%, and 5.6% in mAP50:95, respectively. Moreover, the improved model showed significant advantages in floating-point operations, model size, and parameter count, with a parameter count of only 3.7 M, making it more suitable for deployment on hardware-constrained devices such as drones. In addition, to assess the model's generalization ability, a stratified sampling method was used, selecting 20% of the images from the dataset as the test set. The results showed that the improved model maintained high detection accuracy in complex and variable scenes, with mAP50 and mAP50:95 increasing by 1.7% and 1.2%, respectively, compared to the original model. Considering the differences in the number of samples for each disease in the dataset, a class balance experiment was also designed, in which synthetic samples were generated by oversampling to increase the number of minority-class samples. The results showed that the class-balanced dataset significantly improved detection performance, with precision increasing from 83.1% to 85.8%, recall from 80.2% to 83.6%, mAP50 from 88.2% to 88.9%, and mAP50:95 from 37.0% to 39.4%. The class-balanced dataset markedly enhanced the model's performance in detecting minority-class diseases, thereby improving overall performance. [Conclusions] The improved model demonstrated significant advantages in apple leaf disease detection. By introducing SPD-Conv and the MSDA attention mechanism, the model achieved noticeable improvements in both precision and recall while effectively reducing computational costs, leading to more efficient detection. The improved model can provide continuous health monitoring throughout apple growth and offer robust data support for farmers' scientific decision-making before fruit harvesting.
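A minimal PyTorch sketch of the SPD-Conv idea: a space-to-depth rearrangement followed by a non-strided convolution, so downsampling discards no fine-grained information. The channel sizes are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class SPDConv(nn.Module):
    """SPD-Conv: a space-to-depth rearrangement (no information loss)
    followed by a non-strided convolution, replacing strided downsampling."""
    def __init__(self, c_in, c_out, scale=2):
        super().__init__()
        self.spd = nn.PixelUnshuffle(scale)     # (C,H,W) -> (C*s^2, H/s, W/s)
        self.conv = nn.Sequential(
            nn.Conv2d(c_in * scale ** 2, c_out, 3, stride=1, padding=1,
                      bias=False),
            nn.BatchNorm2d(c_out), nn.SiLU())

    def forward(self, x):
        return self.conv(self.spd(x))

x = torch.randn(1, 32, 64, 64)      # hypothetical leaf-image feature map
print(SPDConv(32, 64)(x).shape)     # torch.Size([1, 64, 32, 32])
```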

MSH-YOLOv8: Mushroom Small Object Detection Method with Scale Reconstruction and Fusion | Open Access
YE Dapeng, JING Jun, ZHANG Zhide, LI Huihuang, WU Haoyu, XIE Limin
2024, 6(5):  139-152.  doi:10.12133/j.smartag.SA202404002

[Objective] Traditional object detection algorithms applied in the agricultural field, such as those used for crop growth monitoring and harvesting, often suffer from insufficient accuracy. This is particularly problematic for small crops like mushrooms, where recognition and detection are more challenging. The introduction of small object detection technology promises to address these issues, potentially enhancing the precision, efficiency, and economic benefits of agricultural production management. However, achieving high accuracy in small object detection has remained a significant challenge, especially when dealing with varying image sizes and target scales. Although the YOLO series models excel in speed and large object detection, they still have shortcomings in small object detection. To address the issue of maintaining high accuracy amid changes in image size and target scale, a novel detection model, Multi-Strategy Handling YOLOv8 (MSH-YOLOv8), was proposed. [Methods] The proposed MSH-YOLOv8 model builds upon YOLOv8 by incorporating several key enhancements aimed at improving sensitivity to small-scale targets and overall detection performance. Firstly, an additional detection head was added to increase the model's sensitivity to small objects. To address computational redundancy and improve feature extraction, the Swin Transformer detection structure was introduced into the input module of the head network, creating what was termed the "Swin Head (SH)". Moreover, the model integrated the C2f_Deformable convolutionv4 (C2f_DCNv4) structure, which included deformable convolutions, and the Swin Transformer encoder structure, termed "Swinstage", to reconstruct the YOLOv8 backbone network. This optimization enhanced feature propagation and extraction capabilities, increasing the network's ability to handle targets with significant scale variations. Additionally, the normalization-based attention module (NAM) was employed to improve performance without compromising detection speed or computational complexity. To further enhance training efficacy and convergence speed, the original loss function CIoU was replaced with wise-intersection over union (WIoU) Loss. Furthermore, experiments were conducted using mushrooms as the research subject on the open Fungi dataset. Approximately 200 images with resolution sizes around 600×800 were selected as the main research material, along with 50 images each with resolution sizes around 200×400 and 1 000×1 200 to ensure representativeness and generalization of image sizes. During the data augmentation phase, a generative adversarial network (GAN) was utilized for resolution reconstruction of low-resolution images, thereby preserving semantic quality as much as possible. In the post-processing phase, dynamic resolution training, multi-scale testing, soft non-maximum suppression (Soft-NMS), and weighted boxes fusion (WBF) were applied to enhance the model's small object detection capabilities under varying scales. [Results and Discussions] The improved MSH-YOLOv8 achieved an average precision at 50% (AP50) intersection over union of 98.49% and an AP@50-95 of 75.29%, with the small object detection metric APs reaching 39.73%. Compared to mainstream models like YOLOv8, these metrics showed improvements of 2.34%, 4.06% and 8.55%, respectively. When compared to the advanced TPH-YOLOv5 model, the improvements were 2.14%, 2.76% and 6.89%, respectively. 
[Results and Discussions] The improved MSH-YOLOv8 achieved an average precision of 98.49% at an intersection over union threshold of 50% (AP50) and an AP@50-95 of 75.29%, with the small object detection metric APs reaching 39.73%. Compared with mainstream models such as YOLOv8, these metrics improved by 2.34%, 4.06% and 8.55%, respectively; compared with the advanced TPH-YOLOv5 model, the improvements were 2.14%, 2.76% and 6.89%, respectively. The ensemble model, MSH-YOLOv8-ensemble, showed even larger gains, with AP50 and APs reaching 99.14% and 40.59%, increases of 4.06% and 8.55% over YOLOv8 (see the WBF sketch after this abstract). These results indicate the robustness and enhanced performance of MSH-YOLOv8, particularly in detecting small objects under varying conditions. Applying the methodology to the Alibaba Cloud Tianchi datasets "Tomato Detection" and "Apple Detection" yielded the MSH-YOLOv8-t and MSH-YOLOv8-a models (collectively referred to as MSH-YOLOv8). Visual comparison of detection results demonstrated that MSH-YOLOv8 markedly improved the recognition of dense and blurry small-scale tomatoes and apples, indicating strong cross-dataset generalization and effective recognition of small-scale targets. Beyond the quantitative improvements, qualitative assessments showed that the model handles complex scenarios involving occlusions, varying lighting conditions, and different crop growth stages, demonstrating its practical applicability in real-world agricultural settings where such challenges are common.

[Conclusions] The MSH-YOLOv8 improvement method proposed in this study effectively enhances the detection accuracy of small mushroom targets under varying image sizes and target scales. The approach leverages multiple strategies to optimize both the architecture and the training process, resulting in a robust model capable of high-precision small object detection. Its application to other datasets, such as those for tomato and apple detection, further underscores its generalizability and potential for broader use in agricultural monitoring and management tasks.
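
The ensemble results above rely on weighted boxes fusion. As a rough illustration of the WBF idea, the sketch below fuses overlapping boxes into confidence-weighted averages instead of suppressing them; the IoU threshold and the mean-score fusion rule are simplifying assumptions, not the authors' implementation.

import numpy as np

def box_iou(a, b):
    """IoU of two boxes in [x1, y1, x2, y2] format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def weighted_boxes_fusion(boxes, scores, iou_thr=0.55):
    """Cluster overlapping detections and replace each cluster with a
    confidence-weighted average box (the core idea of WBF)."""
    order = np.argsort(scores)[::-1]            # process high scores first
    clusters, fused = [], []                    # fused[k] = (box, score)
    for i in order:
        b, s = boxes[i].astype(float), float(scores[i])
        k = next((j for j, (fb, _) in enumerate(fused)
                  if box_iou(fb, b) > iou_thr), None)
        if k is None:
            clusters.append([(b, s)])
            fused.append((b, s))
        else:
            clusters[k].append((b, s))
            ws = np.array([s_ for _, s_ in clusters[k]])
            bs = np.stack([b_ for b_, _ in clusters[k]])
            # coordinates weighted by confidence; mean confidence as the score
            fused[k] = ((bs * ws[:, None]).sum(axis=0) / ws.sum(), ws.mean())
    return fused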

Cow Hoof Slippage Detecting Method Based on Enhanced DeepLabCut Model | Open Access
NIAN Yue, ZHAO Kaixuan, JI Jiangtao
2024, 6(5):  153-163.  doi:10.12133/j.smartag.SA202406014
Abstract ( 74 )   HTML ( 6 )   PDF (1765KB) ( 64 )  
Figures and Tables | References | Related Articles | Metrics

[Objective] Hoof slipping during walking indicates a deteriorating farming environment and a decline in cows' locomotor function. Slippery floors can injure cows, causing unnecessary economic losses for farmers. To automatically recognize and detect slipping hoof postures during walking, this study focused on localizing and analyzing the key body points of cows using deep learning methods. Motion curves of the key body points were analyzed and features extracted, and the effectiveness of the extracted features was verified with a decision tree classification algorithm, with the aim of achieving automatic detection of hoof slippage in cows.

[Method] An improved localization method for the key body points of cows, specifically the head and four hooves, was proposed based on the DeepLabCut model. Ten networks from the ResNet, MobileNet-V2, and EfficientNet series were selected to replace the backbone network of DeepLabCut for model training. Root mean square error (RMSE), model size, frames per second (FPS), and other indicators were compared, and the optimal backbone was chosen as the baseline for improvement. A network structure fusing the convolutional block attention module (CBAM) with ResNet-50 was then proposed: to enhance the model's generalization ability and robustness, the lightweight CBAM attention mechanism was embedded into the first and last convolution layers of the ResNet-50 network (a generic sketch of CBAM follows below). Key body points were predicted with the improved DeepLabCut model from profile videos of cows with slipping hooves, and the resulting coordinates were used to plot motion curves of the key body points. From these curves, the feature parameter Feature1 for detecting hoof slippage was extracted, representing the local peak values of the derivative of the motion curves of the four hooves; the feature parameter Feature2 for predicting slip distance was also extracted, specifically the minimum local peak points of the derivative curve of the hooves together with the local minimum points to their left and right. The effectiveness of Feature1 was verified with a decision tree classification model: Feature1 was extracted for each hoof, its standard deviation was calculated per hoof, and the resulting set of four standard deviations per cow served as the input to the classifier. Classification performance was evaluated with four common objective metrics, namely accuracy, precision, recall, and F1-Score, and the accuracy of slip distance prediction was assessed using RMSE.
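
As an illustration of the attention mechanism described above, the following is a generic CBAM sketch in PyTorch, with channel attention followed by spatial attention; the reduction ratio and kernel size are common defaults assumed here, and this is not the authors' code.

import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional block attention module: channel attention
    followed by spatial attention over a feature map."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        # Channel attention: shared MLP over global avg- and max-pooled features
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # Spatial attention: convolution over channel-wise avg and max maps
        self.spatial = nn.Conv2d(2, 1, kernel_size,
                                 padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = torch.mean(x, dim=(2, 3), keepdim=True)
        mx = torch.amax(x, dim=(2, 3), keepdim=True)
        x = x * torch.sigmoid(self.mlp(avg) + self.mlp(mx))       # channel gate
        avg_s = torch.mean(x, dim=1, keepdim=True)
        max_s = torch.amax(x, dim=1, keepdim=True)
        gate = torch.sigmoid(self.spatial(torch.cat([avg_s, max_s], dim=1)))
        return x * gate                                           # spatial gate
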
[Results and Discussion] After all ten models reached convergence, the loss values ranked from smallest to largest across the EfficientNet, ResNet, and MobileNet-V2 series, respectively. Among them, ResNet-50 exhibited the best localization accuracy on both the training and validation sets, with RMSE values of only 2.69 and 3.31 pixels, respectively. The MobileNet-V2 series had the fastest inference speed, reaching 48 f/s, while the inference speeds of the ResNet and EfficientNet series were comparable, with the ResNet series slightly faster. Considering these factors, ResNet-50 was selected as the backbone network for further improvement of DeepLabCut. Compared with the original ResNet-50 network, the network integrating the CBAM module showed a significant improvement in localization accuracy, increasing by 3.7% on the training set and 9.7% on the validation set. The RMSE between the predicted key body points and the manually labeled points was only 2.99 pixels, with localization results for the right hind hoof, right front hoof, left hind hoof, left front hoof, and head improved by 12.1%, 44.9%, 0.04%, 48.2%, and 39.7%, respectively. To validate the advancement of the improved model, it was compared with the mainstream key point localization model YOLOv8s-pose: the RMSE was 1.06 pixels lower than that of YOLOv8s-pose, indicating that the ResNet-50 network integrating the CBAM attention mechanism possessed superior localization accuracy. For the hoof slippage classification model, 10-fold cross-validation yielded average accuracy, precision, recall, and F1-Score of 90.42%, 0.943, 0.949, and 0.941, respectively (see the validation sketch after this abstract). The error between the slip distance computed from the feature parameter Feature2 and the manually calibrated slip distance was 1.363 pixels.

[Conclusion] The ResNet-50 network model improved by integrating the CBAM module achieved high accuracy in localizing the key body points of cows. The hoof slippage classification model and the slip distance prediction model, based on the extracted feature parameters, both exhibited small errors compared with manual detection results. This indicates that the proposed enhanced DeepLabCut model achieves good accuracy and can provide technical support for the automatic detection of hoof slippage in cows.
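
To illustrate the cross-validation step above, here is a minimal sketch of 10-fold cross-validation of a decision tree on four per-hoof standard-deviation features; the data are synthetic placeholders, not the study's measurements.

import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# One row per cow; four columns = standard deviation of Feature1 per hoof.
# Synthetic placeholder data standing in for the derivative-peak features.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=60) > 0).astype(int)  # 1 = slipping (illustrative)

scores = cross_validate(
    DecisionTreeClassifier(random_state=0), X, y, cv=10,
    scoring=("accuracy", "precision", "recall", "f1"),
)
for name in ("accuracy", "precision", "recall", "f1"):
    print(name, round(scores[f"test_{name}"].mean(), 3))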

Authority in Charge: Ministry of Agriculture and Rural Affairs of the People’s Republic of China
Sponsor: Agricultural Information Institute, Chinese Academy of Agricultural Sciences
Editor-in-Chief: Chunjiang Zhao, Academician of the Chinese Academy of Engineering
ISSN 2097-485X(Online)
ISSN 2096-8094(Print)
CN 10-1681/S
CODEN ZNZHD7
