Smart Agriculture

Table of Contents

    30 January 2026, Volume 8 Issue 1
    Topic: Intelligent Identification and Diagnosis of Agricultural Diseases and Pests
    Lightweight Detection Method for Pepper Leaf Diseases and Pests Based on Improved YOLOv12s
    YAO Xiaotong, QU Shaoye
    2026, 8(1):  1-14.  doi:10.12133/j.smartag.SA202506005

    [Objective] Pepper cultivation frequently faces challenges from diseases and pests, and early detection is critical for reducing yield losses. However, existing detection models often suffer from limitations such as insufficient feature extraction for subtle lesions, loss of edge information due to complex backgrounds, and high missed detection rates for small lesions. To address these issues, YOLO-MDFR (You Only Look Once), a lightweight detection algorithm based on an enhanced YOLOv12s, was proposed, specifically designed for accurate identification of pepper leaf diseases and pests in complex natural environments. [Methods] The dataset was established in the primary pepper cultivation zone of Gangu County, Tianshui City, Gansu Province. The cultivated variety was the locally dominant Capsicum annuum L. var. conoides (Mill.). Data collection was conducted from March 15 to May 20, 2024. The collected samples included four categories of pepper leaves: healthy leaves, leaves damaged by thrips, leaves infected with tobacco mosaic virus exhibiting yellowing symptoms, and leaves affected by bacterial leaf spot. First, the original YOLOv12s backbone was replaced with an improved MobileNetV4 architecture to enhance lightweight performance while preserving feature extraction capability. Specifically, the original 5×5 standard convolutions in the bottleneck layers of MobileNetV4 were substituted with two sequential 3×3 depthwise separable convolutions. This design was based on the principle that two stacked 3×3 convolutions achieve a receptive field equivalent to a single 5×5 convolution while reducing the parameter count; depthwise separable convolutions further decouple the spatial and channel convolutions, minimizing redundant computation. Second, a novel dimensional frequency reciprocal attention mixing transformer (D-F-Ramit) module was introduced to enhance sensitivity to lesion boundaries and fine-grained textures. 
The module first converted feature maps from the spatial domain to the frequency domain using discrete cosine transform (DCT), capturing high-frequency components often lost in spatial-only attention. It then integrated three parallel branches: channel attention, spatial attention, and frequency-domain attention. Finally, a residual aggregation gate-controlled convolution (RAGConv) module was developed for the neck network. This module included a residual aggregation path to collect multi-layer feature information and a gate control unit that dynamically weighted feature components based on their relevance. The residual structure provided a direct gradient propagation path, alleviating gradient vanishing during backpropagation and ensuring efficient information transfer during feature fusion. A systematic experimental framework was established to comprehensively evaluate model performance: (1) Ablation studies were conducted using a controlled variable approach to verify the individual contributions of the improved MobileNetV4, D-F-Ramit, and RAGConv modules; (2) Lesion scale sensitivity analysis assessed detection performance across different lesion sizes, with emphasis on small-spot recognition; (3) Resolution impact analysis evaluated five common input resolutions (320×320–736×736) to explore the trade-offs among accuracy, speed, and computational efficiency; and (4) Embedded deployment validation involved model quantization and implementation on the Rockchip RK3588 platform to measure inference speed and power consumption on edge devices. [Results and Discussions] The proposed YOLO-MDFR achieved an mAP@0.5 of 95.6% on this dataset. Compared to YOLOv12s, it improved accuracy by 2.0%, reduced parameters by 61.5%, and lowered computational complexity by 68.5%. Real-time testing showed 43.4 f/s on an NVIDIA RTX 4060 GPU (CUDA 12.2) and 22.8 f/s on a Rockchip RK3588 embedded platform with only 3.5 W power consumption—suitable for battery-powered field devices. 
Lesion-scale analysis revealed 33.5% accuracy for lesions smaller than 16×16 pixels, which are critical for early detection. Confusion-matrix evaluation showed reduced misclassification: the bacterial leaf spot/thrips damage confusion rate fell from 5.8% to 2.1%, and the tobacco mosaic virus/healthy leaf rate from 3.2% to 1.5%, yielding an overall misclassification rate of 2.3%. Experiments across varying input resolutions revealed a clear performance–resolution trade-off. As resolution increased from 320×320 to 736×736, mAP rose from 89.5% to 96.2%, showing diminishing returns beyond 512×512. Concurrently, computational cost grew roughly quadratically, reducing inference speed from 65.2 f/s to 35.1 f/s. [Conclusions] This study presents YOLO-MDFR, a lightweight detection model for identifying pepper leaf diseases and pests under complex natural conditions. By integrating an improved MobileNetV4 backbone, a multi-dimensional frequency reciprocal attention mixing transformer (D-F-Ramit), and a residual aggregation gate-controlled convolution (RAGConv) module, YOLO-MDFR outperforms mainstream detection models in both accuracy and efficiency. Systematic deployment experiments yielded optimized configurations for different application scenarios. Despite its strong performance, the model shows limitations in robustness under extreme lighting, generalization to emerging diseases, and detection of small targets under occlusion. Future work will address these issues through ambient light data fusion, domain adaptation with semi-supervised learning, and binocular vision integration.
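The backbone substitution described above rests on a simple counting argument: two stacked 3×3 convolutions cover the same receptive field as one 5×5 convolution, and making them depthwise separable cuts the weight count further. A minimal sketch of that arithmetic, assuming a hypothetical channel width of 64 (the paper does not state the exact widths):

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of one standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k conv (one filter per channel) plus 1x1 pointwise conv."""
    return k * k * c_in + c_in * c_out

def stacked_receptive_field(kernels):
    """Receptive field of stride-1 stacked convs: 1 + sum(k - 1)."""
    rf = 1
    for k in kernels:
        rf += k - 1
    return rf

c = 64  # assumed channel width, for illustration only
p_5x5 = standard_conv_params(5, c, c)             # one 5x5 standard conv
p_two_3x3 = 2 * depthwise_separable_params(3, c, c)  # two 3x3 DS convs

print(stacked_receptive_field([3, 3]))  # 5: matches the 5x5 coverage
print(p_5x5, p_two_3x3)                 # 102400 vs 9344 weights
```

At this width the substitution keeps the 5×5 receptive field while using under a tenth of the weights, which is the lightweighting effect the abstract attributes to the improved backbone.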

    Tea Leaf Disease Diagnosis Based on Improved Lightweight U-Net3+
    HU Yumeng, GUAN Feifan, XIE Dongchen, MA Ping, YU Youben, ZHOU Jie, NIE Yanming, HUANG Lüwen
    2026, 8(1):  15-27.  doi:10.12133/j.smartag.SA202507010

    [Objective] Leaf diseases significantly affect both the yield and quality of tea throughout the year. To address the inadequate segmentation precision of current tea leaf spot segmentation models, a novel model for diagnosing the severity of tea leaf spots, designated MDC-U-Net3+, was proposed in this research to enhance segmentation accuracy on the base framework of U-Net3+. [Methods] A multi-scale feature fusion module (MSFFM) was incorporated into the backbone network of U-Net3+ to obtain feature information across multiple receptive fields of diseased spots, thereby reducing the loss of features within the encoder. Dual multi-scale attention (DMSA) was incorporated into the skip connection process to mitigate the segmentation boundary ambiguity issue. This integration facilitates the comprehensive fusion of fine-grained and coarse-grained semantic information at full scale. Furthermore, the segmented mask image was subjected to conditional random fields (CRF) to further optimize the segmentation results. [Results and Discussions] The improved MDC-U-Net3+ model achieved a mean pixel accuracy (mPA) of 94.92% and a mean intersection over union (mIoU) of 90.9%. Compared to the mPA and mIoU of U-Net3+, the MDC-U-Net3+ model showed improvements of 1.85 and 2.12 percentage points, respectively. These results illustrated a more effective segmentation performance than that achieved by other classical semantic segmentation models. [Conclusions] The methodology presented herein could provide data support for automated disease detection and precise pesticide application, consequently reducing the losses associated with tea diseases.
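The two metrics reported above, mPA and mIoU, are both computable from a per-class confusion matrix. A minimal sketch with an invented 3-class matrix (say background / leaf / lesion; the numbers are illustrative, not from the paper):

```python
def mean_pixel_accuracy(cm):
    """mPA: mean over classes of (correct pixels / ground-truth pixels of that class)."""
    return sum(cm[i][i] / sum(cm[i]) for i in range(len(cm))) / len(cm)

def mean_iou(cm):
    """mIoU: mean over classes of TP / (TP + FP + FN)."""
    n = len(cm)
    ious = []
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                       # ground truth i, predicted elsewhere
        fp = sum(cm[r][i] for r in range(n)) - tp  # predicted i, ground truth elsewhere
        ious.append(tp / (tp + fp + fn))
    return sum(ious) / n

# rows = ground-truth class, columns = predicted class (hypothetical counts)
cm = [[50, 2, 1],
      [3, 40, 2],
      [1, 1, 20]]
print(round(mean_pixel_accuracy(cm), 4))  # 0.9138
print(round(mean_iou(cm), 4))             # 0.8368
```

Note that mPA ignores false positives per class while mIoU penalizes them, which is why mIoU is typically the lower of the two, as in the paper's 94.92% vs. 90.9%.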

    Rice Disease Identification Method Based on Improved MobileViT Model and System Development
    LIU Xiaojun, WU Qian, SUN Chuanliang, QI Chao, ZHANG Gufeng, LEI Tianjie, LIANG Wanjie
    2026, 8(1):  28-39.  doi:10.12133/j.smartag.SA202507043

    [Objective] Under abiotic stress conditions, rice plants become fragile and susceptible to disease infection. Accurate diagnosis and scientific prevention and control strategies are therefore crucial for managing rice diseases, especially under disasters such as flooding and high temperatures. However, identifying rice diseases under natural field conditions is challenging: complex backgrounds, illumination changes, and occlusion make it extremely difficult to comprehensively obtain disease information, significantly increasing the difficulty of disease identification. This study aims to develop an efficient rice disease recognition model by integrating the efficient channel attention (ECA) mechanism with the MobileViT model, enhancing the accuracy of rice disease identification in the field. Additionally, the rice disease knowledge graph was combined to achieve precise diagnosis and generate scientifically grounded control prescriptions for effective disease management. [Methods] A total of 1 304 raw images of rice diseases were collected from different rice disease investigation and long-term monitoring points in Jiangsu Province, at different periods of time, using mobile phones and cameras. A further 167 disease images from the rice leaf disease image samples dataset were used to supplement the dataset. The raw images were accurately classified and preprocessed under the guidance of plant protection experts. A dataset containing 1 471 original images was constructed, covering seven types of rice diseases: bacterial leaf blight, false smut, leaf blast, bakanae disease, heart rot, grain discoloration, and panicle blast. The dataset was partitioned into training, validation, and test sets following a 7:1.5:1.5 ratio. 
Data augmentation techniques were applied exclusively to the training and validation sets to enhance sample diversity, while the test set remained unaugmented to preserve its independence for unbiased model evaluation. Post-augmentation, the total image count increased to 7 735. A novel rice disease recognition model was established by integrating the efficient channel attention (ECA) module into the MobileViT model. The recognition model architecture was optimized by improving the convolutional structures, reconstructing the Transformer encoding blocks, and replacing the activation function with SiLU. To verify the performance of the model, cross-validation and ablation experiments were conducted. After establishing a highly accurate recognition model, it was combined with the rice disease knowledge graph to achieve accurate diagnosis of rice diseases and generate scientific prevention and control strategies. Finally, an intelligent rice disease diagnostic system was developed using the Flask framework and cloud computing technologies. [Results and Discussions] The results of the ablation study revealed that the model combining convolutional layer optimization, Transformer block reconstruction, and integration of the ECA module achieved outstanding performance. The overall precision, F1-Score, and recall rate reached 97.27%, 97.32%, and 97.46%, respectively. In terms of accuracy, the improved model reached 97.25%, representing an improvement of 2.3 percentage points over the original model (94.95%). To further verify the effectiveness of the improved model, various mainstream models such as Swin Transformer, TinyVit, and ConvNeXt were compared with the proposed model. The experimental results showed that the improved model outperformed the suboptimal model (TinyVit) by 0.92, 1.43, 0.95, and 1.32 percentage points in overall accuracy, precision, F1-Score, and recall rate, respectively. 
Moreover, the improved model showed significant advantages in terms of floating-point operations, model size, and parameter count, with a parameter count of only 6.02 M, making it more suitable for deployment on hardware-constrained devices. Analysis of the confusion matrix and heatmap visualizations revealed that the enhanced model achieved recognition accuracy improvements of 0.6, 0.3, 0.3, and 0.6 percentage points for bacterial leaf blight, heart rot, grain discoloration, and panicle blast, respectively. The integrated system, combining this model with the knowledge graph, demonstrated significantly enhanced accuracy in disease identification and diagnosis. Meanwhile, disease prevention and control strategies were generated to guide rice disease management. During field deployment, the rice disease diagnosis system achieved an accuracy rate as high as 98%, with an average response time of 181 ms, demonstrating reliable real-time performance and stability. [Conclusions] By integrating the ECA module and reconstructing the Transformer encoding blocks, the MobileViT model achieved noticeable improvements in precision, recall, and F1-Score, while effectively reducing computational costs, leading to more efficient recognition of rice diseases in complex field environments. The application of the intelligent rice disease diagnosis system showed that it could deliver accurate diagnosis results and generate strategies to guide rice disease prevention and control. This method could effectively improve the efficiency of rice disease prevention and control, providing technical support for improving the quality, efficiency, digitization, and intelligence of rice production.
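A distinctive detail of the ECA module integrated above is that its 1-D convolution kernel size adapts to the channel count. The sketch below follows the formulation from the original ECA-Net paper (k derived from log2 of the channel count with gamma = 2, b = 1, rounded to the nearest odd integer); whether the improved MobileViT uses exactly these settings is an assumption:

```python
import math

def eca_kernel_size(channels, gamma=2, b=1):
    """ECA adaptive kernel: k = |log2(C)/gamma + b/gamma|, forced odd."""
    t = int(abs(math.log2(channels) / gamma + b / gamma))
    return t if t % 2 == 1 else t + 1  # 1-D conv kernels must be odd-sized

# Kernel size grows slowly with channel width, keeping the attention cheap.
for c in (64, 128, 256, 512):
    print(c, eca_kernel_size(c))  # 64->3, 128->5, 256->5, 512->5
```

Because the attention reduces to one small 1-D convolution over channel descriptors, it adds almost no parameters, consistent with the lightweight figures reported above.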

    Low-rank Adaptation Method for Fine-tuning Plant Disease Recognition Models
    HUANG Jinqing, YE Jin, HU Huilin, YANG Jihui, LAN Wei, ZHANG Yanqing
    2026, 8(1):  40-51.  doi:10.12133/j.smartag.SA202504003

    [Objective] When deep learning is applied to plant disease recognition tasks, model fine-tuning faces significant challenges, including limited computational resources and high parameter update overhead. Although traditional low-rank adaptation (LoRA) methods effectively reduce parameter overhead, their strategy of assigning a uniform, fixed rank to all layers often overlooks the varying importance of different layers. This approach may still lead to constrained optimization in critical layers or resource waste in less significant ones. To address this limitation, a dynamic rank allocation (DRA) algorithm is proposed in this research. The DRA algorithm is designed to evaluate and adjust the parameter resources required by each layer during training, enhancing the accuracy of plant disease classification models while balancing computational resources more efficiently. [Methods] Two public datasets, the Wheat Plant Diseases Dataset and the Plants Disease Dataset, were utilized in the experiments. The Wheat Plant Diseases Dataset comprised 13 104 images covering 15 types of wheat diseases such as black rust and fusarium head blight, while the Plants Disease Dataset included 37 505 images of 26 types of plant diseases such as algal leaf spot, corn rust, and bacterial spot of tomato. These datasets were captured under varied lighting, different backgrounds, diverse angles, and at various stages of plant growth. A cross-layer feature similarity metric based on centered kernel alignment (CKA) was introduced to quantify the representational correlation between different layers. Concurrently, a correction factor was constructed based on gradient information and activation intensity to measure the direct impact of each layer on the loss function. These two metrics were then fused using a weighted harmonic mean to generate a comprehensive importance score, which was subsequently used for the initial rank allocation. 
Furthermore, considering the effect of feature representation changes during training, a stability-triggered adaptive rank update strategy, rank re-allocation (RRA), was proposed. This strategy monitored the average parameter change of the low-rank adapters during training to determine the convergence state. When this change fell below a specific threshold, the low-rank matrices were merged into the original weights, and the rank allocation table was then re-calculated and updated. This process ensured that more resources were allocated to critical layers, thereby achieving an optimized allocation of parameter resources across different layers. [Results and Discussions] Tests on four models (AlexNet, MobileNetV2, RegNetY, and ConvNeXt) indicated that, compared to full-parameter fine-tuning, the proposed method reduced resource consumption to 0.42%, 2.46%, 3.56%, and 1.25%, respectively, while maintaining a comparable average accuracy. The RRA strategy demonstrated continuous parameter optimization throughout training. On the ConvNeXt model, the trainable parameters on the Plants Disease Dataset were progressively reduced from 18.34 M to 9.26 M, a reduction of nearly 50%. In comparison with the standard LoRA method (R=16), the proposed method reduced accuracy by 0.38, 0.40, and 0.05 percentage points on the Wheat Plant Diseases Dataset for AlexNet, MobileNetV2, and RegNetY, respectively, while resource consumption was reduced by 59.3%, 87.4%, and 50.5%. Robustness was tested by applying perturbations to the test set, including Gaussian noise, random cropping, color jitter, and random rotation. The results showed that the model was most affected by color jitter and random rotation on the Plants Disease Dataset, with accuracy decreasing by 6.02 and 5.11 percentage points, respectively. 
On the Wheat Plant Diseases Dataset, the model was more sensitive to random cropping and random rotation, with accuracy decreasing by 4.33 and 4.40 percentage points, respectively; the overall performance degradation remained within an acceptable range. When compared to other advanced low-rank methods such as AdaLoRA and DyLoRA under the same parameter budget, the DRA method exhibited higher accuracy. On the RegNetY model, the DRA method achieved an accuracy of 90.96% on the Plants Disease Dataset, which was 0.55 percentage points higher than AdaLoRA and 0.94 percentage points higher than DyLoRA. In terms of training efficiency on the Plants Disease Dataset, the DRA method required 43.5 minutes to reach its peak validation accuracy of 89.84%, whereas AdaLoRA required 52.3 minutes, approximately 20.23% longer. Regarding inference flexibility, the DyLoRA method was designed to generate a universal model capable of adapting to multiple rank configurations after a single training run, allowing dynamic rank switching during inference based on hardware or latency requirements. The DRA method, however, did not possess this inference-time flexibility: it focused on converging to a single, high-performance rank configuration for a specific task during the training phase. [Conclusions] The low-rank adaptive fine-tuning method proposed in this research significantly reduced the number of trainable parameters while ensuring plant disease recognition accuracy. Compared to traditional fixed-rank LoRA and other advanced low-rank optimization methods, it demonstrated distinct advantages, providing an effective pathway for efficient model deployment on resource-constrained devices.
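The initial rank-allocation step described above can be sketched as follows: a per-layer CKA similarity and a gradient/activation correction factor are fused with a weighted harmonic mean into an importance score, and a total rank budget is split proportionally. All numbers, the equal weighting, and the proportional split are illustrative assumptions; the paper's exact formulas may differ:

```python
def harmonic_importance(cka, grad_factor, alpha=0.5):
    """Weighted harmonic mean of the two per-layer metrics (both in (0, 1])."""
    return 1.0 / (alpha / cka + (1.0 - alpha) / grad_factor)

def allocate_ranks(scores, total_rank, min_rank=1):
    """Split a total rank budget across layers proportionally to importance."""
    s = sum(scores)
    return [max(min_rank, round(total_rank * sc / s)) for sc in scores]

cka = [0.9, 0.6, 0.3, 0.8]   # hypothetical cross-layer similarity per layer
grad = [0.4, 0.7, 0.9, 0.5]  # hypothetical gradient/activation factors
scores = [harmonic_importance(c, g) for c, g in zip(cka, grad)]
print([round(s, 3) for s in scores])
print(allocate_ranks(scores, total_rank=64))  # e.g. [16, 18, 13, 17]
```

The harmonic mean is deliberately pessimistic: a layer only scores high when both metrics agree it matters, which keeps the budget away from layers that look important on one signal alone.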

    Intelligent Q&A Method for Crop Diseases and Pests Using LLM Augmented by Adaptive Hybrid Retrieval
    YANG Jun, YANG Wanxia, YANG Sen, HE Liang, ZHANG Di
    2026, 8(1):  52-61.  doi:10.12133/j.smartag.SA202506026

    [Objective] Extracting valuable knowledge from vast amounts of dispersed, heterogeneous, and unstructured agricultural big data, correlating and structuring it, and using it to augment large models into intelligent question-answering systems enables effective delivery of knowledge services to agricultural practitioners. This approach can rapidly advance the scientific and precision-based development of agricultural production. Existing agricultural Q&A systems lack sufficient semantic understanding of complex symptoms, while general-purpose large language models (LLMs) produce factual hallucinations due to incomplete training data coverage. This research aims to address the insufficient scale and low quality of knowledge bases in the agricultural field. [Methods] First, disease and pest data were collected for five typical crops: wheat, rice, corn, potatoes, and cotton. Using manual verification, outliers were precisely identified and removed, ultimately yielding 87 901 unstructured data entries. Then, a few-shot learning model was employed to extract the entities defined in the schema layer, and these entities were aligned using BERT semantic vectors and LLM prompt engineering, ultimately yielding a triplet knowledge base of 916 239 entries for knowledge retrieval. A knowledge retrieval-augmented LLM approach for intelligent Q&A on crop diseases and pests was proposed, namely the adaptive hybrid retrieval-augmented generation (AHR-RAG) approach. Firstly, an overlapping mechanism was introduced during fixed-length segmentation to mitigate semantic fragmentation. Simultaneously, vector semantic similarity was used to match highly related text blocks with the topic for optimization and storage. Then, single-hop and multi-hop retrieval were designed based on the complexity of the question. 
Single-hop retrieval used the BM25 algorithm to match information extracted from the query with document content in the Elasticsearch index, feeding the results into the LLM to enhance answer generation. Multi-hop retrieval first converted user queries into structured conditions and semantic vector representations. Results retrieved from the different knowledge bases were then fused using reciprocal rank fusion (RRF) and fed into the LLM. [Results and Discussions] The proposed method was experimentally compared with multiple baseline approaches across queries of different types and complexity levels. The results demonstrated that the proposed method achieved accuracy and F1 improvements of 0.193 and 0.170, respectively, on the Qwen1.5-7B-Chat model. Compared to the improved methods Self-RAG and Adaptive-RAG, AHR-RAG maintained low response times while achieving F1 improvements of 0.05 and 0.021, respectively, with an accuracy as high as 0.896. For multi-type question-answering tasks, compared to the Naive-RAG method that relied solely on prior knowledge, the AHR-RAG approach achieved accuracy improvements of 0.231, 0.123, and 0.157 for comparison, judgment, and selection query types, respectively. For parsing complex semantics, AHR-RAG also demonstrated significant advantages. In single-hop queries, its accuracy reached 0.921, a 0.029 improvement over Adaptive-RAG. In multi-hop query scenarios, its accuracy reached 0.748, with gains of 0.082 and 0.059 over Self-RAG and Adaptive-RAG, respectively. In retrieval-augmented generation, AHR-RAG achieved a 0.013 increase in accuracy and a 0.009 improvement in F1 by optimizing prompt strategies, compared to directly feeding retrieval results to the model for output. [Conclusions] The proposed method demonstrates strong adaptability to diverse query types and excels at reasoning over complex queries such as multi-hop searches. 
It delivers significant advantages in answer generation accuracy, relevance, and comprehensiveness, producing responses with enhanced logical coherence and richer content. Future work will explore the integration of multimodal knowledge bases.
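The reciprocal rank fusion step used in the multi-hop retrieval above is a small, self-contained algorithm: each ranked list contributes 1 / (k + rank) to a document's score, and documents are re-ranked by the summed score. The document identifiers and the common default k = 60 below are illustrative assumptions, not values from the paper:

```python
def rrf_fuse(ranked_lists, k=60):
    """Merge several ranked result lists by reciprocal rank fusion."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical hits from the structured-condition and vector retrievers.
structured_hits = ["doc_rice_blast", "doc_wheat_rust", "doc_corn_smut"]
vector_hits = ["doc_rice_blast", "doc_corn_smut", "doc_potato_blight"]
print(rrf_fuse([structured_hits, vector_hits]))
```

Note how a document ranked mid-list by both retrievers (doc_corn_smut) overtakes one ranked high by only a single retriever, which is why RRF is a robust way to combine keyword and vector results without score calibration.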

    Multi-Scale Tea Leaf Disease Detection Method Based on Improved YOLOv11n
    XIAO Ruihong, TAN Lixin, WANG Rifeng, SONG Min, HU Chengxi
    2026, 8(1):  62-71.  doi:10.12133/j.smartag.SA202509014

    [Objective] Preventing and containing leaf diseases is a critical component of tea production, and accurate identification and localization of symptoms are essential for modern, automated plantation management. Field inspection in tea gardens poses distinctive challenges for vision-based detection algorithms: targets appear at widely varying scales and morphologies against complex backgrounds and at unfixed acquisition distances, which easily mislead detectors. Models trained on standardized datasets with uniform distance and background often underperform, leading to false alarms and missed detections. To support method development under these realistic constraints, YOLO-SADMFA (You Only Look Once-Switchable Atrous Dynamic Multi-scale Frequency-aware Adaptive), a detector based on the YOLOv11n backbone, was proposed. The architecture aims to preserve fine details during repeated re-sampling (down- and up-sampling), strengthen the modeling of lesions at varying scales, and refine multi-scale feature fusion. [Methods] The proposed architecture incorporated additional convolutional, feature extraction, upsampling, and detection head stages to better handle multi-scale representations, and introduced a DMF-Upsample (Dynamic Multi-scale Frequency-aware Upsample) module that performed upsampling through multi-scale feature analysis and dynamic frequency adjustment fusion. This module enabled efficient multi-scale feature integration while effectively mitigating information loss during up- and down-sampling. Concretely, the DMF-Upsample analyzed multi-frequency responses from adjacent pyramid levels and fused them with dynamically learned frequency-selective weights, which preserved high-frequency lesion boundaries and textures while retaining low-frequency contextual structure such as leaf contours and global shading. 
A lightweight gating mechanism estimated per-location and per-channel coefficients to regulate the contribution of different frequency bands, and a residual bypass preserved identity information to further reduce the aliasing and oversmoothing introduced by repeated resampling. Furthermore, the baseline C3k2 block was replaced with a switchable atrous convolution (SAConv) module, which enhanced multi-scale feature capture by combining outputs from different dilation rates and incorporated a weight locking mechanism to improve model stability and performance. In practice, the SAConv aggregated parallel atrous branches at multiple dilation factors through learned coefficients under weight locking, which expanded the effective receptive field without sacrificing spatial resolution and suppressed gridding artifacts, while incurring modest parameter overhead. Lastly, an adaptive spatial feature fusion (ASFF) mechanism was integrated into the detection head, forming an ASFF-Head that learned spatially varying fusion weights across different feature scales, effectively filtered conflicting information, and strengthened the model's robustness and overall detection accuracy. Together, these components formed a deeper yet efficient multi-scale pathway suited to complex field scenes. [Results and Discussions] Compared with the original YOLOv11n model, YOLO-SADMFA improved precision, recall, and mAP by 4.4, 8.4, and 3.7 percentage points, respectively, indicating more reliable identification and localization across diverse field scenes. The detector was particularly effective for multi-scale targets where the lesion area occupied approximately 10%-65% of the image, reflecting the variability introduced by unfixed acquisition distance during tea garden patrols. 
Under low illumination and in complex backgrounds with occlusions and clutter, it maintained stable performance, reduced both missed detections and false alarms, and effectively distinguished disease categories with similar morphology and color. On edge computing devices, it sustained about 161 f/s, which met real-time requirements for mobile inspection robots and portable systems. These outcomes demonstrated strengthened robustness to background interference and improved sensitivity at extreme scales, which was consistent with practical demands where the acquisition distance was not fixed. From an ablation perspective, DMF-Upsample preserved high-frequency lesion boundaries while retaining low-frequency structural context after resampling, SAConv expanded receptive fields through multi-dilation aggregation under a weight-locking mechanism, and the ASFF-Head mitigated conflicts among feature pyramids. Their combination yielded cumulative gains in stability and accuracy. Qualitative analyses further supported the quantitative results: Boundary localization improved for small, speckled lesions, large blotches were captured with fewer spurious edges, and distractors such as veins, shadows, and soil textures were less frequently misclassified, confirming the benefits of dynamic multi-scale frequency-aware fusion and adaptive spatial weighting in real field conditions. [Conclusions] The proposed YOLO-SADMFA effectively addressed the multi-scale disease detection challenge in complex tea garden environments, where acquisition distance was not fixed, lesion morphology and color were diverse, and cluttered backgrounds easily caused misjudgments and omissions. It significantly improved detection accuracy and robustness relative to the original YOLOv11n model across a wide range of target scales, and it maintained stable performance under low illumination and complex backgrounds typical of field inspections. 
It provided reliable technical support for automated tea leaf disease inspection systems by enabling accurate localization and identification of lesions in real operating conditions and by sustaining real-time inference on edge devices suitable for patrol-style deployment.
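The core of the ASFF-Head described above is that, at each spatial position, learned logits for the pyramid levels are softmax-normalized so the fused response is a convex combination of the per-level features. A minimal single-position sketch, with three invented level responses and hypothetical fusion logits (in the real head both are learned tensors over the whole feature map):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def asff_fuse(level_responses, level_logits):
    """Convex combination of per-level responses at one spatial position."""
    weights = softmax(level_logits)
    fused = sum(w * r for w, r in zip(weights, level_responses))
    return fused, weights

responses = [0.2, 0.9, 0.4]   # same position, three pyramid scales (invented)
logits = [0.1, 2.0, -1.0]     # hypothetical learned fusion logits
fused, weights = asff_fuse(responses, logits)
print([round(w, 3) for w in weights], round(fused, 3))
```

Because the weights sum to one at every position, a scale carrying conflicting information can be driven toward zero weight there, which is the "filtering conflicting information" behavior the abstract credits to the ASFF-Head.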

    Self-Supervised Adaptive Multimodal Feature Fusion Recognition of Crop Diseases and Pests
    YE Penglin, MIN Chao, GOU Liangjie, WANG Pengcheng, HUANG Xiaopeng, LI Xin, MENG Yuping
    2026, 8(1):  72-84.  doi:10.12133/j.smartag.SA202509032

    [Objective] Crop diseases and pests are significant factors restricting global agricultural production. Traditional intelligent recognition technologies predominantly rely on single-modal image data processed by convolutional neural networks (CNNs) or Transformers. However, in complex natural environments, these methods often suffer from insufficient information utilization and limited robustness due to the lack of semantic guidance. Although emerging multimodal approaches like CLIP have introduced textual information, they typically rely on shallow feature alignment in the embedding space without achieving deep semantic interaction or effective feature fusion. Furthermore, the asymmetry between the quantity of image samples and text labels during training poses a challenge for effective cross-modal learning. In this study, a self-supervised adaptive multimodal feature fusion recognition (SAFusion-CLIP) method is proposed, aiming to significantly enhance classification accuracy and model generalization in fine-grained diseases and pests recognition tasks. [Methods] A comprehensive recognition framework was constructed, integrating four key components to achieve deep fusion of visual and textual features. First, prompt engineering was conducted by utilizing large language models (LLMs) combined with authoritative agricultural guides to transform simple category labels into fine-grained pathological semantic descriptions. These descriptions encapsulated morphological details, color gradients, and texture features, with quality verified by BERTScore and ROUGE-L metrics. Second, a cross-modal balanced alignment module was designed to resolve the problem of sample asymmetry between image batches and fixed text labels. This module employed a dot-product attention mechanism to calculate the correlation between image and text projections, applying Softmax normalization to dynamically align image features with their corresponding textual representations. 
Third, an adaptive fusion mechanism was employed to achieve deep semantic interaction. A gating unit based on the Sigmoid function was designed to calculate a gate value, which dynamically allocated weights to image and text features, allowing the model to adaptively integrate complementary information from both modalities. Finally, a self-supervised feature reconstruction task was introduced to enhance the robustness of feature representation. A simple decoder was utilized to reconstruct the original image and text embeddings from the fused features, and the model was optimized using a composite objective function combining image-text contrastive loss, mean squared error reconstruction loss, and weighted cross-entropy classification loss. [Results and Discussions] Extensive experiments were conducted on the standard PlantVillage dataset, which includes 39 categories covering 14 crop species. The proposed SAFusion-CLIP model achieved a classification accuracy of 99.67%, with precision, recall, and F1-Score all exceeding 99.00%. Comparative analysis demonstrated that the proposed method significantly outperformed mainstream single-modal and baseline multimodal models: ResNet50 (96.51%), Swin-Transformer (97.48%), and the baseline CLIP (98.23%). Visualization analysis using Gradient-weighted Class Activation Mapping (Grad-CAM) indicated that, unlike single-modal models which were susceptible to background noise or non-specific physical damage, the SAFusion-CLIP model focused more precisely on core lesion areas, effectively suppressing background interference. Furthermore, ablation studies confirmed the effectiveness of the proposed modules, showing that the combination of the self-supervised architecture and the adaptive fusion mechanism resulted in an accuracy improvement of 2.46 percentage points over the baseline, validating the necessity of deep feature interaction and reconstruction tasks.
[Conclusions] By fusing textual semantics with visual features, the SAFusion-CLIP method effectively overcame the limitations of single-modal recognition. The adaptive fusion mechanism ensured deep interaction between modalities, while the self-supervised reconstruction task significantly enhanced the robustness of feature representation. The experimental results verified that this data-driven approach significantly improves accuracy and generalization capabilities in fine-grained crop disease classification tasks, providing a new and effective solution for precision agricultural prevention and control.
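The Sigmoid gating unit in the adaptive fusion mechanism can be illustrated with a minimal sketch. Everything below (the function names, the scalar rather than element-wise gate, and the fixed weights) is a hypothetical simplification for illustration, not the paper's implementation, where the gate would be learned end-to-end with the encoders:

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(img_feat, txt_feat, gate_w, gate_b=0.0):
    """Blend image and text feature vectors with a scalar sigmoid gate.

    The gate value g is computed from the concatenated features; the fused
    vector is g * img + (1 - g) * txt, so the model can adaptively weight
    the two modalities per sample.
    """
    concat = list(img_feat) + list(txt_feat)
    g = sigmoid(sum(w * x for w, x in zip(gate_w, concat)) + gate_b)
    fused = [g * i + (1.0 - g) * t for i, t in zip(img_feat, txt_feat)]
    return fused, g
```

With zero gate weights the gate is exactly 0.5 and the fusion reduces to a plain average of the two modalities; nonzero learned weights shift it toward whichever modality is more informative.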

    Overview Article
    Research Progress and Future Prospects of Pig Intelligent Detection Technology |
    XIAO Deqin, LÜ Yuding, HUANG Yigui, CUAN Kaixuan
    2026, 8(1):  86-103.  doi:10.12133/j.smartag.SA202507048

    [Significance] The pig industry is a key sector of animal husbandry. With the continuous expansion of farming scale, traditional manual inspection methods can no longer meet the demands of modern production in terms of efficiency, accuracy, and animal welfare. In recent years, intelligent pig monitoring technologies based on multi-source data, such as images, depth information, sensors, and sound, have developed rapidly, providing new solutions for health monitoring, behavior recognition, weight assessment, and physiological state management during the farming process. As these technologies form a crucial foundation for upgrading the pig industry toward intelligent and precise farming, it is of significant value to systematically review their current research status, application progress, and future trends. [Progress] This paper focuses on the main research areas in intelligent pig monitoring and systematically summarizes the commonly used data types and their applications in farming scenarios from the perspective of matching data sources with application objectives. First, research based on infrared images mainly focuses on non-contact acquisition of body temperature information, which is used for disease early warning and health monitoring, offering clear advantages in reducing stress responses and increasing monitoring frequency. Second, visible-light images are widely applied in behavior recognition and health analysis, supporting automated identification and quantification of behaviors such as feeding, resting, and aggression, thereby facilitating dynamic understanding of pig herd behavior patterns and changes. Third, depth images and three-dimensional information demonstrate unique value in body measurement extraction and weight estimation, promoting the development of non-contact, continuous weight monitoring.
Fourth, wearable sensors enable continuous monitoring of pigs' health, lameness risk, and daily behavioral rhythms by recording physiological data such as body temperature, acceleration, and feeding activity in real time. Finally, audio signals, an emerging data type in recent years, have shown potential in monitoring abnormal sounds such as coughing, providing a new approach for the early detection of respiratory diseases. On this basis, this paper further summarizes the research and application of intelligent detection equipment. Current equipment is developing along two lines: one focuses on single indicators such as body temperature and weight, characterized by precise collection and rapid feedback; the other integrates multiple functions, including image acquisition, body temperature detection, behavior recording, and identity recognition, through mobile platforms such as inspection robots, enabling full-scenario, all-weather intelligent detection and improving the automation and refinement of pig farm management. With the growth of industrial demand, various types of equipment are gradually moving from laboratories to commercialization, providing important support for intelligent breeding. [Conclusions and Prospects] Despite the rapid development of intelligent pig detection technology, multiple challenges remain. At the data level, interference from lighting, occlusion, and noise in different scenarios can affect the stability of detection results; at the hardware level, some equipment suffers from high costs and needs improved reliability; at the model level, differences across pig farms, breeds, and growth stages still lead to insufficient adaptability; at the application level, data continuity, system stability, and equipment maintenance costs in large-scale scenarios require further optimization. These factors collectively restrict the large-scale adoption of intelligent detection technology in the industry.
Future research directions will exhibit the following common trends: First, achieving contactless operation and multi-scenario adaptability to minimize disturbance to pigs and enhance stability in complex environments. Second, advancing the integration of multimodal data fusion and deep learning to establish stronger correlations among multi-source data such as images, sensors, and audio. Third, developing individualized health and growth models to provide a scientific basis for precision feeding and management.

    Research Progress and Prospects of Intelligent Control Technology for Facility Vegetable |
    WANG Jian, ZHAO Haosen, MA Yue, XING Bin, ZHU Wenying
    2026, 8(1):  104-119.  doi:10.12133/j.smartag.SA202508003

    [Significance] With the advancement of technology and diversified consumer demands, traditional agriculture is gradually transforming toward informatization and intelligence. Conducting research on intelligent management and control technologies for facility vegetable production is of great significance for improving vegetable yield and quality, ensuring stable market supply, and promoting high-quality development of the vegetable industry. The purpose of this article is to systematically collate the research status, key technologies, and application constraints in the field of intelligent management and control of facility vegetables. By analyzing the development trends of environmental regulation, growth monitoring, and precise management, it provides a scientific basis, theoretical support, and decision-making references for the intelligent upgrading, technological innovation, and policy formulation of China's facility vegetable industry, so as to boost the high-quality and sustainable development of facility agriculture. [Progress] This paper systematically analyzes the innovative applications of information technologies such as the Internet of Things, blockchain, and artificial intelligence in critical domains of facility vegetable production, including precise regulation of the production environment, intelligent cultivation management, and smart storage information management. In terms of precise regulation of the production environment, a temperature and humidity model for the optimal growth environment of tomatoes has been established, primarily utilizing Internet of Things technology. This enables precise monitoring and intelligent control of environmental parameters such as temperature, humidity, light, and carbon dioxide concentration within the facility, creating an optimal environment for vegetable growth.
In the field of intelligent cultivation management, the integration of intelligent integrated water and fertilizer equipment, agricultural robotic operation systems, and pest and disease control has optimized whole-process information-based management, effectively improving cultivation efficiency and vegetable quality. Integrated water and fertilizer systems apply Internet of Things technology to coordinate irrigation and fertilization through digital methods. Agricultural robotic operation systems are based on artificial intelligence, encompassing technologies such as machine learning, deep learning, neural networks, and image processing. The pest and disease control section highlights information-based applications in physical, biological, and chemical control. In terms of smart storage information management, the application of origin storage preservation technology, intelligent classification and sorting systems, and traceability information platforms has significantly enhanced the circulation quality and safety assurance level of vegetables. Specifically, the discussion of origin storage preservation technology covers the development status of pre-cooling, controlled atmosphere, biological, and coating preservation. Intelligent grading and sorting technologies are categorized into non-destructive testing of the external and internal quality of vegetables. The traceability information platform, leveraging blockchain and large model technologies, enables more intelligent management of facility vegetable production.
[Conclusions and Prospects] This paper explores the problems encountered in the development of intelligent management and control technology for facility vegetables, including insufficient sensor accuracy and stability, lagging regulatory decision-making, lack of equipment coordination mechanisms, poor integration of pest and disease control, fragmentation of information across the storage process, difficulty in quality traceability, and lagging risk warning. Corresponding countermeasures and suggestions are proposed: hardware optimization, multi-technology integration to support precise perception and intelligent regulation, enhanced equipment coordination and optimization, integrated pest and disease control, and construction of a virtual-real interactive storage management system through the integration of digital twins and the metaverse. Finally, the paper prospects the future development directions of facility vegetables in precise production-environment control, information-based cultivation management, and information-based storage control.

    Advances and Prospects in Body-Size Measurement of Sheep: From 2D Vision to 3D Reconstruction and 2D-3D Fusion |
    DAI Weijiao, LIANG Yudongchen, ZHOU Yong, YAO Chao, ZHANG Cheng, SONG Yongjian, LI Guoliang, TIAN Fang
    2026, 8(1):  120-147.  doi:10.12133/j.smartag.SA202507028

    [Significance] In alignment with the national germplasm security strategy, current research efforts are accelerating the adoption of precision breeding in sheep. Within whole-genome selection, accurate phenotyping of body morphometrics is critical for assessing growth performance and breeding value. Traditional manual measurements are inefficient, prone to human error, and may cause stress to sheep, limiting their suitability for precision sheep management. By summarizing the applications of sheep body size measurement technologies and analyzing their development directions, this paper provides theoretical references and practical guidance for the research and application of non-contact sheep body size measurement. [Progress] This review synthesizes progress across three principal methodological paradigms: two-dimensional (2D) image-based techniques, three-dimensional (3D) point cloud-based approaches, and integrated 2D-3D fusion systems. 2D methods, employing either handcrafted geometric features or deep learning-based keypoint detection algorithms, are cost-effective and operationally simple but sensitive to variation in imaging conditions and unable to capture critical circumference metrics. 3D point-cloud approaches enable precise reconstruction of full animal morphology, supporting comprehensive body-size acquisition with higher accuracy, yet face challenges including high hardware costs, complex data workflows, and sensitivity to posture variability. Hybrid 2D-3D fusion systems combine the semantic richness of RGB imagery with the geometric completeness of point clouds. Having been effectively validated in other livestock species, e.g., cattle and pigs, these fusion systems have demonstrated excellent performance, providing important technical references and practical insights for sheep body size measurement.
[Conclusions and Prospects] Firstly, future research should focus on constructing large-scale, high-quality datasets for sheep body size measurement that encompass diverse breeds, growth stages, and environmental conditions, thereby enhancing model robustness and generalization. Secondly, the development of lightweight artificial intelligence models is essential. Techniques such as model compression, quantization, and algorithmic optimization can substantially reduce computational complexity and storage requirements, facilitating deployment in resource-constrained environments. Thirdly, the 3D point cloud processing pipeline should be streamlined to improve the efficiency of data acquisition, filtering, registration, and segmentation, while promoting the integration of low-cost, high-resilience vision systems into practical farming scenarios. Fourthly, specific emphasis should be placed on improving the accuracy of curved-dimensional measurements, such as chest circumference, abdominal circumference, and shank circumference, through advances in pose standardization, refined 3D segmentation strategies, and multi-modal data fusion. Finally, the cross-fertilization of sheep body size measurement technologies with analogous methods for other livestock species offers a promising pathway for mutual learning and collaborative innovation, accelerating the industrialization of automated sheep morphometric systems and supporting the development of intelligent, data-driven pasture management practices.

    Information Processing and Decision Making
    Greenhouse Temperature and Humidity Prediction Method Based on Adaptive Kalman Filter and GWO-LSTM-Attention |
    CAI Yuqin, LIU Daming, XU Qin, LI Boyang, LIU Bojie
    2026, 8(1):  148-155.  doi:10.12133/j.smartag.SA202506033

    [Objective] Acquiring valid data is a critical step in establishing accurate greenhouse prediction models. Simple averaging and weighted averaging are commonly used to process multi-sensor data in current research, but these methods are often ineffective against sensor noise interference. Additionally, greenhouse temperature and humidity exhibit strong coupling characteristics that necessitate coordinated control strategies. Prevailing studies predominantly train separate models for temperature and humidity prediction, which risks generating physically inconsistent results (e.g., simultaneous high temperature and high humidity); using such results as the basis for control may therefore be unreliable. Furthermore, the multi-dimensional environmental factor data in greenhouses are voluminous and computationally costly to process. In the training process of the traditional LSTM model, parameters are manually adjusted based on human experience; when dealing with high-dimensional data, the model converges slowly and is prone to getting stuck in local optima. [Methods] To address multi-point data fusion challenges, the traditional Kalman filtering algorithm was improved by dynamically adjusting the process noise covariance (Q) and observation noise covariance (R), and by adaptively assigning weights to multiple sensors based on the innovation sequence (the difference between observed and predicted measurements). Using the innovation covariance, sensors with consistently smaller innovations, which indicated higher reliability, were assigned greater weights. This mechanism enabled the system to swiftly identify and mitigate the impact of abnormal sensor readings, thereby ensuring robust and accurate fusion of multi-sensor data and providing a reliable foundation for subsequent model training.
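The innovation-based weighting can be sketched as follows. This is a simplified stand-in for the paper's adaptive Kalman scheme (which also adapts Q and R online); the function name and the inverse-squared-innovation weighting rule are illustrative assumptions:

```python
def fuse_sensors(readings, prediction, eps=1e-6):
    """Innovation-weighted fusion of multiple sensor readings.

    The innovation of sensor i is reading_i - prediction. Sensors whose
    readings stay close to the model prediction (small innovation) are
    treated as more reliable and receive larger normalized weights, so a
    single faulty sensor barely affects the fused value.
    """
    innovations = [r - prediction for r in readings]
    raw = [1.0 / (inn * inn + eps) for inn in innovations]  # small innovation -> big weight
    total = sum(raw)
    weights = [w / total for w in raw]
    fused = sum(w * r for w, r in zip(weights, readings))
    return fused, weights
```

For three temperature probes reading 25.0, 25.1, and a faulty 40.0 degrees against a prediction of 25.0, the outlier's weight collapses toward zero and the fused value stays near 25.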
To address the strong coupling between temperature and humidity and their collaborative control requirements, a multi-output LSTM-attention model was developed for joint temperature-humidity prediction within a unified architecture. This model employed an attention mechanism to adaptively weight critical environmental factors, thereby resolving physical constraint violations inherent in univariate forecasting approaches. The multi-dimensional nature of greenhouse environmental data often leads to high computational costs during model training. In traditional practice, the hyperparameters of LSTM models were often manually tuned based on experience, a process that was not only inefficient but also prone to suboptimal convergence and local optima traps, especially with high-dimensional data. To overcome this limitation, the grey wolf optimizer (GWO) was integrated to automatically and efficiently search for the optimal combination of key hyperparameters, such as the number of hidden units, learning rate, and dropout rate. [Results and Discussions] The proposed adaptive Kalman filtering algorithm achieved mean absolute deviations (MAD) of 1.59 ℃ and 8.64% for multi-point temperature and humidity fusion, respectively. Compared to the traditional Kalman filter algorithm, these represented reductions of 1.24% and 8.57%. The algorithm enabled swift identification of abnormal sensors and effectively mitigated their impact. When utilizing the fusion results of this algorithm as the model training dataset, the R2 values for temperature and humidity predictions reached 98.2% and 99.3%, respectively. This constituted an increase of 4.7 and 4.3 percentage points compared to results obtained using the traditional Kalman filter, demonstrating that the algorithm provided a highly reliable data foundation for model training.
Furthermore, the GWO-LSTM-Attention model trained on this data yielded root mean square errors (RMSE) of 0.776 8 and 2.056 4 for temperature and humidity prediction, respectively. Compared to the LSTM and LSTM-Attention time-series prediction models, the temperature RMSE was reduced by 15.6% and 6.6%, while the humidity RMSE saw reductions of 29.2% and 5.7%. This reflects the role of the GWO algorithm in enhancing model generalization capability and convergence efficiency. [Conclusions] The proposed adaptive Kalman fusion algorithm effectively integrates multi-sensor data, demonstrating robustness in handling sensor noise, outliers, and non-stationary environmental fluctuations. For predicting multiple greenhouse environmental factors, the developed GWO-LSTM-Attention model provides reliable forecasts across diverse time horizons. This study can provide a highly accurate prediction tool for greenhouse environment control. The combined prediction results could directly support the coordinated control of ventilation and irrigation equipment in the future, thereby reducing energy consumption.
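As a rough illustration of how GWO searches a hyperparameter space, the following minimal sketch minimizes a toy one-dimensional objective; in the paper the objective would be the validation error of the LSTM as a function of its hyperparameters. All names and parameter choices here are illustrative assumptions, not the paper's configuration:

```python
import random

def gwo_minimize(f, bounds, n_wolves=8, n_iters=50, seed=0):
    """Minimal Grey Wolf Optimizer sketch.

    f: objective to minimize (e.g. validation loss for a hyperparameter
    vector); bounds: list of (lo, hi) per dimension. The three best wolves
    (alpha, beta, delta) steer the rest of the pack, while the coefficient
    `a` decays from 2 to 0 to shift from exploration to exploitation.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best = min(wolves, key=f)  # elitism: remember the best position seen
    for it in range(n_iters):
        ranked = sorted(wolves, key=f)
        alpha, beta, delta = ranked[0], ranked[1], ranked[2]
        a = 2.0 * (1.0 - it / n_iters)  # decays from 2 toward 0
        for i in range(n_wolves):
            new_pos = []
            for d in range(dim):
                est = 0.0
                for leader in (alpha, beta, delta):
                    r1, r2 = rng.random(), rng.random()
                    A = 2.0 * a * r1 - a
                    C = 2.0 * r2
                    est += leader[d] - A * abs(C * leader[d] - wolves[i][d])
                lo, hi = bounds[d]
                new_pos.append(min(max(est / 3.0, lo), hi))
            wolves[i] = new_pos
        cand = min(wolves, key=f)
        if f(cand) < f(best):
            best = cand
    return best
```

Calling `gwo_minimize(lambda v: (v[0] - 3.0) ** 2, [(0.0, 10.0)])` drives the pack toward the minimum at 3.0 without any manual tuning, which is the role GWO plays for the LSTM hyperparameters in the paper.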

    Point Cloud Data-driven Methods for Estimating Maize Leaf Biomass |
    WU Zhangbin, HE Ning, WU Yandong, GUO Xinyu, WEN Weiliang
    2026, 8(1):  156-166.  doi:10.12133/j.smartag.SA202509015

    [Objective] Maize leaf dry biomass is a key trait that reflects plant morphology, growth vigor, and physiological processes including photosynthetic production. Its dynamic changes can effectively characterize the growth status of maize. Accurate estimation of maize leaf dry biomass is crucial for accurately predicting maize yield and informing production management decisions. Extensive research on crop dry biomass estimation indicates that 3D point cloud data characterizing crop morphological structure, along with features derived therefrom, exhibit an extremely high correlation with crop dry biomass. However, traditional dry biomass prediction studies focus primarily on the population canopy scale and lack effective prediction methods for dry biomass at the plant and organ scales. Research on non-destructive measurement methods for maize leaf dry biomass, based on 3D point clouds and machine learning, was therefore conducted to address the demand for rapid acquisition of organ-level dry biomass information in maize cultivation and management research. [Methods] Maize leaf point cloud data were acquired using three techniques: multi-view stereo (MVS), LiDAR scanning, and 3D digitalization (DT). The leaf point clouds underwent preprocessing steps that included plant segmentation, denoising, mesh refinement, and uniform subsampling. Subsequently, morphological traits were extracted from the processed data, including leaf length, leaf area, bounding box dimensions, and the number of points contained within the leaf point clouds. Three machine learning methods: random forest (RF), gradient boosting regression tree (GBRT), and support vector regression (SVR), as well as two deep learning methods: convolutional neural network (CNN) and fully connected neural network (FCNN), were employed for predicting maize leaf dry weight. A point cloud-based maize leaf dry biomass prediction model was subsequently developed.
This study utilized the mean squared error reduction method inherent to RF and the cumulative improvement method based on decision tree splits in GBRT to rank feature importance for the optimal models, and the resulting rankings were visualized. Simultaneously, Pearson correlation analysis was used to analyze the correlations of the features from the fused dataset (integrating data from the three devices), as well as those from the DT data, with maize leaf dry biomass. [Results and Discussions] The results demonstrated that, among the dry biomass prediction models developed in this study, the model based on LiDAR point cloud data and the FCNN method achieved the highest accuracy, with a mean absolute error (MAE) of 0.08 g, a mean absolute percentage error (MAPE) of 4.60%, a root mean square error (RMSE) of 0.10 g, and a coefficient of determination (R2) of 0.98. In the correlation analysis, leaf area exhibited the strongest correlation with dry biomass (r = 0.92), followed by the number of points (r = 0.88), leaf width (r = 0.86), and leaf length (r = 0.77). In the feature importance ranking, the leaf area trait consistently ranked within the top two positions, whereas the number of points ranked among the top three in most cases. However, features such as the height of the leaf base above the ground, the horizontal distances from the leaf tip and apex to the stem, and the azimuth angle demonstrated low correlations with dry biomass and low feature importance. [Conclusions] Among all the maize leaf features investigated in this study, size-related traits (such as leaf area, point count, leaf length, and leaf width) had the greatest impact on the accuracy of dry biomass estimation. The utilization of high-resolution 3D point clouds of maize leaves, combined with machine learning methods, enabled high-accuracy estimation of leaf dry weight and provided a novel approach for the non-destructive measurement of dry biomass in crop organs.
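The Pearson correlation coefficient used in the trait-biomass analysis can be computed in a few lines of standard-library Python; this is a generic sketch of the statistic, not the paper's analysis pipeline, and any trait values fed to it would be illustrative:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences,
    e.g. a morphological trait (leaf area) and leaf dry biomass.

    r = cov(x, y) / (std(x) * std(y)); r close to +1 or -1 indicates a
    strong linear relationship, r close to 0 indicates none.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A perfectly linear trait-biomass relationship yields r = 1.0, which is why leaf area (r = 0.92 in the study) dominates the importance rankings.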

    Object Detection Method of Maize Ears Within Canopy Based on CornYOLO |
    GAO Guangfu, WANG Qilei, SONG Liwen, FENG Haikuan, SHI Lei, YANG Hao, LIU Yang, YUE Jibo
    2026, 8(1):  167-177.  doi:10.12133/j.smartag.SA202509005

    [Objective] As a major grain crop, maize plays a critical role in global food security. The maize ear serves as a key phenotypic trait, providing essential information on the plant's physiological and agronomic status. Its morphological characteristics, size, and color effectively reflect the plant's growth status and potential yield. Therefore, accurately acquiring images of maize ears in the field across different growth stages is crucial for breeding research and yield prediction. Traditional field detection of maize ears relies heavily on manual labor, which is not only inefficient and labor-intensive but also struggles to meet the high-throughput demands of modern precision breeding programs. There is an urgent need for efficient, automated detection technologies that can operate reliably under real-world field conditions. To address the requirement for efficient acquisition of maize ear phenotypic traits in field breeding work, the objective of this research is to develop a robust object detection solution suitable for large-scale field environments. An improved CornYOLO model based on the YOLO11n (You Only Look Once) architecture was designed to enhance the accuracy and efficiency of maize ear detection in complex field environments. [Methods] Image data were acquired using an unmanned ground vehicle (UGV) equipped with a high-resolution panoramic camera, which traversed multiple experimental plots under varying lighting and growth conditions. A dataset containing 1 152 annotated samples was constructed, covering diverse ear morphologies and occlusion scenarios. Dynamic data augmentation techniques were applied during training to enhance the model's generalization capability. Three key enhancements were introduced to the YOLO11n detection framework.
First, a cross stage partial network with dynamic pointwise spatial attention (C2PDA) module was designed to replace the cross stage partial with pointwise spatial attention (C2PSA) module in the YOLO11 backbone network. This module enhanced spatial discriminability and channel sensitivity in feature representation through the collaborative integration of a dynamic channel weighting mechanism and position-aware modeling, significantly improving the model's performance in identifying maize ears under challenging field conditions such as occlusion by stems and leaves and multi-scale target distribution. Second, the spatial pyramid pooling-fast (SPPF) module in the original model was replaced with a feature refinement module (FRM) to optimize multi-scale feature fusion. The FRM functions via directional feature decomposition and an adaptive attention mechanism: it captures fine-grained spatial structural information through horizontal and vertical bidirectional pooling and combines spatial-channel cooperative attention for dynamic feature calibration, thereby improving recognition accuracy across varying ear sizes and complex backgrounds. Finally, the unified intersection over union (UIoU) loss function was introduced to optimize bounding box regression accuracy. UIoU emphasizes weight allocation among prediction boxes of different qualities: it adaptively adjusts the weight of each prediction box's loss term based on the IoU value or a monotonic function of it, assigning higher weights to lower-quality predictions to prioritize their optimization while reducing weights for high-quality boxes to prevent over-optimization. [Results and Discussions] Experimental results demonstrate that CornYOLO achieved an mAP@50 of 89.3% on the validation set, with the F1-Score increasing by 2.5 percentage points.
Compared to widely used lightweight models, including YOLOv3-tiny, YOLOv5n, YOLOv8n, YOLOv10n, real-time detection transformer (RT-DETR), and YOLO13n, CornYOLO showed significantly superior detection performance in complex field environments, with mAP@50 improvements of 2.2, 1.9, 1.8, 5.7, 12.6, and 2.4 percentage points, respectively. These results fully validate that CornYOLO can efficiently and accurately extract maize ear images under field conditions, providing a technical foundation for precise phenotypic evaluation and yield prediction. Furthermore, ablation studies were conducted: introducing the C2PDA module improved the model's mAP@50 by 0.5 percentage points and the F1-Score by 0.5 percentage points. Incorporating the FRM module further enhanced multi-scale detection performance and increased the F1-Score by 1.5 percentage points; however, the integration of these two modules produced a small number of low-quality detection boxes, which the original loss function optimized inefficiently, so mAP@50 did not improve at this stage. To address this issue, the UIoU loss function was introduced. By dynamically adjusting weight assignments based on prediction quality, it significantly improved the regression performance for low-quality detection boxes, thereby enhancing the localization accuracy and convergence stability of the model in dense target scenarios. The final CornYOLO model exhibited excellent overall performance: compared to the original YOLO11n, the F1-Score increased by 2.5 percentage points and mAP@50 improved by 1.1 percentage points. The experimental results fully demonstrate that CornYOLO effectively enhances the detection capability for maize ears in complex field environments compared to the baseline YOLO11n model.
[Conclusions] The CornYOLO model proposed in this study incorporates three key components: C2PDA, FRM, and UIoU. Together they enhance model convergence and localization performance in dense and occluded scenes and enable the model to identify maize ears effectively and precisely under practical conditions, thereby providing reliable technical support for phenotypic analysis and yield prediction in maize breeding. Future work will focus on extending the model to other crop types and further optimizing inference efficiency for real-time deployment on mobile platforms.
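The weight-allocation idea behind UIoU can be illustrated with a toy sketch: low-IoU (poor-quality) predictions receive a larger loss weight so the optimizer prioritizes them. The exact UIoU formulation is not reproduced here; the `gamma` knob and the specific weighting function below are hypothetical choices for illustration only:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def quality_weighted_box_loss(pred, target, gamma=1.0):
    """IoU loss with a quality-dependent weight.

    The weight is a monotonically decreasing function of the IoU, so
    low-quality predictions contribute more to the gradient and
    well-localized boxes are not over-optimized.
    """
    q = iou(pred, target)
    weight = 1.0 + (1.0 - q) ** gamma  # higher weight for lower-quality boxes
    return weight * (1.0 - q)
```

A perfectly aligned box incurs zero loss regardless of the weight, while a disjoint box receives both the maximum base loss and the maximum weight.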

    Intelligent Inspection Path Planning Algorithm for Large-Scale Cattle Farms |
    CHEN Ruotong, LIU Jifang, ZHANG Zhiyong, MA Nan, WEI Peigang, WANG Yi, YANG Yantao
    2026, 8(1):  178-191.  doi:10.12133/j.smartag.SA202504004

    [Objective] Timely detection and early warning of livestock health issues are critical for green and efficient management of large-scale cattle farms. Traditional manual inspections are time-consuming, labor-intensive, and prone to missed or erroneous detections. Robotic inspections offer significant advantages, including all-weather operation, high precision, high efficiency, and low cost. However, existing path planning approaches predominantly focus on dynamic obstacle avoidance and fixed-target-point inspection paths, often failing to address two key challenges in dynamic large-scale farm environments: global traversal of individual large livestock (e.g., beef cattle, dairy cows) and the accessibility of local areas compromised by dynamic obstacles. This study aims to overcome the limitations of existing robotic inspection systems in large-scale cattle farms, specifically addressing the lack of comprehensive inspection capability for dynamic individuals, excessive path redundancy, and insufficient proactive obstacle avoidance capability. [Methods] A global-local optimization algorithm was proposed for intelligent inspection path planning in large-scale cattle farms. It integrated the traveling salesman problem (TSP), A*, and the dynamic window approach (DWA), and solved the problems of global multi-objective individual traversal, path redundancy, and local passability with proactive obstacle avoidance in dynamic cattle farm scenarios. For global traversal optimization, a global path planning algorithm was introduced that combined an improved TSP and an optimized A*. Specifically, an inspection status list tracking breeding sheds and individual cattle was maintained to enhance the TSP's nearest-neighbor algorithm, dynamically updating targets to avoid revisits. A dynamic priority mechanism optimized multi-objective inspection, determining the optimal visitation sequence across barns and dynamic paths within barns.
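The status-list-enhanced nearest-neighbor step described above can be sketched as follows; the data layout (a dict of target coordinates plus a boolean status list) is an illustrative assumption, not the paper's implementation:

```python
import math

def nearest_neighbor_route(targets, start, visited):
    """Greedy nearest-neighbor ordering over targets whose entry in the
    inspection status list `visited` is still False; visiting a target
    flips its status so it is never re-planned."""
    route, pos = [], start
    while not all(visited.values()):
        pending = [t for t, done in visited.items() if not done]
        nxt = min(pending, key=lambda t: math.dist(pos, targets[t]))
        visited[nxt] = True           # update the inspection status list
        route.append(nxt)
        pos = targets[nxt]
    return route

targets = {"shed_A": (0, 5), "shed_B": (4, 1), "shed_C": (9, 6)}
visited = {name: False for name in targets}
print(nearest_neighbor_route(targets, (0, 0), visited))
# → ['shed_B', 'shed_A', 'shed_C']
```

Because the status list is mutated in place, re-invoking the planner after new sheds are appended only schedules the not-yet-visited targets, which is what prevents re-visits in a dynamically updated target set.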
The data structures of the A* algorithm were optimized, and a diagonal distance heuristic function was introduced in place of the Manhattan distance, more accurately reflecting the movement cost across eight directions. The path obtained by the A* algorithm's greedy strategy was then simplified: Bresenham's line algorithm was used to check whether obstacles lay within the straight-line field of view, and if none were present, redundant inflection points were removed to construct an efficient travel path between sheds. For local passability optimization, an enhanced DWA-based local path planning algorithm was proposed. A dynamic safety threshold based on obstacle size was introduced to improve the DWA: when the inspection robot judged that an obstacle in the locally accessible area was too large for it to pass, it actively avoided or detoured in advance, ensuring safe avoidance of large obstacles in narrow passages. The improved DWA also added a task progress potential field that drove the robot toward the breeding shed to be visited via an attractive force field model, balancing local obstacle avoidance with global inspection efficiency and enabling real-time judgment of local-area passability under dynamic obstacles together with proactive obstacle avoidance. [Results and Discussions] The optimized A* algorithm's data structures significantly improved search efficiency. The diagonal distance heuristic and greedy strategy substantially enhanced path smoothness. Compared to the traditional A*, the improved A* achieved average reductions of 90.06% in planning time, 85.13% in path turns, and 1.83% in path length. The global inspection algorithm combining the improved TSP and the optimized A* achieved 100% average coverage of individual cattle.
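The Bresenham-based removal of redundant inflection points might look like the following sketch on an integer grid; the grid representation and the `blocked` set of cells are illustrative assumptions:

```python
def bresenham(p0, p1):
    """Integer grid cells on the segment p0 → p1 (standard Bresenham)."""
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = abs(x1 - x0), abs(y1 - y0)
    sx, sy = (1 if x1 > x0 else -1), (1 if y1 > y0 else -1)
    err, cells = dx - dy, []
    while True:
        cells.append((x0, y0))
        if (x0, y0) == (x1, y1):
            return cells
        e2 = 2 * err
        if e2 > -dy:
            err -= dy; x0 += sx
        if e2 < dx:
            err += dx; y0 += sy

def simplify(path, blocked):
    """Keep a waypoint only if the straight line to the farthest later
    waypoint is blocked; otherwise jump ahead, dropping the redundant
    inflection points in between."""
    out, i = [path[0]], 0
    while i < len(path) - 1:
        j = len(path) - 1
        while j > i + 1 and any(c in blocked for c in bresenham(path[i], path[j])):
            j -= 1
        out.append(path[j])
        i = j
    return out

path = [(0, 0), (1, 1), (2, 2), (3, 2), (4, 2)]
print(simplify(path, blocked={(2, 1)}))   # → [(0, 0), (2, 2), (4, 2)]
```

With no obstacles the whole path collapses to its two endpoints; each surviving waypoint is guaranteed to have obstacle-free line of sight to the next one.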
Inspection path length and time were reduced by 17.99% and 20.85%, respectively, compared to the classic ant colony optimization (ACO) algorithm, demonstrating superior efficiency in dynamic multi-objective inspection scenarios. The improved DWA successfully enabled proactive judgment of local path passability based on obstacle size. By adjusting the robot's linear velocity, angular velocity, and attitude angle in real time, the algorithm achieved robust proactive obstacle avoidance: the inspection robot reduced its linear velocity in advance when encountering obstacles and avoided them by adjusting its attitude angle. Simulation experiments confirmed that robots equipped with the improved DWA effectively navigated around unknown static and dynamic obstacles while maintaining global path-tracking capability. [Conclusions] The global inspection algorithm combining the improved TSP and optimized A*, utilizing dynamic inspection status lists and path optimization techniques, achieved global inspection coverage of individual cattle and can significantly improve inspection quality and efficiency. The local inspection algorithm based on the improved DWA, incorporating a dynamic obstacle-size safety threshold and task progress, achieved real-time judgment of local passability and proactive obstacle avoidance, ensuring safe robot navigation in complex environments. The global-local co-optimization framework demonstrated adaptability to dynamic farm environments, enabling timely completion of individual traversal tasks and providing a robust solution for intelligent inspection in large-scale cattle operations. Future work involves integrating the proposed path planning algorithm with simultaneous localization and mapping (SLAM), cattle identification, and distance detection systems on inspection robot platforms, and conducting extensive field tests within operational cattle farms.
Exploring multi-robot collaborative inspection frameworks and incorporating the Vision-and-Language Navigation model to enhance environmental perception and anomaly-handling capabilities are promising directions for adapting to the complexities of even larger-scale farming scenarios.
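The size-aware local avoidance idea above can be sketched as a plain DWA step in which each obstacle's required clearance grows with its radius, so oversized obstacles are rejected early; all limits, weights, the 1 s horizon, the endpoint-only clearance check, and the 0.3·r margin are illustrative assumptions rather than the paper's tuned values:

```python
import math

def dwa_step(pose, v_now, w_now, goal, obstacles, dt=0.1,
             v_lim=(0.0, 1.0), w_lim=(-1.0, 1.0), acc=(0.5, 1.0)):
    """One DWA step. Obstacles are (x, y, radius); larger obstacles
    demand proportionally more clearance (dynamic safety threshold)."""
    v_lo = max(v_lim[0], v_now - acc[0] * dt)
    v_hi = min(v_lim[1], v_now + acc[0] * dt)
    w_lo = max(w_lim[0], w_now - acc[1] * dt)
    w_hi = min(w_lim[1], w_now + acc[1] * dt)
    best, best_score = (0.0, 0.0), -math.inf
    for i in range(11):                       # sample linear velocities
        v = v_lo + i / 10 * (v_hi - v_lo)
        for j in range(11):                   # sample angular velocities
            w = w_lo + j / 10 * (w_hi - w_lo)
            x, y, th = pose
            for _ in range(10):               # forward-simulate 1 s ahead
                th += w * dt
                x += v * math.cos(th) * dt
                y += v * math.sin(th) * dt
            # clearance checked at the trajectory endpoint for brevity;
            # margin 0.3 * r grows with obstacle size
            clear = min(math.dist((x, y), (ox, oy)) - (r + 0.3 * r)
                        for ox, oy, r in obstacles)
            if clear <= 0:
                continue                      # unsafe trajectory rejected
            score = -math.dist((x, y), goal) + 0.5 * min(clear, 1.0)
            if score > best_score:
                best_score, best = score, (v, w)
    return best                               # (linear, angular) command

v, w = dwa_step((0.0, 0.0, 0.0), 0.5, 0.0,
                goal=(5.0, 0.0), obstacles=[(2.5, 0.0, 0.5)])
```

A real implementation would also score clearance along the whole simulated arc and add the task progress potential field as an extra attraction term in `score`.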

    Underwater Insitu Weight Estimation Method for Chinese Mitten Crab Based on Binocular Vision and Improved YOLOv11-pose |
    LI Aoqiang, DAI Hangyu, GUO Ya
    2026, 8(1):  192-202.  doi:10.12133/j.smartag.SA202505019

    [Objective] With the accelerated development of large-scale and intelligent aquaculture, accurate estimation of the body weight of individual Chinese mitten crabs is critical for tasks such as precise feeding, disease prevention, and optimization of harvest decisions. Traditional methods of manually catching and weighing crabs are time-consuming, labor-intensive, and can cause stress or injury to the crabs, while also failing to provide real-time monitoring. To address the challenges posed by turbid water conditions in aquaculture, which lead to poor image quality and difficulty in feature extraction, a method is proposed for estimating Chinese mitten crab weight that combines binocular vision with deep learning-based keypoint detection. This approach achieves high-precision detection of anatomical keypoints on the crab, providing new technical support for precision aquaculture and intelligent management. [Methods] The detector was built on a lightweight YOLOv11 framework: MBConv depthwise-separable convolutions were incorporated into its C3K2 module to significantly reduce computational complexity and improve feature extraction efficiency. An EffectiveSE channel attention mechanism was introduced to adaptively emphasize important channel-wise features. To further enhance cross-scale information fusion, a spatial dynamic feature fusion module (SDFM) was added. The SDFM adaptively fused local spatial attention with global channel attention in a weighted manner, enabling detailed extraction of crab shell edges and anatomical keypoints. The improved YOLOv11-ES model could simultaneously output the crab's bounding box, the positions of four anatomical keypoints, and the crab's sex classification in a single forward pass. In the 3D reconstruction stage, calibrated stereo camera parameters were used, and a sparse keypoint matching strategy guided by the crab's sex and spatial geometric constraints was employed.
High-confidence keypoint pairs were selected from the left and right views, and the true 3D coordinates of the keypoints defining the crab's carapace length and width were computed by triangulation. Finally, the obtained carapace length, width, and sex label data were fed into a two-layer back-propagation (BP) neural network to perform a regression prediction of the individual crab's weight. [Results and Discussions] To validate the effectiveness and robustness of the proposed method, a dataset of Chinese mitten crab images with annotated keypoints was constructed under varying water turbidity and lighting conditions, and both ablation and comparative experiments were conducted. The YOLOv11-ES achieved a mean average precision at intersection over union (IOU) threshold of 0.5 (mAP@50) of 97.2% on the test set, which was 4.4 percentage points higher than that of the original YOLOv11 model. The keypoint detection component reached an mAP@50 of 96.7%, which was 3.6 percentage points higher than that of the original YOLOv11 model. In comparative experiments, YOLOv11-ES also demonstrated significant advantages over other models in the same series. Moreover, in a full-system evaluation using images of 30 individual crabs, the mean absolute percentage error (MAPE) for carapace width measurements was only 2.68%, and for carapace length it was 1.48%. The Pearson correlation coefficients between the measured and manually obtained true values for both carapace length and width exceeded 0.977, indicating high accuracy in the 3D reconstruction and minimal measurement error. Experiments analyzing the influence of image quality on measurement accuracy showed that when the underwater image quality measure (UIQM) reached at least 1.5, the combined MAPE of carapace length and width errors could be kept below 5%. When UIQM reached at least 2.2, the MAPE dropped to about 1.9%. These results confirmed the robustness of the method against variations in water turbidity and lighting conditions.
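The triangulation of a matched keypoint pair from calibrated stereo views can be sketched with standard linear (DLT) triangulation; the camera intrinsics, baseline, and test point below are synthetic assumptions, not the paper's calibration:

```python
import numpy as np

def triangulate(P_left, P_right, x_left, x_right):
    """Linear (DLT) triangulation of one matched keypoint pair given
    calibrated 3x4 projection matrices; returns the 3D point."""
    def rows(P, xy):
        x, y = xy
        return [x * P[2] - P[0], y * P[2] - P[1]]
    A = np.array(rows(P_left, x_left) + rows(P_right, x_right))
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                      # null vector = homogeneous 3D point
    return X[:3] / X[3]

# Synthetic rectified pair: shared intrinsics K, right camera shifted
# along x by baseline b (all numbers are illustrative).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
b = 0.1
P_l = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P_r = K @ np.hstack([np.eye(3), np.array([[-b], [0.0], [0.0]])])
X_true = np.array([0.05, -0.02, 0.8])        # a keypoint 0.8 m away
x_l = P_l @ np.append(X_true, 1.0); x_l = x_l[:2] / x_l[2]
x_r = P_r @ np.append(X_true, 1.0); x_r = x_r[:2] / x_r[2]
X_hat = triangulate(P_l, P_r, x_l, x_r)
# carapace length/width = Euclidean distance between two such points
```

Carapace length and width then follow as the distance between two reconstructed keypoints, which is what feeds the BP regression network.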
For weight regression prediction, the BP network trained on carapace length, width, and sex features achieved a mean absolute error (MAE) of 2.39 g and a MAPE of 7.1% on an independent test set, demonstrating high-precision estimation of individual crab weight. [Conclusions] The proposed method, which combines an improved YOLOv11 object detection network, binocular sparse keypoint matching, and a two-layer BP regression network, enabled high-precision, low-error, real-time, non-contact estimation of Chinese mitten crab weight in complex turbid aquatic environments. This approach featured a lightweight model, high computational efficiency, excellent measurement accuracy, and strong adaptability to varying environmental conditions. It provided key technical parameters for intelligent Chinese mitten crab farming. In the future, this approach could be extended to other aquaculture species and complex farming scenarios. Combined with transfer learning and online adaptive calibration techniques, its generalization capability could be further improved and integrated with intelligent monitoring platforms to achieve large-scale, all-weather underwater crab weight estimation, contributing to the sustainable development of smart aquaculture.

    Intelligent Equipment and Systems
    Online Detection System for Freshness of Fruits and Vegetables Based on Temporal Multi-source Information Fusion |
    HUANG Xianguo, ZHU Qibing, HUANG Min
    2026, 8(1):  203-212.  doi:10.12133/j.smartag.SA202505037

    [Objective] Real-time and accurate quality monitoring of fruits and vegetables during cold chain logistics is of great importance for ensuring supply chain quality and reducing economic losses. However, traditional detection methods generally suffer from several core deficiencies: they are offline, rely on unimodal information, and cannot capture dynamic evolution. To overcome these challenges, an online freshness detection system for fruits and vegetables based on temporal multi-source information fusion is proposed and implemented. The system was designed to achieve precise online detection of fruit and vegetable freshness, providing an effective technical solution for refined management and early spoilage warning within the cold chain supply chain, thereby significantly reducing economic losses. [Methods] A complete system was constructed, consisting of a lower-computer data acquisition node, an IoT cloud platform, and an upper-computer Qt client. The lower computer synchronously collected environmental temporal sensing data (temperature, humidity, CO2, ethylene) and visual temporal images of indicator tags via a self-designed portable acquisition node. A novel co-attention-based convolutional recurrent network (Co-ACRN) deep learning model was proposed to deeply mine the complex correlations between the two heterogeneous time-series data streams. This model innovatively employed a "co-attention + self-attention" dual mechanism. Firstly, in the early fusion stage, a co-attention module intelligently aligned and deeply integrated the visual and sensor feature sequences by constructing a cross-modal affinity matrix. Subsequently, the fused sequence was fed into a long short-term memory (LSTM) network to encode temporal cumulative effects. Finally, a self-attention module performed a global contextual review of the LSTM output to capture long-range temporal dependencies.
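A minimal sketch of the cross-modal affinity-matrix idea might look as follows; this is a simplified bilinear version, not the actual Co-ACRN, which generates context-aware intermediate features with learned projections:

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(V, S, W):
    """V: (T, dv) visual feature sequence, S: (T, ds) sensor feature
    sequence, W: (dv, ds) bilinear weight. Builds a T x T cross-modal
    affinity matrix and returns each sequence re-weighted by it."""
    C = np.tanh(V @ W @ S.T)                  # cross-modal affinity matrix
    S_aligned = softmax(C, axis=1) @ S        # visual steps attend to sensor steps
    V_aligned = softmax(C.T, axis=1) @ V      # sensor steps attend to visual steps
    return S_aligned, V_aligned

rng = np.random.default_rng(0)
V = rng.normal(size=(6, 8))                   # 6 time steps of visual features
S = rng.normal(size=(6, 4))                   # 6 time steps of sensor features
W = rng.normal(size=(8, 4)) * 0.1
S_att, V_att = co_attention(V, S, W)
print(S_att.shape, V_att.shape)               # (6, 4) (6, 8)
```

The aligned sequences would then be concatenated step-wise and fed to the LSTM; in the real model `W` is trained end-to-end rather than sampled.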
In the specific implementation, visual features were extracted by a lightweight convolutional neural network (CNN) with two convolutional-pooling layers; the co-attention calculated weights by generating context-aware intermediate features; and the self-attention adopted the standard scaled dot-product attention mechanism. For application deployment, the model was efficiently deployed to the Qt client in the open neural network exchange (ONNX) format, achieving real-time, edge-side inference. [Results and Discussions] Experimental results showed that the proposed Co-ACRN model achieved an overall accuracy of 96.93% on the test set in the three-class mango freshness detection task, with its performance significantly surpassing that of various mainstream baselines and advanced temporal multimodal fusion models, such as modality-invariant and specific-representations for multimodal sentiment analysis (MISA), recurrent attended variation embedding network (RAVEN), multimodal transformer (MulT), and heterogeneous hierarchical message passing network (HHMPN). To verify the rationale of the model design, two sets of ablation experiments were conducted. The input-based ablation study decisively proved that the combination of "time-series information + multimodal information" is a necessary prerequisite for accurate detection, as any model relying on unimodal or static information exhibited significant performance bottlenecks. The architecture-based ablation study further confirmed the superiority of the proposed "dual-attention" system; compared to a backbone network without any attention mechanism, its accuracy was improved by more than five percentage points, and the recall rate for the critical "spoiled" category was as high as 99.16%. 
An in-depth analysis of the confusion matrix revealed that the vast majority of the model's errors occurred between adjacent categories with the most similar physical states, with no serious cross-category misclassifications, demonstrating its strong robustness. After being deployed on the client side, the system's single diagnosis time was less than 2 s, verifying the solution's combination of high accuracy and real-time performance. [Conclusions] The developed online detection system and Co-ACRN model successfully enabled the real-time, accurate, and non-destructive intelligent detection of fruit and vegetable freshness. The research findings indicate that by combining advanced co-attention and self-attention mechanisms, the fusion challenges of complex multimodal temporal data can be effectively solved. In summary, this study provides a complete solution that combines theoretical innovation with engineering practicality for the online and intelligent detection of distributed fruit and vegetable freshness, and paves new paths for the development of this field in both theory and practice.

    Obstacle Avoidance Control Method of Electric Skid-Steering Chassis Based on Fuzzy Logic Control |
    LI Lei, SHE Xiaoming, TANG Xinglong, ZHANG Tao, DONG Jiwei, GU Yuchuan, ZHOU Xiaohui, FENG Wei, YANG Qinghui
    2026, 8(1):  213-225.  doi:10.12133/j.smartag.SA202408003

    [Objective] Trajectory tracking and obstacle avoidance control are important components of an autonomous driving chassis, but most current studies treat them as two independent tasks: the chassis suspends trajectory tracking when it encounters an obstacle and resumes tracking only after obstacle avoidance is complete. If the chassis ends up too far from the reference path after avoidance, subsequent tracking performance suffers. Some studies do address trajectory tracking and obstacle avoidance simultaneously, but the resulting motion is either insufficiently smooth and prone to chatter, or the control system is overly complex. Therefore, a simple algorithm that can simultaneously implement trajectory tracking and obstacle avoidance control of the chassis is proposed in this research. [Methods] First, the kinematic model and kinematic error model of the chassis were designed. Since skid-steering was adopted, the kinematic model of the chassis was simplified to a two-wheel differential-drive robot model when designing the mathematical model. Second, the Takagi-Sugeno (T-S) fuzzy controller of the chassis was designed. Because the error model of the chassis had been designed in advance, the T-S fuzzy model of the chassis could be derived. Based on the T-S model, a T-S fuzzy controller was designed using the parallel distributed compensation (PDC) algorithm. A linear quadratic regulator (LQR) controller was used as the state feedback controller of each fuzzy subsystem in the T-S fuzzy controller to form a global T-S fuzzy controller, which realized the trajectory tracking function of the chassis when there were no obstacles. Third, the obstacle avoidance controller of the chassis was designed. A new LQRobs controller was designed in the global open-loop system to generate the reference trajectory for avoiding obstacles.
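The LQR state-feedback gains used in each fuzzy subsystem can be computed, for illustration, by backward Riccati iteration on a toy discretized error model; the matrices below are assumptions for demonstration, not the paper's chassis model:

```python
import numpy as np

def dlqr(A, B, Q, R, iters=500):
    """Infinite-horizon discrete LQR gain K (u = -K x) obtained by
    backward Riccati iteration, avoiding any external solver."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

# Toy discretized tracking-error model: lateral error, heading error,
# heading rate; inputs roughly map to forward and yaw commands.
dt, v_ref = 0.05, 0.5
A = np.array([[1.0, dt * v_ref, 0.0],
              [0.0, 1.0,        dt],
              [0.0, 0.0,        1.0]])
B = np.array([[dt, 0.0],
              [0.0, 0.0],
              [0.0, dt]])
Q = np.diag([10.0, 10.0, 1.0])
R = np.diag([1.0, 1.0])
K = dlqr(A, B, Q, R)
# the closed loop A - B @ K is stable: all |eigenvalues| < 1
```

In the PDC scheme, one such gain is computed per fuzzy subsystem (each subsystem linearized at a different operating point) and the fuzzy membership functions blend the subsystem control laws into the global input; the fuzzy-logic LQRobs variant additionally re-tunes `Q` and `R` online.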
When the system detected an obstacle in the environment, the LQRobs controller started working and generated a new path based on the distance between the obstacle and the chassis, so that the chassis could avoid the obstacle. When the chassis had bypassed the obstacle, the LQRobs controller stopped working. To better realize the obstacle avoidance function, a fuzzy controller was designed to adjust the gain matrices Q and R of the LQRobs controller in real time. Then, to realize trajectory tracking and obstacle avoidance control at the same time, a fuzzy fusion controller was designed to combine the two controllers into the final chassis input; a Mamdani fuzzy controller was selected for this purpose. Finally, the method was tested in simulation and experiment: the simulations were conducted jointly in MATLAB and Simulink, and the experiments were carried out on a self-developed electric multi-functional chassis. [Results and Discussions] The simulation results showed that, when there were no obstacles, the control method achieved stable trajectory tracking on a reference path composed of straight lines and curves. When there were obstacles, the vehicle avoided them smoothly and quickly converged back to the reference trajectory. When facing obstacles, the designed fuzzy-logic LQRobs controller adaptively changed the controller gain matrices according to the vehicle's speed and its distance to the current obstacle, achieving rapid convergence. The experimental results showed that, when there were no obstacles, the chassis used the T-S fuzzy controller to achieve stable tracking of the reference trajectory, with average errors in the lateral and longitudinal directions over the entire tracking process of 0.041 and 0.052 m, respectively. When facing obstacles, the T-S fuzzy controller and the LQRobs controller jointly realized the obstacle avoidance and tracking control of the chassis.
The fuzzy controller was used to adjust the gain matrices of the LQRobs controller in real time, and the tracking error was reduced by 33.9% compared with a controller using fixed gain matrices. [Conclusions] The control system can simultaneously realize trajectory tracking and obstacle avoidance control of the chassis, quickly converge the tracking error to zero, and achieve smooth obstacle avoidance. Although the proposed control method is simple and efficient and significantly improves tracking and obstacle avoidance performance, it can currently handle only static obstacles on the reference path; subsequent research will focus on dynamic obstacles.

    Multi-Machine Collaborative Operation Scheduling and Planning Method Based on Improved Genetic Algorithm |
    ZHU Tianwen, WANG Xu, ZHANG Bo, DU Xintong, WU Chundu
    2026, 8(1):  226-236.  doi:10.12133/j.smartag.SA202508010

    [Objective] Traditional harvesting processes in large-scale farms still suffer from low scheduling efficiency, uneven workload distribution, and suboptimal path planning, which hinder the realization of intelligent and efficient agricultural production. Multi-machine collaborative operation scheduling and planning have become key technologies in intelligent farming management, aiming to optimize task allocation and path planning among multiple harvesters under time window and workload balance constraints. However, such problems belong to complex combinatorial optimization categories characterized by high dimensionality and nonlinearity. Conventional genetic algorithms (GA) often exhibit premature convergence and weak local search capabilities, resulting in suboptimal scheduling schemes. To address these challenges, this study focused on the collaborative harvesting operations of multiple combine harvesters across several fields and proposed an improved multi-traveling salesman problem genetic algorithm (IMTSP_GA) for integrated multi-machine scheduling and path planning. [Methods] A multi-machine cooperative scheduling model was constructed with the objective of minimizing the total operational time of all harvesters while considering time window and load-balancing constraints. The problem was modeled as a multi-traveling salesman problem (MTSP), in which each harvester was regarded as a traveling salesman responsible for a subset of field tasks. To solve the model, the proposed IMTSP_GA adopted a two-layer chromosome encoding structure: The first layer represented the visiting sequence of all task units, and the second layer defined the segmentation positions that allocated tasks to different machines, thereby forming feasible multi-harvester operation routes. To ensure both initial solution quality and population diversity, a hybrid initialization strategy combining sequential and random initialization was designed.
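Decoding the two-layer chromosome into per-harvester routes might look like this minimal sketch; the task-unit indices and cut positions are illustrative:

```python
def decode(order, cuts, n_machines):
    """Two-layer chromosome: `order` is the visiting sequence of all
    task units (layer 1), `cuts` the segmentation positions (layer 2)
    splitting the sequence into one route per harvester."""
    bounds = [0] + sorted(cuts) + [len(order)]
    return [order[bounds[k]:bounds[k + 1]] for k in range(n_machines)]

order = [3, 0, 5, 1, 4, 2, 6]   # visiting sequence over 7 task units
cuts = [2, 5]                   # two cut points → three harvesters
print(decode(order, cuts, 3))   # → [[3, 0], [5, 1, 4], [2, 6]]
```

Crossover and mutation then operate on the permutation layer (e.g., order-preserving crossover) and on the cut positions separately, so every offspring still decodes to a feasible task partition.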
Furthermore, a Q-learning-based adaptive mutation mechanism was introduced into the genetic operation process. By constructing a state–action–reward model based on the variation trend of fitness values, the algorithm dynamically selected mutation operators according to their historical performance, thus balancing global exploration and local exploitation. The overall process included chromosome encoding, fitness evaluation, group-based selection, crossover and mutation operations, and Q-learning-driven adaptive control. Based on the optimized scheduling scheme, the full-path planning for each harvester was divided into two stages: (1) in-field path planning, which used an internal spiral coverage method to reduce turning frequency and non-working time; and (2) road network path planning, which employed the Dijkstra algorithm to obtain globally shortest travel routes between fields. [Results and Discussions] A total of 25 farmlands were divided into 49 task units, and four John Deere 3588 harvesters were used for the simulation. Comparative experiments were performed among IMTSP_GA, standard GA, particle swarm optimization (PSO), and ant colony optimization (ACO). The results showed that the IMTSP_GA significantly outperformed other algorithms in terms of total operation time, convergence speed, and computational efficiency. Specifically, the total operational time was reduced by 4.48%, 5.32%, and 9.87% compared with GA, PSO, and ACO, respectively. The average runtime was 5.82 s, which was substantially shorter than that of the GA (11.55 s) and PSO (10.70 s). The algorithm exhibited fast early convergence and effectively avoided premature stagnation. To further evaluate generalization capability, five classical traveling salesman problem (TSP) datasets, Berlin52, Eil76, Bier127, CH150, and KroB200, were tested. 
IMTSP_GA consistently achieved superior average solutions and shorter runtimes across all datasets, confirming its robustness and adaptability to different problem scales and complexities. Finally, full-process path planning was visualized based on the optimized scheduling results. The generated harvester routes were continuous and compact, ensuring reasonable task allocation and efficient transitions between fields, thereby validating the effectiveness of the proposed model. [Conclusions] By integrating a Q-learning-based adaptive mutation mechanism, IMTSP_GA autonomously selects effective mutation strategies to enhance search performance and convergence stability. Meanwhile, the hybrid initialization strategy maintains population diversity and improves the quality of initial solutions. IMTSP_GA surpasses traditional GA, PSO, and ACO in solution quality, convergence performance, and computational efficiency. The method effectively reduces total operation time, optimizes harvester task allocation, and improves the coordination and efficiency of multi-machine operations. In future work, the research will be extended to more complex scenarios involving multi-region cooperation, task prioritization, and dynamic environmental factors. Reinforcement learning and online optimization techniques will be incorporated to achieve real-time scheduling and intelligent decision-making, thereby enhancing the adaptability and engineering applicability of the proposed method in large-scale intelligent agricultural systems.
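The road-network stage's globally shortest routes between fields come from standard Dijkstra search; a minimal sketch over a hypothetical farm road graph (field nodes `F1`–`F3` joined by a junction `J`, distances in meters, all assumed for illustration):

```python
import heapq

def dijkstra(graph, src, dst):
    """Shortest path on a road network; `graph` maps node ->
    list of (neighbor, distance). Returns (path, total distance)."""
    dist, prev = {src: 0.0}, {}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path, u = [dst], dst                  # walk predecessors back to src
    while u != src:
        u = prev[u]
        path.append(u)
    return path[::-1], dist[dst]

roads = {"F1": [("J", 30)], "J": [("F1", 30), ("F2", 40), ("F3", 90)],
         "F2": [("J", 40), ("F3", 35)], "F3": [("F2", 35), ("J", 90)]}
print(dijkstra(roads, "F1", "F3"))   # → (['F1', 'J', 'F2', 'F3'], 105.0)
```

Each inter-field transition in the decoded harvester routes would be expanded with one such query, while the in-field segments come from the internal spiral coverage method.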