Smart Agriculture

Lightweight Detection Method for Pepper Leaf Diseases and Pests Based on Improved YOLOv12s

YAO Xiaotong, QU Shaoye

Smart Agriculture 2026, 8 (1): 1-14. DOI: 10.12133/j.smartag.SA202506005

[Objective] Pepper cultivation frequently faces challenges from diseases and pests, and early detection is critical for reducing yield losses. However, existing detection models often suffer from insufficient feature extraction for subtle lesions, loss of edge information against complex backgrounds, and high missed-detection rates for small lesions. To address these issues, YOLO-MDFR, a lightweight You Only Look Once (YOLO) detection algorithm based on an enhanced YOLOv12s, was proposed, specifically designed for accurate identification of pepper leaf diseases and pests in complex natural environments. [Methods] The dataset was established in the primary pepper cultivation zone of Gangu county, Tianshui city, Gansu province. The cultivated variety was the locally dominant Capsicum annuum L. var. conoides (Mill.). Data were collected from March 15 to May 20, 2024. The samples covered four categories of pepper leaves: healthy leaves, leaves damaged by thrips, leaves infected with tobacco mosaic virus exhibiting yellowing symptoms, and leaves affected by bacterial leaf spot. First, the original YOLOv12s backbone was replaced with an improved MobileNetV4 architecture to make the model lighter while preserving feature extraction capability. Specifically, the original 5×5 standard convolutions in the bottleneck layers of MobileNetV4 were replaced with two sequential 3×3 depthwise separable convolutions. This design rests on the principle that two stacked 3×3 convolutions cover the same 5×5 receptive field with fewer parameters, while depthwise separable convolutions further decompose spatial and channel-wise convolution, minimizing redundant computation. Second, a novel dimensional frequency reciprocal attention mixing transformer (D-F-Ramit) module was introduced to enhance sensitivity to lesion boundaries and fine-grained textures.
The module first converted feature maps from the spatial domain to the frequency domain using the discrete cosine transform (DCT), capturing high-frequency components often lost in spatial-only attention. It then integrated three parallel branches: channel attention, spatial attention, and frequency-domain attention. Finally, a residual aggregation gate-controlled convolution (RAGConv) module was developed for the neck network. This module included a residual aggregation path that collected multi-layer feature information and a gate control unit that dynamically weighted feature components by relevance. The residual structure provided a direct gradient propagation path, alleviating vanishing gradients during backpropagation and ensuring efficient information transfer during feature fusion. A systematic experimental framework was established to evaluate model performance: (1) ablation studies using a controlled-variable approach verified the individual contributions of the improved MobileNetV4, D-F-Ramit, and RAGConv modules; (2) lesion-scale sensitivity analysis assessed detection performance across lesion sizes, with emphasis on small-spot recognition; (3) resolution impact analysis evaluated five common input resolutions (320×320 to 736×736) to explore the trade-offs among accuracy, speed, and computational efficiency; and (4) embedded deployment validation involved model quantization and implementation on the Rockchip RK3588 platform to measure inference speed and power consumption on edge devices. [Results and Discussions] The proposed YOLO-MDFR achieved an mAP@0.5 of 95.6% on this dataset. Compared with YOLOv12s, it improved accuracy by 2.0%, reduced parameters by 61.5%, and lowered computational complexity by 68.5%. Real-time testing showed 43.4 f/s on an NVIDIA RTX 4060 GPU (CUDA 12.2) and 22.8 f/s on a Rockchip RK3588 embedded platform with only 3.5 W power consumption, which makes it suitable for battery-powered field devices.
Lesion-scale analysis showed an accuracy of 33.5% for lesions smaller than 16×16 pixels, which are critical for early detection. Confusion matrix analysis showed reduced misclassification: the confusion rate between bacterial leaf spot and thrips damage fell from 5.8% to 2.1%, and that between tobacco mosaic virus and healthy leaves fell from 3.2% to 1.5%, giving an overall misclassification rate of 2.3%. Experiments across input resolutions revealed a clear performance–resolution trade-off. As resolution increased from 320×320 to 736×736, mAP rose from 89.5% to 96.2%, with diminishing returns beyond 512×512. Meanwhile, computational cost grew roughly quadratically, reducing inference speed from 65.2 f/s to 35.1 f/s. [Conclusions] This study presents YOLO-MDFR, a lightweight model for detecting pepper leaf diseases and pests under complex natural conditions. By integrating an improved MobileNetV4 backbone, a multi-dimensional frequency reciprocal attention mixing transformer (D-F-Ramit), and a residual aggregation gate-controlled convolution (RAGConv) module, YOLO-MDFR outperforms mainstream detection models in both accuracy and efficiency. Systematic deployment experiments yielded optimized configurations for different application scenarios. Despite its strong performance, the model shows limitations in robustness under extreme lighting, generalization to emerging diseases, and detection of small occluded targets. Future work will address these issues through ambient light data fusion, domain adaptation with semi-supervised learning, and binocular vision integration.
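
The parameter argument for replacing a 5×5 convolution with two stacked 3×3 depthwise separable convolutions can be checked with a few lines of arithmetic. The sketch below is a minimal illustration, not the authors' implementation: it assumes equal input and output channel counts (a hypothetical width of 64), counts weights only (no biases or batch normalization), and ignores the expansion layers of the real MobileNetV4 bottleneck.

```python
def standard_conv_params(k, c_in, c_out):
    """Weight count of a k x k standard convolution (biases and BN ignored)."""
    return k * k * c_in * c_out


def dw_separable_params(k, c_in, c_out):
    """k x k depthwise conv (one filter per input channel) followed by a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out


def receptive_field(kernels, strides=None):
    """Receptive field of a stack of convolutions, walked from input to output."""
    strides = strides or [1] * len(kernels)
    rf, jump = 1, 1
    for k, s in zip(kernels, strides):
        rf += (k - 1) * jump  # each layer widens the field by (k - 1) input-space steps
        jump *= s
    return rf


c = 64  # hypothetical channel width
print(receptive_field([5]), receptive_field([3, 3]))  # 5 5
print(standard_conv_params(5, c, c))                  # 102400
print(2 * dw_separable_params(3, c, c))               # 9344
```

For this width, a standard 5×5 convolution needs 102 400 weights, while the two depthwise separable 3×3 stages need 9 344, and both reach a 5×5 receptive field. Most of that saving comes from the depthwise/pointwise split itself; comparing only the spatial (depthwise) kernels, two stacked 3×3 filters use 18 weights per channel versus 25 for a single 5×5 filter.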

Tea Leaf Disease Diagnosis Based on Improved Lightweight U-Net3+

HU Yumeng, GUAN Feifan, XIE Dongchen, MA Ping, YU Youben, ZHOU Jie, NIE Yanming, HUANG Lüwen

Smart Agriculture 2026, 8 (1): 15-27. DOI: 10.12133/j.smartag.SA202507010

[Objective] Leaf diseases significantly affect both the yield and quality of tea throughout the year. To address the insufficient segmentation precision of current tea leaf spot segmentation models, a tea leaf spot severity diagnosis model designated MDC-U-Net3+ was proposed, built on the U-Net3+ framework to improve segmentation accuracy. [Methods] A multi-scale feature fusion module (MSFFM) was incorporated into the U-Net3+ backbone to capture feature information of diseased spots across multiple receptive fields, thereby reducing feature loss in the encoder. A dual multi-scale attention (DMSA) module was incorporated into the skip connections to mitigate boundary ambiguity in segmentation. This integration facilitated full-scale fusion of fine-grained and coarse-grained semantic information. Furthermore, the segmented mask images were refined with conditional random fields (CRF) to further optimize the segmentation results. [Results and Discussions] The improved MDC-U-Net3+ model achieved a mean pixel accuracy (mPA) of 94.92% and a mean intersection over union (mIoU) of 90.9%. Compared with U-Net3+, MDC-U-Net3+ improved mPA and mIoU by 1.85 and 2.12 percentage points, respectively, and it also outperformed other classical semantic segmentation models. [Conclusions] The proposed method can provide data support for automated disease detection and precise pesticide application, thereby reducing losses caused by tea diseases.
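
The reported mPA and mIoU are standard pixel-level segmentation metrics. The sketch below shows how they are usually computed from a pixel confusion matrix; the two-class layout and pixel counts are hypothetical, and the abstract does not say whether the authors averaged over the background class as well as the lesion classes.

```python
def segmentation_metrics(conf):
    """Return (mPA, mIoU) from a pixel-level confusion matrix.

    conf[i][j] counts pixels whose true class is i and predicted class is j.
    Assumes every class occurs at least once in the ground truth.
    """
    n = len(conf)
    pa_sum, iou_sum = 0.0, 0.0
    for i in range(n):
        tp = conf[i][i]
        gt = sum(conf[i])                         # pixels labelled i
        pred = sum(conf[r][i] for r in range(n))  # pixels predicted as i
        pa_sum += tp / gt                         # per-class pixel accuracy
        iou_sum += tp / (gt + pred - tp)          # intersection / union
    return pa_sum / n, iou_sum / n


# Hypothetical 2-class example (background, lesion)
mpa, miou = segmentation_metrics([[50, 2], [3, 45]])
print(round(mpa, 4), round(miou, 4))  # 0.9495 0.9045
```

Because mIoU penalizes both false positives and false negatives in the union term, it is typically lower than mPA, which matches the gap between the two figures reported above.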

Rice Disease Identification Method Based on Improved MobileViT Model and System Development

LIU Xiaojun, WU Qian, SUN Chuanliang, QI Chao, ZHANG Gufeng, LEI Tianjie, LIANG Wanjie

Smart Agriculture 2026, 8 (1): 28-39. DOI: 10.12133/j.smartag.SA202507043

[Objective] Under abiotic stress, rice plants become fragile and susceptible to disease infection. Accurate diagnosis and science-based control strategies are therefore crucial for managing rice diseases, particularly after disasters such as flooding and high temperatures. However, identifying rice diseases under natural field conditions is challenging: complex backgrounds, illumination changes, and occlusion make it extremely difficult to obtain comprehensive disease information, which significantly increases the difficulty of identification. This study aims to develop an efficient rice disease recognition model by integrating the efficient channel attention (ECA) mechanism with the MobileViT model, improving the accuracy of rice disease identification in the field. Additionally, a rice disease knowledge graph was incorporated to achieve precise diagnosis and generate scientifically grounded control prescriptions for effective disease management. [Methods] A total of 1 304 raw images of rice diseases were collected with mobile phones and cameras at different times from rice disease investigation and long-term monitoring sites in Jiangsu province. An additional 167 disease images from the Rice Leaf Disease Image Samples dataset were used to supplement the data. The raw images were classified and preprocessed under the guidance of plant protection experts. The resulting dataset of 1 471 original images covers seven rice diseases: bacterial leaf blight, false smut, leaf blast, bakanae disease, heart rot, grain discoloration, and panicle blast. The dataset was partitioned into training, validation, and test sets at a 7:1.5:1.5 ratio.
Data augmentation was applied only to the training and validation sets to increase sample diversity, while the test set was left unaugmented to preserve its independence for unbiased evaluation. After augmentation, the total image count increased to 7 735. A novel rice disease recognition model was established by integrating the efficient channel attention (ECA) module into the MobileViT model. The architecture was further optimized by improving the convolutional structures, reconstructing the Transformer encoding blocks, and replacing the activation function with SiLU. Cross-validation and ablation experiments were conducted to verify model performance. The resulting high-accuracy recognition model was then combined with the rice disease knowledge graph to diagnose rice diseases accurately and generate science-based prevention and control strategies. Finally, an intelligent rice disease diagnostic system was developed using the Flask framework and cloud computing technologies. [Results and Discussions] The ablation study showed that the model combining convolutional layer optimization, Transformer block reconstruction, and the ECA module performed best, with overall precision, F₁-score, and recall of 97.27%, 97.32%, and 97.46%, respectively. Its accuracy reached 97.25%, an improvement of 2.30 percentage points over the original model (94.95%). To further verify its effectiveness, the improved model was compared with mainstream models such as Swin Transformer, TinyViT, and ConvNeXt. The improved model outperformed the second-best model (TinyViT) by 0.92, 1.43, 0.95, and 1.32 percentage points in overall accuracy, F₁-score, and recall, respectively.
Moreover, the improved model showed clear advantages in floating-point operations, model size, and parameter count, with a parameter size of only 6.02 MB, making it better suited to deployment on hardware-constrained devices. Confusion matrix analysis and heatmap visualizations showed that the enhanced model improved recognition accuracy by 0.6, 0.3, 0.3, and 0.6 percentage points for bacterial leaf blight, heart rot, grain discoloration, and panicle blast, respectively. The integrated system, combining this model with the knowledge graph, achieved significantly higher accuracy in disease identification and diagnosis, and it generated prevention and control strategies to guide rice disease management. In field deployment, the rice disease diagnosis system achieved an accuracy of up to 98% with an average response time of 181 ms, demonstrating reliable real-time performance and stability. [Conclusions] By integrating the ECA module and reconstructing the Transformer encoding blocks, the MobileViT model achieved noticeable improvements in precision, recall, and F₁-score while reducing computational cost, enabling more efficient rice disease recognition in complex field environments. Application of the intelligent rice disease diagnosis system showed that it could deliver accurate diagnoses and generate strategies to guide rice disease prevention and control. This method can effectively improve the efficiency of rice disease prevention and control, providing technical support for improving the quality, efficiency, digitization, and intelligence of rice production.
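
The ECA mechanism added to MobileViT is small enough to sketch directly. The example below follows the published ECA-Net recipe (global average pooling, a 1D convolution across neighbouring channels with an adaptively chosen odd kernel size, then a sigmoid gate) in plain Python on nested lists. The kernel weights are hypothetical stand-ins for learned values, and where the authors placed the module inside MobileViT is not specified in the abstract.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1D kernel size from ECA-Net: nearest odd value of |(log2 C + b) / gamma|."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


def eca(feature_map, weights):
    """Apply efficient channel attention to a C x H x W feature map (nested lists).

    weights: odd-length 1D kernel shared across channels (learned in practice).
    Returns the rescaled feature map and the per-channel attention values.
    """
    c, k = len(feature_map), len(weights)
    pad = k // 2
    # 1) Global average pooling: one descriptor per channel.
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]
    # 2) 1D convolution across neighbouring channels with zero padding (no dimensionality reduction).
    attn = []
    for i in range(c):
        s = sum(weights[j] * pooled[i + j - pad]
                for j in range(k) if 0 <= i + j - pad < c)
        attn.append(1.0 / (1.0 + math.exp(-s)))  # sigmoid gate
    # 3) Rescale every channel by its attention value.
    out = [[[v * attn[i] for v in row] for row in ch] for i, ch in enumerate(feature_map)]
    return out, attn


fm = [
    [[1.0, 3.0], [1.0, 3.0]],      # channel 0, mean 2.0
    [[0.0, 0.0], [0.0, 0.0]],      # channel 1, mean 0.0
    [[-2.0, -2.0], [-2.0, -2.0]],  # channel 2, mean -2.0
]
print(eca_kernel_size(64), eca_kernel_size(256))  # 3 5
out, attn = eca(fm, [0.25, 0.5, 0.25])  # hypothetical learned kernel of size 3
print([round(a, 3) for a in attn])      # [0.731, 0.5, 0.269]
```

Unlike squeeze-and-excitation, ECA avoids a channel-reducing bottleneck and only mixes each channel with its k nearest neighbours, which is why it adds almost no parameters to a lightweight backbone.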

Low-rank Adaptation Method for Fine-tuning Plant Disease Recognition Models

HUANG Jinqing, YE Jin, HU Huilin, YANG Jihui, LAN Wei, ZHANG Yanqing

Smart Agriculture 2026, 8 (1): 40-51. DOI: 10.12133/j.smartag.SA202504003

[Objective] When deep learning is applied to plant disease recognition, model fine-tuning faces significant challenges, including limited computational resources and high parameter update overhead. Although traditional low-rank adaptation (LoRA) methods effectively reduce parameter overhead, their strategy of assigning a uniform, fixed rank to all layers overlooks the varying importance of different layers. This can still over-constrain optimization in critical layers or waste resources in less important ones. To address this limitation, a dynamic rank allocation (DRA) algorithm is proposed in this research. DRA evaluates and adjusts the parameter resources each layer requires during training, improving the accuracy of plant disease classification models while balancing computational resources more efficiently. [Methods] Two public datasets, the Wheat Plant Diseases Dataset and the Plants Disease Dataset, were used in the experiments. The Wheat Plant Diseases Dataset comprised 13 104 images covering 15 types of wheat diseases such as black rust and fusarium head blight, while the Plants Disease Dataset included 37 505 images of 26 types of plant diseases such as algal leaf spot, corn rust, and bacterial spot of tomato. The images were captured under varied lighting, backgrounds, and angles, and at various stages of plant growth. A cross-layer feature similarity metric based on centered kernel alignment (CKA) was introduced to quantify the representational correlation between different layers. In addition, a correction factor based on gradient information and activation intensity was constructed to measure each layer's direct impact on the loss function. The two metrics were fused using a weighted harmonic mean to produce a comprehensive importance score, which was then used for the initial rank allocation.
Furthermore, because feature representations change during training, a stability-triggered adaptive rank update strategy, rank re-allocation (RRA), was proposed. This strategy monitored the average parameter change of the low-rank adapters during training to determine their convergence state. When this change fell below a specified threshold, the low-rank matrices were merged into the original weights, and the rank allocation table was recalculated and updated. This process ensured that more resources were allocated to critical layers, optimizing the allocation of parameter resources across layers. [Results and Discussions] Tests on four models (AlexNet, MobileNetV2, RegNetY, and ConvNeXt) showed that, compared with full-parameter fine-tuning, the proposed method reduced resource consumption to 0.42%, 2.46%, 3.56%, and 1.25%, respectively, while maintaining comparable average accuracy. The RRA strategy optimized parameters continuously throughout training: on the ConvNeXt model, the trainable parameters on the Plants Disease Dataset were progressively reduced from 18.34 M to 9.26 M, a reduction of nearly 50%. Compared with the standard LoRA method (r = 16), the method lowered accuracy by 0.38, 0.40, and 0.05 percentage points on the Wheat Plant Diseases Dataset for AlexNet, MobileNetV2, and RegNetY, respectively, while reducing resource consumption by 59.3%, 87.4%, and 50.5%. Robustness was tested by applying perturbations to the test set, including Gaussian noise, random cropping, color jitter, and random rotation. On the Plants Disease Dataset, the model was most affected by color jitter and random rotation, with accuracy decreasing by 6.02 and 5.11 percentage points, respectively.
On the Wheat Plant Diseases Dataset, the model was more sensitive to random cropping and random rotation, with accuracy decreasing by 4.33 and 4.40 percentage points, respectively; overall performance degradation remained within an acceptable range. Compared with other advanced low-rank methods such as AdaLoRA and DyLoRA under the same parameter budget, DRA achieved higher accuracy. On the RegNetY model, DRA reached an accuracy of 90.96% on the Plants Disease Dataset, 0.55 percentage points higher than AdaLoRA and 0.94 percentage points higher than DyLoRA. In terms of training efficiency on the Plants Disease Dataset, DRA required 43.5 minutes to reach its peak validation accuracy of 89.84%, whereas AdaLoRA required 52.3 minutes, approximately 20.23% longer. Regarding inference flexibility, DyLoRA is designed to produce a universal model that adapts to multiple rank configurations after a single training run, allowing dynamic rank switching during inference according to hardware or latency requirements. DRA lacks this inference-time flexibility; instead, it converges during training to a single, high-performance rank configuration for a specific task. [Conclusions] The proposed low-rank adaptive fine-tuning method significantly reduced the number of trainable parameters while preserving plant disease recognition accuracy. Compared with traditional fixed-rank LoRA and other advanced low-rank optimization methods, it showed distinct advantages, providing an effective pathway for efficient model deployment on resource-constrained devices.
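
The abstract names three ingredients of DRA: a CKA-based similarity between layers, a fusion of two importance signals by weighted harmonic mean, and a stability trigger for re-allocation. The sketch below shows one plausible way to wire these together and is not the authors' algorithm. Several choices are assumptions: linear CKA is used as the similarity measure; turning CKA values and gradient/activation statistics into per-layer scores is left to the caller; and the equal harmonic-mean weights, proportional rounding scheme, minimum rank, and threshold test in the hypothetical `should_reallocate` helper are illustrative.

```python
import math


def linear_cka(x, y):
    """Linear CKA between two activation matrices (n samples x features each)."""
    def center(m):
        n = len(m)
        means = [sum(row[j] for row in m) / n for j in range(len(m[0]))]
        return [[v - means[j] for j, v in enumerate(row)] for row in m]

    def cross_fro_sq(a, b):
        # Squared Frobenius norm of a^T b, computed entry by entry.
        total = 0.0
        for i in range(len(a[0])):
            for j in range(len(b[0])):
                s = sum(a[r][i] * b[r][j] for r in range(len(a)))
                total += s * s
        return total

    xc, yc = center(x), center(y)
    return cross_fro_sq(xc, yc) / math.sqrt(cross_fro_sq(xc, xc) * cross_fro_sq(yc, yc))


def weighted_harmonic_mean(a, b, w_a=0.5, w_b=0.5):
    """Fuse two positive importance signals; a low value in either pulls the score down."""
    return (w_a + w_b) / (w_a / a + w_b / b)


def allocate_ranks(scores, total_rank, r_min=1):
    """Split a total LoRA rank budget across layers in proportion to importance."""
    n = len(scores)
    spare = total_rank - r_min * n
    if spare < 0:
        raise ValueError("budget smaller than r_min * number of layers")
    total_score = sum(scores)
    raw = [spare * s / total_score for s in scores]
    ranks = [r_min + int(v) for v in raw]
    # Give the units lost to flooring to the layers with the largest remainders.
    leftover = total_rank - sum(ranks)
    by_remainder = sorted(range(n), key=lambda i: raw[i] - int(raw[i]), reverse=True)
    for i in by_remainder[:leftover]:
        ranks[i] += 1
    return ranks


def should_reallocate(prev_params, cur_params, threshold):
    """Stability trigger: mean absolute adapter-parameter change below a threshold."""
    delta = sum(abs(p - c) for p, c in zip(prev_params, cur_params)) / len(cur_params)
    return delta < threshold


# Hypothetical per-layer signals for three layers
similarity_score = [0.9, 0.4, 0.6]
gradient_score = [0.8, 0.2, 0.6]
importance = [weighted_harmonic_mean(s, g) for s, g in zip(similarity_score, gradient_score)]
print(allocate_ranks(importance, total_rank=24, r_min=2))  # [11, 5, 8]
```

The harmonic mean is a natural fit for fusing two importance signals because a layer must score reasonably on both to receive a large rank; a near-zero value in either signal dominates the result.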

Intelligent Q&A Method for Crop Diseases and Pests Using LLM Augmented by Adaptive Hybrid Retrieval

YANG Jun, YANG Wanxia, YANG Sen, HE Liang, ZHANG Di

Smart Agriculture 2026, 8 (1): 52-61. DOI: 10.12133/j.smartag.SA202506026

[Objective] Extracting valuable knowledge from vast amounts of dispersed, heterogeneous, and unstructured agricultural big data, correlating and structuring it, and using it to enhance large models to form intelligent question-answering (Q&A) systems enables effective service delivery to all agricultural stakeholders. This approach can rapidly advance science-based, precision agricultural production. Existing agricultural Q&A systems lack sufficient semantic understanding of complex symptoms, while general-purpose large language models (LLMs) produce factual hallucinations because their training data coverage is incomplete. This research aims to address the insufficient scale and low quality of knowledge bases in the agricultural field. [Methods] First, disease and pest data were collected for five typical crops: wheat, rice, corn, potatoes, and cotton. Outliers were identified and removed through manual verification, ultimately yielding 87 901 unstructured data entries. A few-shot learning model was then employed to extract entities defined in the schema (pattern) layer, and these entities were aligned using BERT semantic vectors and LLM prompt engineering, ultimately yielding a knowledge base of 916 239 triples for knowledge retrieval. On this basis, a knowledge retrieval-augmented LLM approach for intelligent Q&A on crop diseases and pests, adaptive hybrid retrieval-augmented generation (AHR-RAG), was proposed. First, an overlapping mechanism was introduced during fixed-length segmentation to mitigate semantic fragmentation. At the same time, vector semantic similarity was used to match highly related text blocks to topics for optimization and storage. Then, single-hop and multi-hop retrieval were designed according to question complexity.
Single-hop retrieval used the BM25 algorithm to match information extracted from the query against document content in an Elasticsearch index, feeding the results into the LLM to enhance answer generation. Multi-hop retrieval first converted user queries into structured conditions and semantic vector representations; results retrieved from different knowledge bases were then fused using reciprocal rank fusion (RRF) and fed into the LLM. [Results and Discussions] The proposed method was compared experimentally with multiple baseline approaches across queries of different types and complexity levels. On the Qwen1.5-7B-Chat model, the proposed method improved accuracy and F₁ by 0.193 and 0.170, respectively. Compared with the improved methods Self-RAG and Adaptive-RAG, AHR-RAG maintained low response times while improving F₁ by 0.05 and 0.021, respectively, with an accuracy as high as 0.896. For multi-type Q&A tasks, compared with the Naive-RAG method that relied solely on prior knowledge, AHR-RAG improved accuracy by 0.231, 0.123, and 0.157 for comparison, judgment, and selection queries, respectively. AHR-RAG also showed significant advantages in parsing complex semantics. For single-hop queries, its accuracy reached 0.921, 0.029 higher than Adaptive-RAG. For multi-hop queries, its accuracy reached 0.748, 0.082 and 0.059 higher than Self-RAG and Adaptive-RAG, respectively. In the generation stage, optimizing the prompt strategy increased accuracy by 0.013 and F₁ by 0.009 compared with feeding retrieval results directly to the model. [Conclusions] The proposed method demonstrates strong adaptability to diverse query types and excels at reasoning over complex queries such as multi-hop searches.
It delivers significant advantages in answer generation accuracy, relevance, and comprehensiveness, producing responses with enhanced logical coherence and richer content. Future work will explore the integration of multimodal knowledge bases.
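
Both retrieval primitives named in the AHR-RAG pipeline, BM25 scoring for single-hop retrieval and reciprocal rank fusion for merging multi-hop results, have standard textbook forms. The sketch below implements them in plain Python for illustration only. It uses the common defaults k1 = 1.2 and b = 0.75 with a Lucene-style IDF (Elasticsearch's BM25 similarity uses the same defaults) and the usual RRF constant k = 60, which is an assumption because the abstract does not state it. The documents are toy whitespace-tokenized strings, and the vector-retriever ranking is hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """docs: list of token lists. Returns one BM25 score per document."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def reciprocal_rank_fusion(rankings, k=60):
    """rankings: ranked lists of doc ids (best first). Returns ids sorted by fused score."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


docs = [
    "wheat stripe rust yellow stripes on leaves".split(),
    "rice blast diamond shaped lesions".split(),
    "wheat scab bleached spikelets".split(),
]
scores = bm25_scores(["wheat", "rust"], docs)
bm25_rank = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)  # [0, 2, 1]
vector_rank = [2, 1, 0]  # hypothetical ranking from a vector-similarity retriever
print(reciprocal_rank_fusion([bm25_rank, vector_rank]))  # [2, 0, 1]
```

RRF only uses rank positions, so it can merge lists whose raw scores (BM25 values, cosine similarities, graph-query hits) live on incomparable scales, which suits fusing results from heterogeneous knowledge bases.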

Multi-Scale Tea Leaf Disease Detection Method Based on Improved YOLOv11n

XIAO Ruihong, TAN Lixin, WANG Rifeng, SONG Min, HU Chengxi

Smart Agriculture 2026, 8 (1): 62-71. DOI: 10.12133/j.smartag.SA202509014

[Objective] Preventing and containing leaf diseases is a critical component of tea production, and accurate identification and localization of symptoms are essential for modern, automated plantation management. Field inspection in tea gardens poses distinctive challenges for vision-based detection algorithms: targets appear at widely varying scales and morphologies against complex backgrounds and at unfixed acquisition distances, which easily mislead detectors. Models trained on standardized datasets with uniform distances and backgrounds often underperform, producing false alarms and missed detections. To support method development under realistic constraints, YOLO-SADMFA (You Only Look Once-Switchable Atrous Dynamic Multi-scale Frequency-aware Adaptive), a detector built on YOLOv11n, was proposed. The architecture aims to preserve fine details during repeated resampling (down- and up-sampling), strengthen modeling of lesions at varying scales, and refine multi-scale feature fusion. [Methods] The proposed architecture added convolutional, feature extraction, upsampling, and detection head stages to better handle multi-scale representations, and introduced a DMF-Upsample (Dynamic Multi-scale Frequency-aware Upsample) module that performed upsampling through multi-scale feature analysis and dynamic frequency adjustment fusion. This module enabled efficient multi-scale feature integration while mitigating information loss during up- and down-sampling. Concretely, DMF-Upsample analyzed multi-frequency responses from adjacent pyramid levels and fused them with dynamically learned frequency-selective weights, preserving high-frequency lesion boundaries and textures while retaining low-frequency contextual structure such as leaf contours and global shading.
A lightweight gating mechanism estimated per-location and per-channel coefficients to regulate the contribution of different frequency bands, and a residual bypass preserved identity information to further reduce the aliasing and oversmoothing introduced by repeated resampling. Furthermore, the baseline C3k2 block was replaced with a switchable atrous convolution (SAConv) module, which enhanced multi-scale feature capture by combining outputs from different dilation rates and incorporated a weight-locking mechanism to improve model stability and performance. In practice, SAConv aggregated parallel atrous branches at multiple dilation rates through learned coefficients under weight locking, which expanded the effective receptive field without sacrificing spatial resolution and suppressed gridding artifacts at modest parameter overhead. Lastly, an adaptive spatial feature fusion (ASFF) mechanism was integrated into the detection head, forming an ASFF-Head that learned spatially varying fusion weights across feature scales, effectively filtering conflicting information and strengthening the model's robustness and overall detection accuracy. Together, these components formed a deeper yet efficient multi-scale pathway suited to complex field scenes. [Results and Discussions] Compared with the original YOLOv11n model, YOLO-SADMFA improved precision, recall, and mAP by 4.4, 8.4, and 3.7 percentage points, respectively, indicating more reliable identification and localization across diverse field scenes. The detector was particularly effective for multi-scale targets whose lesion area occupied approximately 10%–65% of the image, reflecting the variability introduced by unfixed acquisition distances during tea garden patrols.
Under low illumination and in complex backgrounds with occlusion and clutter, it maintained stable performance, reduced both missed detections and false alarms, and effectively distinguished disease categories with similar morphology and color. On edge computing devices, it sustained about 161 f/s, meeting real-time requirements for mobile inspection robots and portable systems. These outcomes demonstrated strengthened robustness to background interference and improved sensitivity at extreme scales, consistent with practical demands where the acquisition distance is not fixed. In the ablation analysis, DMF-Upsample preserved high-frequency lesion boundaries while retaining low-frequency structural context after resampling, SAConv expanded receptive fields through multi-dilation aggregation under a weight-locking mechanism, and the ASFF-Head mitigated conflicts among feature pyramid levels. Their combination yielded cumulative gains in stability and accuracy. Qualitative analyses further supported the quantitative results: boundary localization improved for small, speckled lesions, large blotches were captured with fewer spurious edges, and distractors such as veins, shadows, and soil textures were misclassified less often, confirming the benefits of dynamic multi-scale frequency-aware fusion and adaptive spatial weighting in real field conditions. [Conclusions] The proposed YOLO-SADMFA effectively addressed the multi-scale disease detection challenge in complex tea garden environments, where acquisition distance was not fixed, lesion morphology and color were diverse, and cluttered backgrounds easily caused misjudgments and omissions. It significantly improved detection accuracy and robustness relative to the original YOLOv11n model across a wide range of target scales, and it maintained stable performance under the low illumination and complex backgrounds typical of field inspections.
It provided reliable technical support for automated tea leaf disease inspection systems by enabling accurate localization and identification of lesions in real operating conditions and by sustaining real-time inference on edge devices suitable for patrol-style deployment.
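
ASFF-style fusion, as used in the ASFF-Head, reduces to a per-pixel softmax over one weight logit per pyramid level, followed by a weighted sum. The sketch below illustrates only this fusion step and is not the authors' code. It assumes the levels have already been resized to a common resolution and that the logits (produced by 1×1 convolutions in ASFF) are supplied by the caller. It also handles a single channel for clarity.

```python
import math


def asff_fuse(features, logits):
    """Fuse L same-sized single-channel maps with per-pixel softmax weights.

    features[l][i][j]: value of level l at pixel (i, j), already resized to a common scale.
    logits[l][i][j]:   unnormalized weight for level l at pixel (i, j).
    Returns the fused map and the normalized weights (they sum to 1 at each pixel).
    """
    h, w = len(features[0]), len(features[0][0])
    fused = [[0.0] * w for _ in range(h)]
    weights = [[[0.0] * w for _ in range(h)] for _ in features]
    for i in range(h):
        for j in range(w):
            m = max(lg[i][j] for lg in logits)  # subtract the max for numerical stability
            exps = [math.exp(lg[i][j] - m) for lg in logits]
            z = sum(exps)
            for lvl, e in enumerate(exps):
                a = e / z
                weights[lvl][i][j] = a
                fused[i][j] += a * features[lvl][i][j]
    return fused, weights


# Three hypothetical pyramid levels, each a 1 x 2 map
feats = [[[1.0, 2.0]], [[3.0, 4.0]], [[5.0, 6.0]]]
equal = [[[0.0, 0.0]] for _ in feats]  # equal logits give a plain average
fused, _ = asff_fuse(feats, equal)
print(fused)  # approximately [[3.0, 4.0]]
```

Because the weights vary per pixel, the fused map can lean on a fine pyramid level where small speckled lesions sit and on a coarse level where large blotches dominate. A fixed summation of levels cannot make that distinction.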

Self-Supervised Adaptive Multimodal Feature Fusion Recognition of Crop Diseases and Pests

YE Penglin, MIN Chao, GOU Liangjie, WANG Pengcheng, HUANG Xiaopeng, LI Xin, MENG Yuping

Smart Agriculture 2026, 8 (1): 72-84. DOI: 10.12133/j.smartag.SA202509032

[Objective] Crop diseases and pests are significant factors restricting global agricultural production. Traditional intelligent recognition technologies predominantly rely on single-modal image data processed by convolutional neural networks (CNNs) or Transformers. However, in complex natural environments, these methods often suffer from insufficient information utilization and limited robustness due to the lack of semantic guidance. Although emerging multimodal approaches such as CLIP have introduced textual information, they typically rely on shallow feature alignment in the embedding space without achieving deep semantic interaction or effective feature fusion. Furthermore, the asymmetry between the number of image samples and text labels during training poses a challenge for effective cross-modal learning. In this study, a self-supervised adaptive multimodal feature fusion recognition (SAFusion-CLIP) method is proposed, aiming to significantly enhance classification accuracy and model generalization in fine-grained disease and pest recognition tasks. [Methods] A comprehensive recognition framework was constructed, integrating four key components to achieve deep fusion of visual and textual features. First, prompt engineering was conducted by utilizing large language models (LLMs) combined with authoritative agricultural guides to transform simple category labels into fine-grained pathological semantic descriptions. These descriptions encapsulated morphological details, color gradients, and texture features, with quality verified by BERTScore and ROUGE-L metrics. Second, a cross-modal balanced alignment module was designed to resolve the problem of sample asymmetry between image batches and fixed text labels. This module employed a dot-product attention mechanism to calculate the correlation between image and text projections, applying Softmax normalization to dynamically align image features with their corresponding textual representations.
Third, an adaptive fusion mechanism was employed to achieve deep semantic interaction. A gating unit based on the Sigmoid function calculated a gate value that dynamically allocated weights to image and text features, allowing the model to adaptively integrate complementary information from both modalities. Finally, a self-supervised feature reconstruction task was introduced to enhance the robustness of feature representation. A simple decoder reconstructed the original image and text embeddings from the fused features, and the model was optimized with a composite objective function combining image-text contrastive loss, mean squared error reconstruction loss, and weighted cross-entropy classification loss. [Results and Discussions] Extensive experiments were conducted on the standard PlantVillage dataset, which includes 39 categories covering 14 crop species. The proposed SAFusion-CLIP model achieved a classification accuracy of 99.67%, with precision, recall, and F₁-Score all exceeding 99.00%. Comparative analysis showed that the proposed method outperformed mainstream single-modal and baseline multimodal models, including ResNet50 (96.51%), Swin-Transformer (97.48%), and the baseline CLIP (98.23%). Visualization with Gradient-weighted Class Activation Mapping (Grad-CAM) indicated that, unlike single-modal models, which were susceptible to background noise or non-specific physical damage, the SAFusion-CLIP model focused more precisely on core lesion areas and effectively suppressed background interference. Furthermore, ablation studies confirmed the effectiveness of the proposed modules: combining the self-supervised architecture with the adaptive fusion mechanism improved accuracy by 2.46 percentage points over the baseline, validating the necessity of deep feature interaction and reconstruction tasks.
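The gating unit and composite objective described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the gate here is element-wise (a diagonal simplification of a full linear layer over the concatenated features), the contrastive term is passed in as a precomputed number, and the weights lambda_rec and lambda_cls, like all function names, are hypothetical because the abstract does not report them.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(img, txt, w_img, w_txt, bias):
    """Element-wise gate: g = sigmoid(w_img*img + w_txt*txt + b); fused = g*img + (1-g)*txt."""
    fused, gates = [], []
    for i, t, wi, wt, b in zip(img, txt, w_img, w_txt, bias):
        g = sigmoid(wi * i + wt * t + b)
        gates.append(g)
        fused.append(g * i + (1.0 - g) * t)  # g near 1 favors the image feature
    return fused, gates

def mse(pred, target):
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)

def weighted_cross_entropy(logits, label, class_weights):
    # Log-softmax with max subtraction for numerical stability
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return -class_weights[label] * (logits[label] - log_norm)

def composite_loss(l_contrastive, recon_pairs, l_cls, lambda_rec=1.0, lambda_cls=1.0):
    """Contrastive + lambda_rec * MSE reconstruction + lambda_cls * classification.

    recon_pairs: (reconstructed, original) embedding pairs, e.g. decoded image
    and text embeddings compared with the originals.
    """
    l_rec = sum(mse(rec, orig) for rec, orig in recon_pairs)
    return l_contrastive + lambda_rec * l_rec + lambda_cls * l_cls

# With zero weights and bias the gate is 0.5, so the fused vector is the mean
fused, gates = gated_fusion([1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0])
l_cls = weighted_cross_entropy([0.0, 0.0], 0, [2.0, 1.0])
total = composite_loss(0.5, [([1.0, 1.0], [1.0, 1.0])], l_cls)
```

The gate lets each feature dimension lean toward whichever modality is more informative, while the reconstruction term penalizes fused features that discard information needed to recover either input embedding.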
[Conclusions] By fusing textual semantics with visual features, the SAFusion-CLIP method effectively overcame the limitations of single-modal recognition. The adaptive fusion mechanism ensured deep interaction between modalities, while the self-supervised reconstruction task significantly enhanced the robustness of feature representation. The experimental results verified that this data-driven approach significantly improves accuracy and generalization capabilities in fine-grained crop disease classification tasks, providing a new and effective solution for precision agricultural prevention and control.

Research Progress and Future Prospects of Pig Intelligent Detection Technology

XIAO Deqin, LÜ Yuding, HUANG Yigui, CUAN Kaixuan

Smart Agriculture 2026, 8 (1): 86-103. DOI: 10.12133/j.smartag.SA202507048

[Significance] The pig industry is a key sector of animal husbandry. With the continuous expansion of farming scale, traditional manual inspection methods can no longer meet the demands of modern production in terms of efficiency, accuracy, and animal welfare. In recent years, intelligent pig monitoring technologies based on multi-source data, such as images, depth information, sensors, and sound, have developed rapidly, providing new solutions for health monitoring, behavior recognition, weight assessment, and physiological state management during the farming process. Because these technologies are a crucial foundation for upgrading the pig industry toward intelligent and precise farming, a systematic review of the current research status, application progress, and future trends of this technological system is of significant value. [Progress] This paper focuses on the main research areas in intelligent pig monitoring and, from the perspective of matching data sources with application objectives, systematically summarizes the commonly used data types and their applications in farming scenarios. First, research based on infrared images mainly focuses on non-contact acquisition of body temperature information for disease early warning and health monitoring, offering clear advantages in reducing stress responses and increasing monitoring frequency. Second, visible-light images are widely applied in behavior recognition and health analysis, supporting automated identification and quantification of behaviors such as feeding, resting, and aggression, thereby facilitating dynamic understanding of pig herd behavior patterns and changes. Third, depth images and three-dimensional information demonstrate unique value in body measurement extraction and weight estimation, promoting the development of non-contact, continuous weight monitoring.
Fourth, wearable sensors enable continuous monitoring of pig health, lameness risk, and daily behavioral rhythms by recording physiological data such as body temperature, acceleration, and feeding activity in real time. Finally, audio signals, an emerging data type in recent years, have shown potential for monitoring abnormal sounds such as coughing, providing a new approach to the early detection of respiratory diseases. On this basis, this paper further summarizes the research and application of intelligent detection equipment. Current equipment is developing along two lines: one focuses on single indicators such as body temperature and weight, characterized by precise collection and rapid feedback; the other integrates multiple functions, including image acquisition, body temperature detection, behavior recording, and identity recognition, on mobile platforms such as inspection robots, enabling full-scenario, all-weather intelligent detection and raising the level of automation and refinement in pig farm management. With growing industrial demand, various types of equipment are gradually moving from laboratories to commercialization, providing important support for intelligent breeding. [Conclusions and Prospects] Despite the rapid development of intelligent pig detection technology, multiple challenges remain. At the data level, interference from lighting, occlusion, and noise in different scenarios can affect the stability of detection results; at the hardware level, some equipment is costly and insufficiently reliable; at the model level, differences across pig farms, breeds, and growth stages still lead to insufficient adaptability; at the application level, data continuity, system stability, and equipment maintenance costs in large-scale scenarios require further optimization. These factors collectively restrict the large-scale promotion of intelligent detection technology in the industry.
Future research is expected to follow three common trends. First, achieving contactless operation and multi-scenario adaptability to minimize disturbance to pigs and enhance stability in complex environments. Second, advancing the integration of multimodal data fusion and deep learning to establish stronger correlations among multi-source data such as images, sensor signals, and audio. Third, developing individualized health and growth models to provide a scientific basis for precision feeding and management.

Research Progress and Prospects of Intelligent Control Technology for Facility Vegetable

WANG Jian, ZHAO Haosen, MA Yue, XING Bin, ZHU Wenying

Smart Agriculture 2026, 8 (1): 104-119. DOI: 10.12133/j.smartag.SA202508003

[Significance] With the advancement of technology and diversified consumer demands, traditional agriculture is gradually transforming toward informatization and intelligence. Research on intelligent management and control technologies for facility vegetable production is of great significance for improving vegetable yield and quality, ensuring stable market supply, and promoting high-quality development of the vegetable industry. This article systematically collates the research status, key technologies, and application constraints in the field of intelligent management and control of facility vegetables. By analyzing development trends in environmental regulation, growth monitoring, and precise management, it provides a scientific basis, theoretical support, and decision-making references for the intelligent upgrading, technological innovation, and policy formulation of the Chinese facility vegetable industry, so as to promote the high-quality and sustainable development of facility agriculture. [Progress] This paper systematically analyzes innovative applications of information technologies such as the Internet of Things, blockchain, and artificial intelligence in critical domains of facility vegetable production, including precise regulation of the production environment, intelligent cultivation management, and smart storage information management. For precise regulation of the production environment, a temperature and humidity model for the optimal growth environment of tomatoes has been established, primarily using Internet of Things technology. This enables precise monitoring and intelligent control of environmental parameters such as temperature, humidity, light, and carbon dioxide concentration within the facility, creating an optimal environment for vegetable growth.
In intelligent cultivation management, the integration of intelligent water and fertilizer equipment, agricultural robotic operation systems, and pest and disease control has optimized information-based management of the whole process, effectively improving cultivation efficiency and vegetable quality. Integrated water and fertilizer systems apply Internet of Things technology to coordinate irrigation and fertilization digitally. Agricultural robotic operation systems are based on artificial intelligence, encompassing technologies such as machine learning, deep learning, neural networks, and image processing. The pest and disease control section highlights information-based applications in physical, biological, and chemical control. In smart storage information management, the application of origin storage and preservation technology, intelligent grading and sorting systems, and traceability information platforms has significantly enhanced the circulation quality and safety assurance of vegetables. Specifically, the discussion of origin storage and preservation technology focuses on the development status of pre-cooling, controlled atmosphere, biological, and coating preservation. Intelligent grading and sorting technologies are divided into non-destructive testing of external quality and of internal quality. The traceability information platform, leveraging blockchain and large model technologies, enables more intelligent management of facility vegetable production.
[Conclusions and Prospects] This paper explores the problems encountered in the development of intelligent management and control technology for facility vegetables, including insufficient sensor accuracy and stability, lagging regulatory decision-making, a lack of equipment coordination mechanisms, poor integration of pest and disease control, fragmented information across the whole storage process, difficulty in quality traceability, and lagging risk warning. The following countermeasures and suggestions are proposed: optimizing hardware, integrating multiple technologies to support precise perception and intelligent regulation, enhancing equipment coordination and optimization, integrating pest and disease control, and constructing a virtual-real interactive storage management system that combines digital twins and the metaverse. Finally, the paper outlines future development directions for facility vegetables in precise control of the production environment, cultivation management informatization, and storage information control.

Advances and Prospects in Body-Size Measurement of Sheep: From 2D Vision to 3D Reconstruction and 2D-3D Fusion

DAI Weijiao, LIANG Yudongchen, ZHOU Yong, YAO Chao, ZHANG Cheng, SONG Yongjian, LI Guoliang, TIAN Fang

Smart Agriculture 2026, 8 (1): 120-147. DOI: 10.12133/j.smartag.SA202507028

[Significance] In alignment with the national germplasm security strategy, current research efforts are accelerating the adoption of precision breeding in sheep. Within whole-genome selection, accurate phenotyping of body morphometrics is critical for assessing growth performance and breeding value. Traditional manual measurements are inefficient, prone to human error, and may cause stress to sheep, limiting their suitability for precision sheep management. By summarizing the applications of sheep body size measurement technologies and analyzing their development directions, this paper provides theoretical references and practical guidance for the research and application of non-contact sheep body size measurement. [Progress] This review synthesizes progress across three principal methodological paradigms: two-dimensional (2D) image-based techniques, three-dimensional (3D) point cloud-based approaches, and integrated 2D-3D fusion systems. 2D methods, employing either handcrafted geometric features or deep learning-based keypoint detection algorithms, are cost-effective and operationally simple but are sensitive to variation in imaging conditions and cannot capture critical circumference metrics. 3D point cloud approaches enable precise reconstruction of full animal morphology, supporting comprehensive body-size acquisition with higher accuracy, yet face challenges including high hardware costs, complex data workflows, and sensitivity to posture variability. Hybrid 2D-3D fusion systems combine the semantic richness of RGB imagery with the geometric completeness of point clouds. These fusion systems have been validated in other livestock species, such as cattle and pigs, where they demonstrated excellent performance, providing important technical references and practical insights for sheep body size measurement.
[Conclusions and Prospects] Firstly, future research should focus on constructing large-scale, high-quality datasets for sheep body size measurement that encompass diverse breeds, growth stages, and environmental conditions, thereby enhancing model robustness and generalization. Secondly, the development of lightweight artificial intelligence models is essential. Techniques such as model compression, quantization, and algorithmic optimization can substantially reduce computational complexity and storage requirements, facilitating deployment in resource-constrained environments. Thirdly, the 3D point cloud processing pipeline should be streamlined to improve the efficiency of data acquisition, filtering, registration, and segmentation, while promoting the integration of low-cost, high-resilience vision systems into practical farming scenarios. Fourthly, specific emphasis should be placed on improving the accuracy of circumference measurements, such as chest circumference, abdominal circumference, and shank circumference, through advances in pose standardization, refined 3D segmentation strategies, and multi-modal data fusion. Finally, the cross-fertilization of sheep body size measurement technologies with analogous methods for other livestock species offers a promising pathway for mutual learning and collaborative innovation, accelerating the industrialization of automated sheep morphometric systems and supporting the development of intelligent, data-driven pasture management practices.

Greenhouse Temperature and Humidity Prediction Method Based on Adaptive Kalman Filter and GWO-LSTM-Attention

CAI Yuqin, LIU Daming, XU Qin, LI Boyang, LIU Bojie

Smart Agriculture 2026, 8 (1): 148-155. DOI: 10.12133/j.smartag.SA202506033

[Objective] Acquiring valid data is a critical step in establishing accurate greenhouse prediction models. However, current research commonly processes multi-sensor data by simple or weighted averaging, which is often ineffective against sensor noise interference. Additionally, greenhouse temperature and humidity are strongly coupled, which necessitates coordinated control strategies. Prevailing studies predominantly train separate models for temperature and humidity prediction, which risks generating physically inconsistent results (e.g., simultaneous high temperature and high humidity); using such predictions as the basis for control may therefore be unreliable. Furthermore, the multi-dimensional environmental factor data in greenhouses are large in volume and computationally costly. When the traditional LSTM model is trained, its parameters are adjusted manually based on human experience; with high-dimensional data, the model converges slowly and is prone to getting stuck in local optima. [Methods] To address multi-point data fusion challenges, the traditional Kalman filtering algorithm was improved by dynamically adjusting the process noise covariance (Q) and observation noise covariance (R). This adaptation was achieved by monitoring the innovation sequence, that is, the difference between observed and predicted measurements. In addition, the algorithm used the innovation covariance to assign adaptive weights to multiple sensors: sensors with consistently smaller innovations, indicating higher reliability, were assigned greater weights. This mechanism enabled the system to swiftly identify and mitigate the impact of abnormal sensor readings, thereby ensuring robust and accurate fusion of multi-sensor data and providing a reliable foundation for subsequent model training.
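A minimal sketch of innovation-based adaptive fusion for one scalar quantity (for example, temperature reported by several sensors) follows. It assumes a random-walk state model, estimates each sensor's innovation variance S_i with an exponential moving average, backs out R_i from the relation S_i ≈ P + R_i, adapts Q from the size of the state correction, and weights sensors by 1/S_i. These update rules, the parameters alpha, beta, q0, r_min, and q_min, and the function name are illustrative assumptions rather than the paper's exact equations, and the sensor data are synthetic.

```python
import random
import statistics

def adaptive_kalman_fusion(readings, alpha=0.1, beta=0.05, q0=0.01, r_min=1e-4, q_min=1e-6):
    """Fuse per-step readings (one list per time step, one value per sensor).

    Returns (estimates, final_weights).
    """
    n = len(readings[0])
    x = statistics.median(readings[0])  # robust initial state
    p, q = 1.0, q0
    s = [1.0] * n                       # smoothed innovation variance per sensor
    weights = [1.0 / n] * n
    estimates = []
    for z in readings:
        # Predict: the state is assumed constant between steps (random walk)
        x_pred, p_pred = x, p + q
        innov = [zi - x_pred for zi in z]
        s = [(1 - alpha) * si + alpha * e * e for si, e in zip(s, innov)]
        # Adaptive R_i from S_i ≈ P_pred + R_i
        r = [max(si - p_pred, r_min) for si in s]
        # Smaller recent innovations -> larger fusion weight
        inv = [1.0 / si for si in s]
        weights = [v / sum(inv) for v in inv]
        z_f = sum(w * zi for w, zi in zip(weights, z))
        r_f = sum(w * w * ri for w, ri in zip(weights, r))  # variance of the weighted mean
        # Standard Kalman update with the fused measurement
        k = p_pred / (p_pred + r_f)
        e_f = z_f - x_pred
        x = x_pred + k * e_f
        p = (1 - k) * p_pred
        # Adapt Q from the magnitude of the state correction
        q = max((1 - beta) * q + beta * (k * e_f) ** 2, q_min)
        estimates.append(x)
    return estimates, weights

rng = random.Random(0)
# Two healthy sensors around 20 and one faulty sensor with a +5 bias and extra noise
data = [[20 + rng.gauss(0, 0.2), 20 + rng.gauss(0, 0.2), 25 + rng.gauss(0, 2.0)]
        for _ in range(100)]
est, w = adaptive_kalman_fusion(data)
```

In the synthetic run, the third sensor's innovations stay large, so its smoothed innovation variance grows and its fusion weight shrinks toward zero, while the fused estimate settles near the true value of 20.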
To address the strong coupling between temperature and humidity and their collaborative control requirements, a multi-output LSTM-attention model was developed for joint temperature-humidity prediction within a unified architecture. This model employed an attention mechanism to adaptively weight critical environmental factors, thereby resolving the physical constraint violations inherent in univariate forecasting approaches. The multi-dimensional nature of greenhouse environmental data often leads to high computational costs during model training. In traditional practice, the hyperparameters of LSTM models were tuned manually based on experience, a process that was not only inefficient but also prone to suboptimal convergence and local optima, especially with high-dimensional data. To overcome this limitation, the grey wolf optimizer (GWO) was integrated to automatically and efficiently search for the optimal combination of key hyperparameters, such as the number of hidden units, learning rate, and dropout rate. [Results and Discussions] The proposed adaptive Kalman filtering algorithm achieved mean absolute deviations (MAD) of 1.59 ℃ and 8.64% for multi-point temperature and humidity fusion, respectively. Compared with the traditional Kalman filter algorithm, these represented reductions of 1.24% and 8.57%. The algorithm swiftly identified abnormal sensors and effectively mitigated their impact. When the fusion results of this algorithm were used as the model training dataset, the R² values for temperature and humidity predictions reached 98.2% and 99.3%, respectively, increases of 4.7 and 4.3 percentage points over the results obtained with the Kalman filter, demonstrating that the algorithm provided a highly reliable data foundation for model training.
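The grey wolf optimizer can be sketched in a few lines. The version below follows the standard GWO update: the three best positions seen so far (alpha, beta, and delta) guide every wolf, and the coefficient a decays linearly from 2 to 0. The objective surrogate_val_loss is a hypothetical, smooth stand-in for validation error over (hidden units, log10 learning rate, dropout); in practice each evaluation would train and validate the LSTM-attention model, which is far more expensive. Function names and bounds are assumptions for illustration.

```python
import random

def gwo_minimize(objective, bounds, n_wolves=20, n_iter=100, seed=0):
    """Minimize objective over a box with the grey wolf optimizer."""
    rng = random.Random(seed)
    dim = len(bounds)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    leaders = []  # alpha, beta, delta: best three positions seen so far

    def clip(pos):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(pos, bounds)]

    for t in range(n_iter):
        leaders = sorted(leaders + wolves, key=objective)[:3]
        a = 2.0 - 2.0 * t / n_iter  # exploration coefficient, decays from 2 toward 0
        new_pack = []
        for wolf in wolves:
            new_pos = []
            for d in range(dim):
                total = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - wolf[d])
                    total += leader[d] - A * D
                new_pos.append(total / len(leaders))  # average of X1, X2, X3
            new_pack.append(clip(new_pos))
        wolves = new_pack
    leaders = sorted(leaders + wolves, key=objective)[:3]
    return leaders[0], objective(leaders[0])

def surrogate_val_loss(params):
    """Hypothetical stand-in for validation error; optimum at (64, -3, 0.2)."""
    hidden, log_lr, dropout = params
    return ((hidden - 64) / 64) ** 2 + (log_lr + 3) ** 2 + (dropout - 0.2) ** 2

bounds = [(16, 256), (-5, -1), (0.0, 0.5)]
best, score = gwo_minimize(surrogate_val_loss, bounds)
hidden_units, learning_rate, dropout = round(best[0]), 10 ** best[1], best[2]
```

Searching the learning rate on a log10 scale and rounding the hidden-unit count afterward are common practical choices; the paper does not state how it encoded these hyperparameters.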
Furthermore, the GWO-LSTM-Attention model trained on these data yielded root mean square errors (RMSE) of 0.7768 and 2.0564 for temperature and humidity prediction, respectively. Compared with the LSTM and LSTM-Attention time-series prediction models, the temperature RMSE was reduced by 15.6% and 6.6%, and the humidity RMSE by 29.2% and 5.7%, respectively. These results reflect the contribution of the GWO algorithm to model generalization capability and convergence efficiency. [Conclusions] The proposed adaptive Kalman fusion algorithm effectively integrates multi-sensor data, demonstrating robustness in handling sensor noise, outliers, and non-stationary environmental fluctuations. For predicting multiple greenhouse environmental factors, the developed GWO-LSTM-Attention model provides reliable forecasts across diverse time horizons. This study provides a highly accurate prediction tool for greenhouse environment control, and the combined prediction results could directly support the coordinated control of ventilation and irrigation equipment in the future, thereby reducing energy consumption.

Point Cloud Data-driven Methods for Estimating Maize Leaf Biomass

WU Zhangbin, HE Ning, WU Yandong, GUO Xinyu, WEN Weiliang

Smart Agriculture 2026, 8 (1): 156-166. DOI: 10.12133/j.smartag.SA202509015

[Objective] Maize leaf dry biomass is a key trait that reflects plant morphology, growth vigor, and physiological processes, including photosynthetic production. Its dynamic changes can effectively characterize the growth status of maize. Accurate estimation of maize leaf dry biomass is crucial for predicting maize yield and informing production management decisions. Extensive research on crop dry biomass estimation indicates that 3D point cloud data characterizing crop morphological structure, along with features derived from them, are highly correlated with crop dry biomass. However, traditional dry biomass prediction studies focus primarily on the population canopy scale and lack effective prediction methods at the plant and organ scales. To meet the demand for rapid acquisition of organ-level dry biomass information in maize cultivation and management research, this study investigated non-destructive measurement methods for maize leaf dry biomass based on 3D point clouds and machine learning. [Methods] Maize leaf point cloud data were acquired using three techniques: multi-view stereo (MVS), LiDAR scanning, and 3D digitization (DT). The leaf point clouds underwent preprocessing steps that included plant segmentation, denoising, mesh refinement, and uniform subsampling. Morphological traits were then extracted from the processed data, including leaf length, leaf area, bounding box dimensions, and the number of points in each leaf point cloud. Three machine learning methods, random forest (RF), gradient boosting regression tree (GBRT), and support vector regression (SVR), and two deep learning methods, a convolutional neural network (CNN) and a fully connected neural network (FCNN), were employed to predict maize leaf dry weight, and a point cloud-based maize leaf dry biomass prediction model was developed.
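Several of the size traits listed above can be computed directly from a meshed leaf point cloud. The sketch assumes the leaf has already been segmented, denoised, and triangulated into vertices and faces; it returns the point count, the leaf area as the sum of triangle areas (half the norm of each edge cross product), and axis-aligned bounding box dimensions (an assumption; an oriented box may be used in practice). Leaf length, which usually requires tracing the midrib, is not computed here, and the function names are hypothetical.

```python
import math

def triangle_mesh_area(vertices, faces):
    """Sum of triangle areas; each face is a triple of vertex indices."""
    total = 0.0
    for i, j, k in faces:
        ax, ay, az = vertices[i]
        bx, by, bz = vertices[j]
        cx, cy, cz = vertices[k]
        ux, uy, uz = bx - ax, by - ay, bz - az
        vx, vy, vz = cx - ax, cy - ay, cz - az
        # Half the norm of the cross product of two edges is the triangle area
        cross = (uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx)
        total += 0.5 * math.sqrt(sum(c * c for c in cross))
    return total

def bounding_box_dims(points):
    """Axis-aligned extents (dx, dy, dz) of a point set."""
    xs, ys, zs = zip(*points)
    return (max(xs) - min(xs), max(ys) - min(ys), max(zs) - min(zs))

def leaf_features(points, faces):
    return {
        "num_points": len(points),
        "leaf_area": triangle_mesh_area(points, faces),
        "bbox": bounding_box_dims(points),
    }

# Toy flat "leaf": a 2 x 1 rectangle split into two triangles
vertices = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
faces = [(0, 1, 2), (0, 2, 3)]
feats = leaf_features(vertices, faces)
```

Note that the point count depends on the sensor and subsampling density, which is one reason the preprocessing includes uniform subsampling before features are compared across devices.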
This study used the mean squared error reduction method inherent to RF and the cumulative improvement method based on decision tree splits in GBRT to rank feature importance for the optimal models, and the resulting rankings were visualized. Pearson correlation analysis was also used to analyze the correlations with maize leaf dry biomass of the features from the fused dataset (integrating data from the three devices) and from the DT data. [Results and Discussions] Among the dry biomass prediction models developed in this study, the model based on LiDAR point cloud data and the FCNN method achieved the highest accuracy, with a mean absolute error (MAE) of 0.08 g, a mean absolute percentage error (MAPE) of 4.60%, a root mean square error (RMSE) of 0.10 g, and a coefficient of determination (R²) of 0.98. In the correlation analysis, leaf area exhibited the strongest correlation with dry biomass (r = 0.92), followed by the number of points (r = 0.88), leaf width (r = 0.86), and leaf length (r = 0.77). In the feature importance ranking, leaf area consistently ranked within the top two positions, whereas the number of points ranked among the top three in most cases. In contrast, features such as the height of the leaf base above the ground, the horizontal distances from the leaf tip and apex to the stem, and the azimuth angle showed low correlations with dry biomass and low feature importance. [Conclusions] Among all the maize leaf features investigated in this study, size-related traits (such as leaf area, point count, leaf length, and leaf width) had the greatest impact on the accuracy of dry biomass estimation. Combining high-resolution 3D point clouds of maize leaves with machine learning methods enabled high-accuracy estimation of leaf dry weight and provides a novel approach for the non-destructive measurement of dry biomass in crop organs.
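The correlation screening can be reproduced with a short function that computes Pearson's r and ranks features by its absolute value. The function names are hypothetical, and the feature names and numbers in the usage lines are invented toy values for illustration only, not data from the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rank_features_by_correlation(features, target):
    """Return (name, r) pairs sorted by |r|, strongest first."""
    scored = [(name, pearson_r(values, target)) for name, values in features.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)

# Toy values only (not from the study)
dry_biomass = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "leaf_area": [10.0, 20.0, 30.0, 40.0, 50.0],
    "azimuth": [3.0, 1.0, 4.0, 1.0, 5.0],
    "leaf_length": [2.0, 3.0, 5.0, 4.0, 6.0],
}
ranked = rank_features_by_correlation(features, dry_biomass)
```

Ranking by |r| captures only linear association; the tree-based importance scores used in the study can also reflect nonlinear effects and interactions, which is why the two rankings are reported side by side.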

Object Detection Method of Maize Ears Within Canopy Based on CornYOLO

GAO Guangfu, WANG Qilei, SONG Liwen, FENG Haikuan, SHI Lei, YANG Hao, LIU Yang, YUE Jibo

Smart Agriculture 2026, 8 (1): 167-177. DOI: 10.12133/j.smartag.SA202509005

[Objective] As a major grain crop, maize plays a critical role in global food security. The maize ear is a key phenotypic trait that provides essential information on the plant's physiological and agronomic status; its morphological characteristics, size, and color effectively reflect the plant's growth status and potential yield. Therefore, accurately acquiring images of maize ears in the field across different growth stages is crucial for breeding research and yield prediction. Traditional field detection of maize ears relies heavily on manual labor, which is inefficient and labor-intensive and struggles to meet the high-throughput demands of modern precision breeding programs. There is an urgent need for efficient, automated detection technologies that can operate reliably under real-world field conditions. To meet the need for efficient acquisition of maize ear phenotypic traits in field breeding, this research aims to develop a robust object detection solution suitable for large-scale field environments. An improved CornYOLO model based on the YOLO11n (You Only Look Once) architecture was designed to enhance the detection accuracy and efficiency for maize ears in complex field environments. [Methods] Image data were acquired using an unmanned ground vehicle (UGV) equipped with a high-resolution panoramic camera, which traversed multiple experimental plots under varying lighting and growth conditions. A dataset containing 1 152 annotated samples was constructed, covering diverse ear morphologies and occlusion scenarios. Dynamic data augmentation techniques were applied during training to enhance the model's generalization capability. Three key enhancements were introduced to the YOLO11n detection framework.
First, a cross stage partial network with dynamic pointwise spatial attention (C2PDA) module was designed to replace the cross stage partial with pointwise spatial attention (C2PSA) module in the YOLO11 backbone network. This module enhanced spatial discriminability and channel sensitivity in feature representation by combining a dynamic channel weighting mechanism with position-aware modeling, which significantly improved the model's performance in identifying maize ears under challenging field conditions such as occlusion by stems and leaves and multi-scale target distribution. Second, the spatial pyramid pooling-fast (SPPF) module in the original model was replaced with a feature refinement module (FRM) to optimize multi-scale feature fusion. The FRM works through directional feature decomposition and an adaptive attention mechanism: it captures fine-grained spatial structural information through bidirectional (horizontal and vertical) pooling and combines spatial-channel cooperative attention for dynamic feature calibration, thereby improving recognition accuracy across varying ear sizes and complex backgrounds. Finally, the unified intersection over union (UIoU) loss function was introduced to optimize bounding box regression accuracy. UIoU is a loss function that emphasizes weight allocation among prediction boxes of different qualities. It adaptively adjusts the weight of each prediction box's loss term based on the IoU value or a monotonic function of it, assigning higher weights to lower-quality predictions to prioritize their optimization while reducing the weights of high-quality boxes to prevent over-optimization. [Results and Discussions] Experimental results demonstrate that CornYOLO achieved a mAP@50 of 89.3% on the validation set, with the F₁-Score increasing by 2.5 percentage points.
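To make the quality-based weighting concrete, the sketch below computes IoU for axis-aligned boxes and applies a weight that decreases monotonically with IoU, so poorly localized predictions contribute more to the loss. The weight (1 - IoU)^gamma, normalized to a mean of 1, and the function names are hypothetical choices for illustration; this is not the published UIoU formulation, which the abstract does not give. In a training framework the weights would typically be treated as constants (no gradient) and combined with an IoU-based regression loss.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def quality_weighted_iou_loss(preds, targets, gamma=1.0):
    """Mean of w_i * (1 - IoU_i), with hypothetical weights w_i ∝ (1 - IoU_i)^gamma."""
    ious = [iou(p, t) for p, t in zip(preds, targets)]
    raw = [(1.0 - v) ** gamma for v in ious]
    total = sum(raw)
    if total == 0.0:  # every box is already perfect
        weights = [1.0] * len(raw)
    else:
        weights = [len(raw) * r / total for r in raw]  # normalized to mean 1
    losses = [w * (1.0 - v) for w, v in zip(weights, ious)]
    return sum(losses) / len(losses), weights

preds = [(0, 0, 2, 2), (0, 0, 2, 2)]
targets = [(0, 0, 2, 2), (1, 0, 3, 2)]  # second prediction has IoU = 1/3
loss, weights = quality_weighted_iou_loss(preds, targets)
```

Here the perfectly localized box receives zero weight and the poor box receives all of it, which is the qualitative behavior the abstract describes; a practical scheme would usually keep some weight on high-quality boxes.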
Compared with widely used lightweight models, including YOLOv3-tiny, YOLOv5n, YOLOv8n, YOLOv10n, real-time detection transformer (RT-DETR), and YOLO13n, CornYOLO showed significantly superior detection performance in complex field environments, with mAP@50 improvements of 2.2, 1.9, 1.8, 5.7, 12.6, and 2.4 percentage points, respectively. These results validate that CornYOLO can efficiently and accurately extract maize ear images under field conditions, providing a technical foundation for precise phenotypic evaluation and yield prediction. Ablation studies were also conducted. Introducing the C2PDA module improved the model's mAP@50 by 0.5 percentage points and the F₁-Score by 0.5 percentage points. Incorporating the FRM module enhanced multi-scale detection performance and increased the F₁-Score by 1.5 percentage points; however, combining these two modules produced a small number of low-quality detection boxes that the original loss function optimized inefficiently, so mAP@50 did not improve after this modification. To address this issue, the UIoU loss function was introduced. By dynamically adjusting weight assignments based on prediction quality, it significantly improved the regression performance for low-quality detection boxes, thereby enhancing the localization accuracy and convergence stability of the model in dense target scenarios. The final CornYOLO model exhibited excellent overall performance: compared with the original YOLO11n, the F₁-Score increased by 2.5 percentage points and mAP@50 improved by 1.1 percentage points, demonstrating that CornYOLO effectively enhances maize ear detection in complex field environments relative to the baseline YOLO11n model.
[Conclusions] The CornYOLO model proposed in this study incorporates three key components, C2PDA, FRM, and UIoU, which together improve model convergence and localization performance in dense and occluded scenes and enable the model to identify maize ears effectively and precisely under practical conditions, thereby providing reliable technical support for phenotypic analysis and yield prediction in maize breeding. Future work will focus on extending the model to other crop types and further optimizing inference efficiency for real-time deployment on mobile platforms.

Intelligent Inspection Path Planning Algorithm for Large-Scale Cattle Farms

CHEN Ruotong, LIU Jifang, ZHANG Zhiyong, MA Nan, WEI Peigang, WANG Yi, YANG Yantao

Smart Agriculture 2026, 8 (1): 178-191. DOI: 10.12133/j.smartag.SA202504004

[Objective] Timely detection of, and early warning about, livestock health issues are critical for green and efficient management of large-scale cattle farms. Traditional manual inspections are time-consuming, labor-intensive, and prone to missed or erroneous detections. Robotic inspection offers significant advantages, including all-weather operation, high precision, high efficiency, and low cost. However, existing path planning approaches focus predominantly on dynamic obstacle avoidance and fixed-target-point inspection paths, and often fail to address two key challenges in dynamic large-scale farm environments: global traversal of individual large livestock (e.g., beef cattle and dairy cows) and the accessibility of local areas obstructed by dynamic obstacles. This study aims to overcome the limitations of existing robotic inspection systems in large-scale cattle farms, specifically the lack of comprehensive inspection capability for dynamic individuals, excessive path redundancy, and insufficient proactive obstacle avoidance. [Methods] A global-local optimization algorithm for intelligent inspection path planning in large-scale cattle farms was proposed. It integrated the traveling salesman problem (TSP), A*, and the dynamic window approach (DWA) to address global multi-objective individual traversal, path redundancy, and local passability with proactive obstacle avoidance in dynamic cattle farm scenarios. For global traversal optimization, a global path planning algorithm combining an improved TSP solver and an optimized A* was introduced. Specifically, an inspection status list tracking breeding sheds and individual cattle was maintained to enhance the nearest neighbor algorithm for the TSP, dynamically updating targets to avoid revisits. A dynamic priority mechanism optimized multi-objective inspection, determining the optimal visiting sequence across barns and the dynamic paths within barns.
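The status-list idea above can be illustrated with a greedy nearest neighbor tour that never revisits a target already marked as inspected. This is a minimal sketch under stated assumptions, not the authors' implementation: targets are 2D points keyed by name, distances are straight-line (the full method would use A* path lengths between sheds), and the `priority` weights that shrink the effective distance of urgent targets are a hypothetical stand-in for the paper's dynamic priority mechanism.

```python
import math


def plan_inspection_order(start, targets, inspected=(), priority=None):
    """Greedy nearest neighbor visiting order over inspection targets.

    start: (x, y) robot position; targets: dict name -> (x, y).
    inspected: names already marked done in the status list (skipped).
    priority: optional dict name -> weight >= 1 (hypothetical); a larger
    weight shrinks the effective distance so urgent targets come earlier.
    """
    priority = priority or {}
    status = {name: name in inspected for name in targets}  # inspection status list
    order, pos = [], start
    while not all(status.values()):
        best, best_cost = None, math.inf
        for name, xy in targets.items():
            if status[name]:
                continue  # already inspected: never revisit
            cost = math.dist(pos, xy) / priority.get(name, 1.0)
            if cost < best_cost:
                best, best_cost = name, cost
        status[best] = True
        order.append(best)
        pos = targets[best]
    return order
```

When a cow moves or a shed's status changes, the status list is kept and the order is simply recomputed from the robot's current position, which is one plausible way to realize "dynamically updating targets to avoid revisits".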
The data structures of the A* algorithm were optimized, and a diagonal distance heuristic replaced the Manhattan distance to reflect the movement cost in eight directions more accurately. The A* path was then simplified with a greedy strategy: Bresenham's line algorithm checked whether the straight line of sight between two path points was free of obstacles and, if it was, the redundant inflection points between them were removed, yielding an efficient path between sheds. For local passability optimization, an enhanced DWA-based local path planning algorithm was proposed. A dynamic safety threshold on obstacle size was introduced into the DWA: when the inspection robot judged that an obstacle in the locally accessible area was too large to pass, it actively avoided or detoured around the obstacle in advance, ensuring safe avoidance of large obstacles in narrow passages. The improved DWA also added a task progress potential field that drove the robot toward the next breeding shed to be visited through an attractive force field model, balancing local obstacle avoidance against global inspection efficiency and enabling real-time judgment of local passability under dynamic obstacles as well as proactive obstacle avoidance. [Results and Discussions] The optimized A* data structures significantly improved search efficiency, and the diagonal distance heuristic and greedy strategy substantially enhanced path smoothness. Compared with the traditional A*, the improved A* reduced planning time by 90.06%, the number of path turns by 85.13%, and path length by 1.83% on average. The global inspection algorithm combining the improved TSP and optimized A* achieved 100% average coverage of individual cattle.
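The two A* changes described above, an octile (diagonal distance) heuristic on an 8-connected grid and greedy line-of-sight pruning with Bresenham's line algorithm, can be sketched as follows. This is an illustration on an assumed 0/1 occupancy grid (1 = obstacle), not the authors' code; the specific data-structure optimizations are not described in the abstract, so a standard binary heap with stale-entry skipping is used, and diagonal moves are allowed to cut obstacle corners for brevity.

```python
import heapq
import math

SQRT2 = math.sqrt(2.0)
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1), (-1, -1), (-1, 1), (1, -1), (1, 1)]


def diagonal_distance(a, b):
    """Octile distance: straight steps cost 1, diagonal steps cost sqrt(2)."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return (dx + dy) + (SQRT2 - 2.0) * min(dx, dy)


def astar(grid, start, goal):
    """8-connected A*; returns a list of (row, col) cells or None."""
    rows, cols = len(grid), len(grid[0])
    open_heap = [(diagonal_distance(start, goal), 0.0, start)]
    g_cost, parent = {start: 0.0}, {start: None}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if g > g_cost[cell]:
            continue  # stale entry superseded by a cheaper one
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        for dr, dc in MOVES:
            nxt = (cell[0] + dr, cell[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols) or grid[nxt[0]][nxt[1]]:
                continue
            new_g = g + (SQRT2 if dr and dc else 1.0)
            if new_g < g_cost.get(nxt, math.inf):
                g_cost[nxt], parent[nxt] = new_g, cell
                heapq.heappush(open_heap, (new_g + diagonal_distance(nxt, goal), new_g, nxt))
    return None


def bresenham(a, b):
    """Grid cells on the segment from a to b, inclusive (all octants)."""
    (r0, c0), (r1, c1) = a, b
    dr, dc = abs(r1 - r0), -abs(c1 - c0)
    sr, sc = (1 if r0 < r1 else -1), (1 if c0 < c1 else -1)
    err, cells = dr + dc, []
    while True:
        cells.append((r0, c0))
        if (r0, c0) == (r1, c1):
            return cells
        e2 = 2 * err
        if e2 >= dc:
            err += dc
            r0 += sr
        if e2 <= dr:
            err += dr
            c0 += sc


def simplify(path, grid):
    """Greedy pruning: from each kept point, jump to the farthest visible point."""
    if not path:
        return path
    kept, i = [path[0]], 0
    while i < len(path) - 1:
        j = len(path) - 1
        while j > i + 1 and any(grid[r][c] for r, c in bresenham(path[i], path[j])):
            j -= 1
        kept.append(path[j])
        i = j
    return kept
```

On an empty 5×5 grid, `astar` returns the five-cell diagonal from `(0, 0)` to `(4, 4)` and `simplify` reduces it to its two endpoints, which is the "redundant inflection point" removal in miniature.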
Compared with the classic ant colony optimization (ACO) algorithm, inspection path length and time were reduced by 17.99% and 20.85%, respectively, demonstrating superior efficiency in dynamic multi-objective inspection scenarios. The improved DWA enabled proactive judgment of local path passability based on obstacle size. By adjusting the robot's linear velocity, angular velocity, and attitude angle in real time, the algorithm achieved robust proactive obstacle avoidance: the inspection robot reduced its linear velocity in advance when approaching obstacles and avoided them by adjusting its attitude angle. Simulation experiments confirmed that robots equipped with the improved DWA effectively navigated around unknown static and dynamic obstacles while maintaining global path-tracking capability. [Conclusions] The global inspection algorithm combining the improved TSP and optimized A*, using dynamic inspection status lists and path optimization techniques, achieved global inspection coverage of individual cattle and can significantly improve inspection quality and efficiency. The local inspection algorithm based on the improved DWA, incorporating a dynamic obstacle-size safety threshold and task progress, achieved real-time judgment of local passability and proactive obstacle avoidance, ensuring safe robot navigation in complex environments. The global-local co-optimization framework adapted well to the dynamic farm environment, enabling timely completion of individual traversal tasks and providing a robust solution for intelligent inspection in large-scale cattle operations. Future work will integrate the proposed path planning algorithm with simultaneous localization and mapping (SLAM), cattle identification, and distance detection systems on inspection robot platforms, and will conduct extensive field tests on operational cattle farms.
Exploring multi-robot collaborative inspection frameworks and incorporating vision-and-language navigation (VLN) models to enhance environmental perception and anomaly handling are promising directions for adapting to the complexities of even larger farming scenarios.
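The improved DWA adds two ingredients to the usual velocity-sampling loop: an obstacle-size safety threshold and an attractive task-progress term. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: obstacles are circles `(x, y, size)`, an obstacle larger than `max_passable_size` has its safety margin doubled so that the planner slows down or detours earlier, and the task-progress term is simply the negative distance from the end of each simulated trajectory to the next shed. All parameter names, weights, and the doubling rule are hypothetical; dynamic-window bounds from acceleration limits are also omitted.

```python
import math


def simulate(state, v, w, dt=0.1, steps=10):
    """Roll out a unicycle model (x, y, heading) at constant linear/angular velocity."""
    x, y, th = state
    traj = []
    for _ in range(steps):
        th += w * dt
        x += v * math.cos(th) * dt
        y += v * math.sin(th) * dt
        traj.append((x, y, th))
    return traj


def dwa_choose(state, goal, obstacles, v_samples, w_samples, robot_radius=0.3,
               max_passable_size=0.5, weights=(1.0, 0.3, 0.2)):
    """Pick the best (v, w) sample; obstacles are circles (x, y, size)."""
    w_goal, w_clear, w_speed = weights
    best, best_score = (0.0, 0.0), -math.inf
    for v in v_samples:
        for w in w_samples:
            traj = simulate(state, v, w)
            clearance = math.inf
            for ox, oy, size in obstacles:
                # hypothetical safety threshold: obstacles too large to pass get a doubled margin
                margin = size if size <= max_passable_size else 2.0 * size
                for x, y, _ in traj:
                    clearance = min(clearance, math.hypot(x - ox, y - oy) - margin - robot_radius)
            if clearance <= 0.0:
                continue  # predicted collision: discard this sample
            end_x, end_y, _ = traj[-1]
            progress = -math.hypot(goal[0] - end_x, goal[1] - end_y)  # attractive task-progress term
            score = w_goal * progress + w_clear * min(clearance, 2.0) + w_speed * v
            if score > best_score:
                best, best_score = (v, w), score
    return best
```

With the robot at the origin heading toward a goal at `(5, 0)` and an obstacle of size 0.6 at `(2.4, 0)`, sampling `v` in `[0.0, 0.5, 1.0]` and `w` in `[-0.5, 0.0, 0.5]`, the planner keeps full speed when the obstacle is classed as passable (`max_passable_size=1.0`) but drops to half speed with the default threshold, which mirrors the early speed reduction reported in the abstract.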

Underwater In-Situ Weight Estimation Method for Chinese Mitten Crab Based on Binocular Vision and Improved YOLOv11-pose

LI Aoqiang, DAI Hangyu, GUO Ya

Smart Agriculture 2026, 8 (1): 192-202. DOI: 10.12133/j.smartag.SA202505019

[Objective] With the accelerated development of large-scale, intelligent aquaculture, accurate estimation of the body weight of individual Chinese mitten crabs is critical for tasks such as precise feeding, disease prevention, and optimization of harvest decisions. Traditional manual catching and weighing is time-consuming and labor-intensive, can cause stress or injury to the crabs, and cannot provide real-time monitoring. To address the challenges posed by turbid water in aquaculture, which leads to poor image quality and difficult feature extraction, a method is proposed for estimating Chinese mitten crab weight that combines binocular vision with deep learning–based keypoint detection. This approach achieves high-precision detection of anatomical keypoints on the crab, providing new technical support for precision aquaculture and intelligent management. [Methods] Building on a lightweight YOLOv11 framework, MBConv depthwise-separable convolutions were incorporated into its C3K2 module to significantly reduce computational complexity and improve feature extraction efficiency. An EffectiveSE channel attention mechanism was introduced to adaptively emphasize important channel-wise features. To further enhance cross-scale information fusion, a spatial dynamic feature fusion module (SDFM) was added, which fused local spatial attention with global channel attention through adaptive weighting, enabling detailed extraction of crab shell edges and anatomical keypoints. The improved YOLOv11-ES model simultaneously output the crab's bounding box, the positions of four anatomical keypoints, and the crab's sex classification in a single forward pass. In the 3D reconstruction stage, calibrated stereo camera parameters were used, and a sparse keypoint matching strategy guided by the crab's sex and spatial geometric constraints was employed.
High-confidence keypoint pairs were selected from the left and right views, and the 3D coordinates of the keypoints were computed by triangulation to obtain the true carapace length and width. Finally, the carapace length, width, and sex label were fed into a two-layer back-propagation (BP) neural network to predict the individual crab's weight by regression. [Results and Discussion] To validate the effectiveness and robustness of the proposed method, a dataset of Chinese mitten crab images with annotated keypoints was constructed under varying water turbidity and lighting conditions, and both ablation and comparative experiments were conducted. YOLOv11-ES achieved a mean average precision at an intersection over union (IoU) threshold of 0.5 (mAP@50) of 97.2% on the test set, 4.4 percentage points higher than the original YOLOv11 model. The keypoint detection component reached an mAP@50 of 96.7%, 3.6 percentage points higher than that of the original YOLOv11 model. In comparative experiments, YOLOv11-ES also showed clear advantages over other models in the same series. In a full-system evaluation using images of 30 individual crabs, the mean absolute percentage error (MAPE) was only 2.68% for carapace width and 1.48% for carapace length. The Pearson correlation coefficients between the measured values and the manually obtained true values exceeded 0.977 for both carapace length and width, indicating accurate 3D reconstruction and small measurement error. Experiments on the influence of image quality showed that when the underwater image quality measure (UIQM) was at least 1.5, the combined MAPE of carapace length and width could be kept below 5%, and when UIQM was at least 2.2, the MAPE dropped to about 1.9%. These results confirmed the robustness of the method to variations in water turbidity and lighting conditions.
For weight regression, the BP network trained on carapace length, width, and sex features achieved a mean absolute error (MAE) of 2.39 g and a MAPE of 7.1% on an independent test set, demonstrating high-precision estimation of individual crab weight. [Conclusions] The proposed method, which combines an improved YOLOv11 object detection network, binocular sparse keypoint matching, and a two-layer BP regression network, enabled high-precision, low-error, real-time, non-contact estimation of Chinese mitten crab weight in complex turbid aquatic environments. The approach features a lightweight model, high computational efficiency, excellent measurement accuracy, and strong adaptability to varying environmental conditions, and it provides key technical parameters for intelligent Chinese mitten crab farming. In the future, it could be extended to other aquaculture species and complex farming scenarios. Combined with transfer learning and online adaptive calibration techniques, its generalization capability could be further improved, and it could be integrated with intelligent monitoring platforms to achieve large-scale, all-weather underwater crab weight estimation, contributing to the sustainable development of smart aquaculture.
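The triangulation and error metrics in this abstract can be made concrete with the textbook rectified-stereo model, in which depth follows from disparity as Z = f·B/d. The sketch below assumes rectified images (matched keypoints share a row), a focal length `f` in pixels, a baseline `baseline` in metres, and a principal point `(cx, cy)`; these names and the example values are hypothetical, since the paper's calibration parameters and sex-guided matching logic are not given here.

```python
import math


def triangulate(u_left, v_left, u_right, f, baseline, cx, cy):
    """3D point (X, Y, Z) in the left-camera frame from a rectified stereo pair.

    Assumes rectified images, so the disparity is d = u_left - u_right and
    depth is Z = f * B / d.
    """
    d = u_left - u_right
    if d <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    z = f * baseline / d
    return ((u_left - cx) * z / f, (v_left - cy) * z / f, z)


def carapace_length(p1, p2):
    """Euclidean distance between two triangulated keypoints (same units as baseline)."""
    return math.dist(p1, p2)


def mape(predicted, actual):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(p - a) / abs(a) for p, a in zip(predicted, actual)) / len(actual)
```

As a hypothetical check, with f = 1000 px, B = 0.06 m, and keypoints at columns 740/700 and 540/500 (both with 40 px disparity), each point lies 1.5 m from the rig and 0.15 m either side of the optical axis, giving a length of 0.3 m.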

Online Detection System for Freshness of Fruits and Vegetables Based on Temporal Multi-source Information Fusion

HUANG Xianguo, ZHU Qibing, HUANG Min

Smart Agriculture 2026, 8 (1): 203-212. DOI: 10.12133/j.smartag.SA202505037

[Objective] Real-time, accurate quality monitoring of fruits and vegetables during cold chain logistics is important for ensuring supply chain quality and reducing economic losses. However, traditional detection methods generally suffer from several core deficiencies: they operate offline, rely on unimodal information, and cannot capture the dynamic evolution of freshness. To overcome these challenges, an online freshness detection system for fruits and vegetables based on temporal multi-source information fusion is proposed and implemented. The system was designed to achieve precise online detection of fruit and vegetable freshness, providing an effective technical solution for refined management and early spoilage warning in the cold chain, thereby reducing economic losses. [Methods] A complete system was constructed, consisting of a lower-computer data acquisition node, an IoT cloud platform, and an upper-computer Qt client. Through a self-designed portable acquisition node, the lower computer synchronously collected environmental time-series sensing data (temperature, humidity, CO₂, and ethylene) and time-series images of indicator tags. A novel co-attention-based convolutional recurrent network (Co-ACRN) deep learning model was proposed to mine the complex correlations between these two heterogeneous time-series data streams. The model employed a dual "co-attention + self-attention" mechanism. First, in the early fusion stage, a co-attention module aligned and integrated the visual and sensor feature sequences by constructing a cross-modal affinity matrix. Next, the fused sequence was fed into a long short-term memory (LSTM) network to encode temporal cumulative effects. Finally, a self-attention module performed a global contextual review of the LSTM output to capture long-range temporal dependencies.
In the specific implementation, visual features were extracted by a lightweight convolutional neural network (CNN) with two convolution-pooling layers; the co-attention module calculated weights by generating context-aware intermediate features; and the self-attention module adopted the standard scaled dot-product attention mechanism. For deployment, the model was exported in the open neural network exchange (ONNX) format to the Qt client, achieving real-time, edge-side inference. [Results and Discussions] Experimental results showed that the proposed Co-ACRN model achieved an overall accuracy of 96.93% on the test set in the three-class mango freshness detection task, significantly surpassing various mainstream baselines and advanced temporal multimodal fusion models, such as modality-invariant and -specific representations for multimodal sentiment analysis (MISA), the recurrent attended variation embedding network (RAVEN), the multimodal transformer (MulT), and the heterogeneous hierarchical message passing network (HHMPN). To verify the design rationale, two sets of ablation experiments were conducted. The input-based ablation study showed that combining time-series and multimodal information is a necessary prerequisite for accurate detection, as every model relying on unimodal or static information exhibited significant performance bottlenecks. The architecture-based ablation study further confirmed the superiority of the proposed dual-attention design: compared with a backbone network without any attention mechanism, accuracy improved by more than five percentage points, and recall for the critical "spoiled" category reached 99.16%.
An in-depth analysis of the confusion matrix revealed that the vast majority of the model's errors occurred between adjacent categories with the most similar physical states, with no serious cross-category misclassifications, demonstrating strong robustness. After deployment on the client, a single diagnosis took less than 2 s, verifying that the solution combines high accuracy with real-time performance. [Conclusions] The developed online detection system and Co-ACRN model enabled real-time, accurate, non-destructive intelligent detection of fruit and vegetable freshness. The findings indicate that combining co-attention and self-attention mechanisms can effectively address the challenges of fusing complex multimodal temporal data. In summary, this study provides a complete solution that combines theoretical innovation with engineering practicality for the online, intelligent detection of fruit and vegetable freshness in distribution, and opens new paths for the development of this field in both theory and practice.
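Co-ACRN's self-attention stage is stated to use standard scaled dot-product attention, softmax(QKᵀ/√d_k)V, and its co-attention stage is built on a cross-modal affinity matrix. The sketch below shows both in dependency-free Python. The co-attention part is a simplified, hypothetical variant: it assumes the visual and sensor sequences are time-aligned and already projected to the same feature size, and it does not reproduce Co-ACRN's "context-aware intermediate features". A real model would use a deep learning framework rather than nested lists.

```python
import math


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax_rows(m):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def scaled_dot_product_attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V."""
    scale = math.sqrt(len(k[0]))
    scores = [[s / scale for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v)


def co_attention_fuse(visual, sensor):
    """Hypothetical co-attention over two time-aligned sequences of equal feature size.

    C = V S^T is the cross-modal affinity matrix; each stream attends to the
    other through a row-wise softmax, and every time step concatenates its own
    features, the attended sensor context, the sensor features, and the
    attended visual context.
    """
    affinity = matmul(visual, transpose(sensor))
    sensor_ctx = matmul(softmax_rows(affinity), sensor)
    visual_ctx = matmul(softmax_rows(transpose(affinity)), visual)
    return [a + b + c + d for a, b, c, d in zip(visual, sensor_ctx, sensor, visual_ctx)]
```

In the pipeline described above, the fused sequence would go to the LSTM, and `scaled_dot_product_attention` would then be applied to the LSTM outputs (typically after learned query, key, and value projections, which are omitted here).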

Obstacle Avoidance Control Method of Electric Skid-Steering Chassis Based on Fuzzy Logic Control

LI Lei, SHE Xiaoming, TANG Xinglong, ZHANG Tao, DONG Jiwei, GU Yuchuan, ZHOU Xiaohui, FENG Wei, YANG Qinghui

Smart Agriculture 2026, 8 (1): 213-225. DOI: 10.12133/j.smartag.SA202408003

[Objective] Trajectory tracking and obstacle avoidance control are important components of autonomous driving chassis, but most current studies treat them as two independent tasks. As a result, the chassis stops tracking the trajectory when it encounters an obstacle and resumes tracking only after completing the avoidance maneuver; if it ends up too far from the reference path after avoidance, subsequent tracking performance suffers. Some studies handle trajectory tracking and obstacle avoidance simultaneously, but their control is either not smooth enough and prone to chatter, or the control system is too complex. Therefore, this research proposes a simple algorithm that simultaneously implements trajectory tracking and obstacle avoidance control of the chassis. [Methods] First, the kinematic model and kinematic error model of the chassis were established. Because the chassis uses skid steering, its kinematic model was simplified to that of a two-wheel differential-drive robot. Second, a Takagi-Sugeno (T-S) fuzzy controller of the chassis was designed. With the error model established in advance, a T-S fuzzy model of the chassis could be constructed, and a T-S fuzzy controller was designed from it using the parallel distributed compensation (PDC) algorithm. A linear quadratic regulator (LQR) served as the state feedback controller of each fuzzy subsystem, and together these formed a global T-S fuzzy controller that realized trajectory tracking when there were no obstacles. Third, the obstacle avoidance controller of the chassis was designed. A new $LQR_{obs}$ controller was designed in the global open-loop system to generate a reference trajectory that avoids obstacles.
When the system detected an obstacle in the environment, the $LQR_{obs}$ controller was activated and generated a new path based on the distance between the obstacle and the chassis, so that the chassis could avoid the obstacle; once the chassis had bypassed the obstacle, the $LQR_{obs}$ controller stopped working. To improve obstacle avoidance, a fuzzy controller was designed to adjust the weighting matrices Q and R of the $LQR_{obs}$ controller in real time. Then, to realize trajectory tracking and obstacle avoidance control simultaneously, a Mamdani fuzzy fusion controller was designed to combine the outputs of the two controllers into the final chassis input. Finally, the method was verified by simulation and experiment: joint simulation was performed in MATLAB-Simulink, and experiments were conducted on a self-developed electric multi-functional chassis. [Results and Discussions] The simulation results showed that, without obstacles, the control method achieved stable tracking of a reference path composed of straight lines and curves. With obstacles, the vehicle avoided them smoothly and quickly converged back to the reference trajectory. When facing obstacles, the fuzzy-logic $LQR_{obs}$ controller adaptively changed its gain matrix according to the vehicle's speed and the distance to the current obstacle, achieving rapid convergence. The experimental results showed that, without obstacles, the chassis achieved stable tracking of the reference trajectory using the T-S fuzzy controller, with average lateral and longitudinal errors of 0.041 and 0.052 m, respectively, over the entire tracking process.
When facing obstacles, the T-S fuzzy controller and the $LQR_{obs}$ controller achieved obstacle avoidance and tracking control of the chassis through joint control. With the fuzzy controller adjusting the $LQR_{obs}$ gain matrix in real time, the tracking error was reduced by 33.9% compared with a controller using a fixed gain matrix. [Conclusions] The control system simultaneously achieves trajectory tracking and obstacle avoidance control of the chassis, quickly drives the tracking error to zero, and avoids obstacles smoothly. Although the proposed control method is simple and efficient and markedly improves tracking and obstacle avoidance, it can currently handle only static obstacles on the reference path; subsequent research will focus on dynamic obstacles.
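The controller chain above starts from the kinematic error model of a differential-drive (unicycle) robot and blends per-rule state-feedback laws by parallel distributed compensation (PDC), u = −Σᵢ hᵢ Kᵢ e. The sketch below shows the standard robot-frame tracking error and the PDC blend. The two-rule triangular membership on a scalar premise variable and the gain matrices in the example are hypothetical placeholders; in the paper each Kᵢ comes from an LQR design on a linearized fuzzy subsystem, which this sketch does not compute, and the sign convention depends on how the error is defined.

```python
import math


def tracking_error(pose, ref_pose):
    """Tracking error of a differential-drive robot expressed in the robot frame.

    pose, ref_pose: (x, y, theta). Returns (e_x, e_y, e_theta), e_theta in [-pi, pi].
    """
    x, y, th = pose
    xr, yr, thr = ref_pose
    dx, dy = xr - x, yr - y
    e_x = math.cos(th) * dx + math.sin(th) * dy
    e_y = -math.sin(th) * dx + math.cos(th) * dy
    e_th = math.atan2(math.sin(thr - th), math.cos(thr - th))
    return e_x, e_y, e_th


def triangular_weights(z, z_min, z_max):
    """Membership grades of two hypothetical rules for premise variable z."""
    z = min(max(z, z_min), z_max)
    h1 = (z_max - z) / (z_max - z_min)
    return [h1, 1.0 - h1]


def pdc_control(error, gains, memberships):
    """PDC law u = -sum_i h_i K_i e, with K_i a 2x3 gain per fuzzy rule."""
    total = sum(memberships)
    u = [0.0, 0.0]
    for m, K in zip(memberships, gains):
        h = m / total  # normalized firing strength
        for row in range(2):
            u[row] -= h * sum(K[row][col] * error[col] for col in range(3))
    return u
```

Because the normalized memberships sum to one, the blended law moves smoothly between the local LQR designs as the operating point changes, which is what lets a single T-S controller cover the nonlinear error dynamics.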

Multi-Machine Collaborative Operation Scheduling and Planning Method Based on Improved Genetic Algorithm

ZHU Tianwen, WANG Xu, ZHANG Bo, DU Xintong, WU Chundu

Smart Agriculture 2026, 8 (1): 226-236. DOI: 10.12133/j.smartag.SA202508010

[Objective] Traditional harvesting processes on large-scale farms still suffer from low scheduling efficiency, uneven workload distribution, and suboptimal path planning, which hinder intelligent and efficient agricultural production. Multi-machine collaborative operation scheduling and planning has become a key technology in intelligent farm management, aiming to optimize task allocation and path planning among multiple harvesters under time window and workload balance constraints. However, such problems are complex combinatorial optimization problems characterized by high dimensionality and nonlinearity. Conventional genetic algorithms (GA) often exhibit premature convergence and weak local search capability, resulting in suboptimal scheduling schemes. To address these challenges, this study focused on collaborative harvesting by multiple combine harvesters across several fields and proposed an improved multi-traveling salesman problem genetic algorithm (IMTSP_GA) for integrated multi-machine scheduling and path planning. [Methods] A multi-machine cooperative scheduling model was constructed with the objective of minimizing the total operational time of all harvesters subject to time window and load-balancing constraints. The problem was modeled as a multi-traveling salesman problem (MTSP), in which each harvester was regarded as a traveling salesman responsible for a subset of field tasks. To solve the model, IMTSP_GA adopted a two-layer chromosome encoding: the first layer represented the visiting sequence of all task units, and the second layer defined the segmentation positions that allocated tasks to different machines, thereby forming feasible multi-harvester operation routes. To ensure both initial solution quality and population diversity, a hybrid initialization strategy combining sequential and random initialization was designed.
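The two-layer encoding can be decoded by cutting the task sequence (layer 1) at the sorted segmentation positions (layer 2), which yields one route per harvester. The sketch below shows that decoding together with one possible hybrid initialization, using the problem size reported in this abstract (49 task units, four harvesters). The sequential seed (index order, rotated, split into near-equal blocks for load balance), the `seq_ratio` share, and all function names are assumptions for illustration; the paper's exact sequential rule is not given in the abstract.

```python
import random


def decode(sequence, cuts):
    """Layer 1 = task visiting order, layer 2 = cut positions; one route per harvester."""
    bounds = [0] + sorted(cuts) + [len(sequence)]
    return [sequence[bounds[i]:bounds[i + 1]] for i in range(len(bounds) - 1)]


def sequential_chromosome(n_tasks, n_machines, shift=0):
    """Tasks in index order (rotated by `shift`), split into near-equal blocks."""
    seq = list(range(n_tasks))
    seq = seq[shift:] + seq[:shift]
    base, extra = divmod(n_tasks, n_machines)
    cuts, pos = [], 0
    for m in range(n_machines - 1):
        pos += base + (1 if m < extra else 0)
        cuts.append(pos)
    return seq, cuts


def random_chromosome(n_tasks, n_machines, rng):
    """Random permutation plus distinct cuts, so every harvester gets at least one task."""
    seq = list(range(n_tasks))
    rng.shuffle(seq)
    cuts = sorted(rng.sample(range(1, n_tasks), n_machines - 1))
    return seq, cuts


def hybrid_population(size, n_tasks, n_machines, seq_ratio=0.2, seed=0):
    """Hypothetical hybrid initialization: a share of sequential seeds, the rest random."""
    rng = random.Random(seed)
    n_seq = max(1, int(size * seq_ratio))
    population = [sequential_chromosome(n_tasks, n_machines, shift=k) for k in range(n_seq)]
    population += [random_chromosome(n_tasks, n_machines, rng) for _ in range(size - n_seq)]
    return population
```

Drawing the cuts without replacement from `1 .. n_tasks - 1` guarantees that every decoded route is non-empty, so each chromosome is a feasible assignment before any fitness evaluation.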
Furthermore, a Q-learning-based adaptive mutation mechanism was introduced into the genetic operation process. By constructing a state–action–reward model based on the variation trend of fitness values, the algorithm dynamically selected mutation operators according to their historical performance, thus balancing global exploration and local exploitation. The overall process included chromosome encoding, fitness evaluation, group-based selection, crossover and mutation operations, and Q-learning-driven adaptive control. Based on the optimized scheduling scheme, the full-path planning for each harvester was divided into two stages: (1) in-field path planning, which used an internal spiral coverage method to reduce turning frequency and non-working time; and (2) road network path planning, which employed the Dijkstra algorithm to obtain globally shortest travel routes between fields. [Results and Discussions] A total of 25 farmlands were divided into 49 task units, and four John Deere 3588 harvesters were used for the simulation. Comparative experiments were performed among IMTSP_GA, standard GA, particle swarm optimization (PSO), and ant colony optimization (ACO). The results showed that the IMTSP_GA significantly outperformed other algorithms in terms of total operation time, convergence speed, and computational efficiency. Specifically, the total operational time was reduced by 4.48%, 5.32%, and 9.87% compared with GA, PSO, and ACO, respectively. The average runtime was 5.82 s, which was substantially shorter than that of the GA (11.55 s) and PSO (10.70 s). The algorithm exhibited fast early convergence and effectively avoided premature stagnation. To further evaluate generalization capability, five classical traveling salesman problem (TSP) datasets, Berlin52, Eil76, Bier127, CH150, and KroB200, were tested. 
IMTSP_GA consistently achieved superior average solutions and shorter runtimes across all datasets, confirming its robustness and adaptability to different problem scales and complexities. Finally, full-process path planning was visualized based on the optimized scheduling results. The generated harvester routes were continuous and compact, ensuring reasonable task allocation and efficient transitions between fields, thereby validating the effectiveness of the proposed model. [Conclusions] By integrating a Q-learning-based adaptive mutation mechanism, IMTSP_GA autonomously selects effective mutation strategies to enhance search performance and convergence stability. Meanwhile, the hybrid initialization strategy maintains population diversity and improves the quality of initial solutions. IMTSP_GA surpasses traditional GA, PSO, and ACO in solution quality, convergence performance, and computational efficiency. The method effectively reduces total operation time, optimizes harvester task allocation, and improves the coordination and efficiency of multi-machine operations. In future work, the research will be extended to more complex scenarios involving multi-region cooperation, task prioritization, and dynamic environmental factors. Reinforcement learning and online optimization techniques will be incorporated to achieve real-time scheduling and intelligent decision-making, thereby enhancing the adaptability and engineering applicability of the proposed method in large-scale intelligent agricultural systems.
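The Q-learning-driven mutation step can be pictured as a small tabular agent that picks a mutation operator, observes whether the best fitness improved, and updates its Q-table with the standard rule Q(s, a) ← Q(s, a) + α[r + γ·maxₐ′ Q(s′, a′) − Q(s, a)]. The sketch below is a minimal illustration: the three permutation operators are common choices, but the two-state encoding of the fitness trend, the ±1 reward, and all hyperparameters are assumptions, since the abstract describes the state–action–reward model only qualitatively.

```python
import random


def swap(seq, rng):
    s = seq[:]
    i, j = rng.sample(range(len(s)), 2)
    s[i], s[j] = s[j], s[i]
    return s


def inversion(seq, rng):
    s = seq[:]
    i, j = sorted(rng.sample(range(len(s)), 2))
    s[i:j + 1] = s[i:j + 1][::-1]  # reverse a segment
    return s


def insertion(seq, rng):
    s = seq[:]
    i, j = rng.sample(range(len(s)), 2)
    s.insert(j, s.pop(i))  # move one task to another position
    return s


OPERATORS = [swap, inversion, insertion]


class MutationSelector:
    """Tabular Q-learning over mutation operators (hypothetical state/reward design).

    States: 0 = best fitness improved last generation, 1 = stagnated.
    Reward: +1 if the chosen operator improved the best fitness, else -1.
    """

    def __init__(self, n_ops, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q = [[0.0] * n_ops for _ in range(2)]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def choose(self, state):
        """Epsilon-greedy action selection."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q[state]))
        row = self.q[state]
        return row.index(max(row))

    def update(self, state, action, reward, next_state):
        """Standard one-step Q-learning update."""
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])
```

In a GA loop, each generation would call `choose(state)`, apply `OPERATORS[action]` to the selected offspring, compute the reward from the change in best fitness, and call `update`; operators that keep paying off in a stagnating state gradually dominate, which is the exploration/exploitation balance the abstract describes.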

Earth Observation-Driven Digital (Smart) Agriculture: Research Frontiers and Application Cases

WU Bingfang, MA Hui, ZHANG Miao, PAN Qingcheng, ZHANG Xiang, CHEN Shuisen, QIU Bingwen, XU Xingang, LIU Jianhong, FAN Jinlong, HUANG Jianxi, JIANG Jiale, HE Changchui

Smart Agriculture 2026, 8 (2): 1-17. DOI: 10.12133/j.smartag.SA202512027

[Significance] Digital agriculture is the core driving force of modern agricultural transformation. It aims to achieve full-process digital mapping and intelligent management of production through the deep integration of advanced information technologies such as the Internet of Things, big data, artificial intelligence (AI), and remote sensing, with earth observation (EO) technology serving as the essential data engine that provides spatial information support for this systemic shift. However, digital agriculture is currently developing unevenly: it tends to be "heavy on transactions and light on production", and core production links have low digitalization penetration rates. Furthermore, the knowledge embedded in the vast corpus of EO data has yet to be fully extracted and interpreted, so many established algorithms show insufficient robustness and universality when confronted with the complexity and diversity of global cropping systems, which limits their practical efficacy. Crucially, relying on technology to optimize production efficiency alone, without ecological guidance, can induce secondary environmental risks, such as exacerbating regional groundwater depletion or reducing biodiversity through agricultural landscape simplification. EO technology therefore needs to be deeply coupled with agronomic principles and local ecological practices to build a resilient smart agricultural system that balances productivity, resource efficiency, and ecological integrity. [Progress] Current research frontiers of EO-driven digital agriculture converge on three critical domains: intelligent crop condition monitoring, digital twin farming systems, and the enhancement of agricultural system resilience.
Intelligent monitoring fuses high-resolution remote sensing imagery with machine learning frameworks to enable large-scale crop mapping and fine-grained identification of crop types at the field scale; next-generation yield prediction models integrate advanced deep learning techniques to significantly improve accuracy, and remote sensing is also effectively employed for agricultural disaster monitoring. The digital twin farming system represents an advanced stage of precision agriculture. It digitally models all agricultural production elements to construct a highly consistent virtual replica of the physical environment and operates through a real-time closed loop of perception, simulation and analysis, and decision-making support to guide optimal interventions. Successful applications include intelligent water resource scheduling in Chinese irrigation districts and the use of AI vision algorithms to manage complex biological processes such as crab farming, although the field must overcome "pseudo-twins" that focus on visualization rather than driving concrete operational decisions. Research on agricultural system resilience is supported by the spatial data that digital agriculture provides on global crop yields, cultivated land distribution, and practices such as terracing. To illustrate the practical efficacy of these technologies, this paper analyzes two representative application cases. First, the CropWatch system represents a paradigm shift in agricultural monitoring by constructing a "Cloud-Edge" collaborative ecosystem. It integrates machine learning with a "Pre-training, Prompting, and Fine-tuning" large language model (LLM) framework to automate remote sensing-based crop monitoring and report generation and to enhance decision-support intelligence.
Through open application programming interfaces (APIs) and multi-scale capabilities, CropWatch provides cross-scale information and decision support, from macro-level policy support to micro-level farm management, and serves as a global public good that bridges the digital divide in developing nations. Second, in agricultural water management, the ETWatch technical system offers a robust solution for the precise governance of water resources. By monitoring evapotranspiration (ET) at high resolution from basin to field scales, it enables accurate assessment of water productivity and optimization of irrigation schedules. Crucially, this technology has been embedded into institutional mechanisms, such as water rights allocation and tiered pricing based on actual consumption, realizing a transformation from empirical water use to data-driven, precise regulation. [Conclusions and Prospects] Digital (smart) agriculture is rapidly moving beyond its role as an extension of agricultural informatization to become the "new-quality productivity" driving high-quality agricultural development. It does so by restructuring production factors, enhancing resource efficiency, strengthening risk response capabilities, and promoting value chain upgrading, thereby providing critical momentum for a more efficient, greener, and sustainable modern agricultural system. Given China's pronounced advantages in the digital economy, information technology, remote sensing, and intelligent equipment, the nation is well-positioned to integrate these strengths into comprehensive, full-chain smart agricultural solutions. The mature systemic models and business paradigms built on these solutions could ultimately form a "China Card" in the global agricultural revolution, contributing Chinese wisdom and solutions towards global food security and the zero-hunger goal.
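The two quantities ETWatch-style monitoring feeds into, water productivity and consumption-based tiered pricing, reduce to simple arithmetic once ET is known. The sketch below assumes that water productivity is defined as yield per unit of evapotranspired water and that billed consumption is the ET volume. The tariff blocks, prices (arbitrary currency units per m³), yield, and ET values are all hypothetical.

```python
def water_productivity(yield_kg_per_ha: float, et_mm: float) -> float:
    """Crop water productivity (kg per m^3 of evapotranspired water).

    1 mm of ET over 1 ha equals 10 m^3, hence the factor of 10.
    """
    return yield_kg_per_ha / (et_mm * 10.0)


def tiered_charge(consumed_m3: float, tiers) -> float:
    """Charge under an increasing-block tariff.

    tiers: (upper_bound_m3, price_per_m3) pairs sorted by bound;
    the last bound should be float("inf").
    """
    charge, lower = 0.0, 0.0
    for upper, price in tiers:
        if consumed_m3 <= lower:
            break
        charge += (min(consumed_m3, upper) - lower) * price
        lower = upper
    return charge


# Hypothetical tariff for one hectare: a quota of 4 500 m^3, then two penalty blocks.
TIERS = [(4500.0, 0.10), (5400.0, 0.15), (float("inf"), 0.30)]

et_mm = 600.0  # hypothetical seasonal ET measured for the field
print(round(water_productivity(7500.0, et_mm), 2))   # 1.25 (kg/m^3)
print(round(tiered_charge(et_mm * 10.0, TIERS), 2))  # 765.0
```

Because the bill is computed from measured ET rather than from diverted water, the same remote sensing product drives both the productivity assessment and the price signal.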

High Spatiotemporal Resolution Remote Sensing for Precision Agricultural Disaster Early Warning: Progress, Bottlenecks, and Integrative Pathways

XU Xiaobin, ZHU Hongchun, LI Feng, HE Wei, YANG Jiaming, LI Zhenhai

Smart Agriculture 2026, 8 (2): 18-34. DOI: 10.12133/j.smartag.SA202512002

[Significance] Under climate change, the frequency and intensity of extreme weather events have increased markedly, posing persistent threats to global food security. Agricultural meteorological disasters, including droughts, floods, heat stress, frost damage, and events that cause mechanical damage, such as lodging and hail, are increasingly characterized by rapid onset, strong spatial heterogeneity, and compound interactions. Conventional management strategies relying mainly on post-event assessment are insufficient for timely warning and precision intervention. The development of high spatiotemporal resolution remote sensing and integrated observation systems combining satellite, unmanned aerial vehicle (UAV), and ground-based sensing has substantially advanced agricultural disaster monitoring. These technologies enable field-scale characterization of spatial variability and detection of short-duration disaster processes at hourly to daily timescales. This review synthesizes recent progress in sky-air-ground integrated remote sensing for agricultural meteorological disaster management and establishes a unified framework linking monitoring, early warning, and decision-making, with emphasis on hydrological stress, thermal stress, and structural damage. [Progress] At the observation level, a multi-tier sensing architecture has emerged. Satellite remote sensing provides broad coverage and regular revisit cycles, forming the backbone of regional monitoring. Optical sensors support retrieval of crop structural and biochemical parameters, thermal infrared data enable canopy temperature and evapotranspiration estimation, and synthetic aperture radar (SAR) offers all-weather capability for soil moisture and flood detection. Solar-induced chlorophyll fluorescence (SIF) provides direct information on crop photosynthetic function and enables early identification of physiological stress.
UAV platforms complement satellites through flexible deployment and centimeter-scale resolution, allowing detailed mapping of canopy temperature and three-dimensional crop structure using multispectral, thermal, and light detection and ranging (LiDAR) sensors. Ground-based meteorological stations and sensor networks provide continuous measurements for calibration and validation, although scaling point observations to spatially continuous products remains challenging. Consequently, multi-sensor integration is evolving from data stacking toward physically complementary constraint frameworks. Methodologically, two approaches dominate: physically based inversion and data-driven recognition. Radiative transfer models, surface energy balance methods, and SAR scattering models offer strong physical interpretability but depend on prior information and data quality. Machine learning and deep learning methods effectively capture nonlinear relationships and complex spatial patterns for disaster identification, yet remain limited by interpretability and cross-regional generalization. At the early-warning stage, crop growth models, hydrological models, and spatiotemporal prediction networks are applied to simulate disaster evolution. Hybrid models embedding physical constraints into data-driven frameworks have become a key research direction to enhance predictive robustness. Decision-support systems have expanded from threshold-based rule engines toward optimization algorithms and multi-objective frameworks, enabling warning information to be translated into actionable irrigation scheduling, protective measures, and emergency responses. Regarding specific hazards, drought monitoring has shifted from vegetation indices toward coupling root-zone soil moisture with crop physiological responses, with SIF-based indicators showing strong potential for early stress detection.
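A threshold-based rule engine, the starting point that decision-support systems have moved beyond, can be sketched in a few lines. The rule names, observation keys, and threshold values below are hypothetical placeholders, not values from the review.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    name: str
    condition: Callable[[Dict[str, float]], bool]
    action: str


# Hypothetical rules and thresholds, for illustration only.
RULES: List[Rule] = [
    Rule("drought", lambda o: o["root_zone_sm"] < 0.15, "schedule irrigation"),
    Rule("frost", lambda o: o["min_temp_forecast_c"] <= -2.0, "deploy frost protection"),
    Rule("flood", lambda o: o["inundated_fraction"] > 0.3, "start drainage and emergency response"),
]


def fire_rules(observation: Dict[str, float]) -> List[str]:
    """Return the action of every rule whose condition is met, in rule order."""
    return [rule.action for rule in RULES if rule.condition(observation)]


print(fire_rules({"root_zone_sm": 0.10, "min_temp_forecast_c": 3.0, "inundated_fraction": 0.0}))
# ['schedule irrigation']
```

Each rule fires independently and ignores costs, resource limits, and trade-offs between interventions. That limitation is what optimization-based and multi-objective decision frameworks are meant to address.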
Flood studies rely primarily on SAR-based inundation mapping and extend toward quantitative damage assessment. Heat and frost stress research emphasizes growth-stage-dependent dynamic thresholds. Lodging monitoring integrates structural parameters derived from optical, LiDAR, and SAR data, while hail-related studies focus on rapid post-event damage mapping. Compound and cascading disasters have become an important research frontier. [Conclusions and Prospects] High spatiotemporal resolution remote sensing has greatly enhanced the observability and early-warning potential of agricultural meteorological disasters. Nevertheless, key challenges remain, including heterogeneous data integration, scale inconsistency, uncertainty propagation, and insufficient coupling among monitoring, warning, and decision-making components. Future progress requires a systems-engineering perspective. Physically guided machine learning can bridge mechanistic understanding and data adaptability, while agricultural disaster digital twins provide a framework for dynamic interaction among observation, simulation, and decision optimization. In parallel, multi-factor time-series risk modeling and multi-agent learning are needed to better represent compound disaster processes and support intelligent, adaptive, and precision-oriented agricultural disaster management systems.
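The idea of growth-stage-dependent dynamic thresholds for heat and frost stress is illustrated below. The same temperature is judged against a different critical value depending on the crop's current stage. All stage names and threshold values are hypothetical, and real thresholds are crop- and cultivar-specific. The degree-hour exceedance is one common way to express stress intensity and is included here as an assumption.

```python
# Hypothetical stage-specific critical temperatures (deg C); real thresholds are
# crop- and cultivar-specific and must come from agronomic experiments.
HEAT_THRESHOLD_C = {"vegetative": 35.0, "flowering": 32.0, "grain_filling": 34.0}
FROST_THRESHOLD_C = {"vegetative": -4.0, "flowering": -1.0, "grain_filling": -2.0}


def classify_day(stage: str, tmax_c: float, tmin_c: float) -> str:
    """Label a day as 'heat', 'frost' or 'none' using the threshold for the current stage."""
    if tmax_c >= HEAT_THRESHOLD_C[stage]:
        return "heat"
    if tmin_c <= FROST_THRESHOLD_C[stage]:
        return "frost"
    return "none"


def heat_degree_hours(stage: str, hourly_temps_c) -> float:
    """Stress intensity: sum of hourly exceedances above the stage-specific heat threshold."""
    threshold = HEAT_THRESHOLD_C[stage]
    return sum(max(0.0, t - threshold) for t in hourly_temps_c)


# The same 33 deg C maximum is harmless in the vegetative stage but stressful at flowering.
print(classify_day("vegetative", 33.0, 15.0))  # none
print(classify_day("flowering", 33.0, 15.0))   # heat
print(heat_degree_hours("flowering", [30.0, 33.0, 35.0, 31.0]))  # 4.0
```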

Lodging Region Detection Method in Flax Based on Lightweight Improved YOLOv11n-seg Model

SU Yujie, LI Yue, WEI Linjing, WU Bing, GUO Linhai, YAN Bin, ZHOU Hui, GAO Yuhong, KANG Lianghe, LIU Huan, SU Shunchang

Smart Agriculture 2026, 8 (2): 35-47. DOI: 10.12133/j.smartag.SA202508013

[Objective] Lodging is a major agronomic constraint that adversely affects both yield and quality in field crops, with flax (Linum usitatissimum L.) being especially vulnerable due to its slender stems and susceptibility to wind and rainfall. Precise delineation of lodged areas from field imagery remains a significant challenge owing to the complex and heterogeneous morphology of lodging patterns, irregular and blurred boundaries, and substantial background interference from upright plants, weeds, and soil textures. These factors necessitate a segmentation framework that combines high precision and strong boundary adherence with computational efficiency, enabling deployment on resource-constrained agricultural monitoring platforms. In response to this need, a lightweight and accurate lodging segmentation approach based on an improved YOLOv11n-seg architecture was proposed to enhance fine-grained feature sensitivity, multi-scale representation capability, and boundary precision, while markedly reducing parameter count, giga floating-point operations (GFLOPs), and model size. [Methods] The proposed architecture integrated targeted modifications across the backbone, neck, and output stages. In the backbone, standard C3k2 modules were replaced with C3k2_SDW blocks, which combined a StarBlock structure with depthwise separable convolutions to reduce redundancy and computation without sacrificing spatial and contextual representational capacity. To counteract the potential loss of channel discrimination caused by light-weighting, a multi-scale efficient channel attention (MS-ECA) mechanism was embedded within selected backbone layers, yielding C3k2_SDW_MS-ECA modules. These modules incorporated parallel convolution branches with varying kernel sizes to capture channel-wise dependencies across multiple receptive fields, thereby adaptively recalibrating lodging-related features with minimal computational overhead.
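The channel recalibration behind MS-ECA can be sketched as follows. Efficient channel attention (ECA) summarizes each channel by global average pooling and then applies a small 1D convolution across neighbouring channel descriptors, without any dimensionality reduction. A sigmoid then turns the result into per-channel weights. The multi-scale variant runs several such convolutions with different kernel sizes in parallel. The kernel sizes (3, 5, 7), the fixed averaging taps, and fusion by averaging the branches are assumptions made for illustration, since the abstract does not specify them. The sketch uses pure Python to stay self-contained, whereas a real model would implement this in a deep learning framework with learned taps.

```python
import math
from typing import List, Sequence

FeatureMap = List[List[List[float]]]  # indexed [channel][row][column]


def channel_descriptor(x: FeatureMap) -> List[float]:
    """Global average pooling: one scalar summary per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]


def conv1d_same(v: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Zero-padded 1D cross-correlation along the channel axis (output length = input length)."""
    pad = len(kernel) // 2
    out = []
    for i in range(len(v)):
        acc = 0.0
        for j, kj in enumerate(kernel):
            idx = i + j - pad
            if 0 <= idx < len(v):
                acc += kj * v[idx]
        out.append(acc)
    return out


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def ms_eca(x: FeatureMap, kernels: Sequence[Sequence[float]]) -> FeatureMap:
    """Multi-scale ECA sketch: parallel 1D convolutions over the channel descriptor,
    averaged (assumed fusion), squashed to (0, 1), then used to rescale each channel."""
    d = channel_descriptor(x)
    branches = [conv1d_same(d, k) for k in kernels]
    fused = [sum(vals) / len(branches) for vals in zip(*branches)]
    weights = [sigmoid(z) for z in fused]
    return [[[w * val for val in row] for row in ch] for w, ch in zip(weights, x)]


# Hypothetical kernels of sizes 3, 5 and 7; in a trained network these taps are learned.
KERNELS = [[1 / 3] * 3, [1 / 5] * 5, [1 / 7] * 7]

# Toy 4-channel, 2x2 feature map in which only channel 0 is active.
x = [[[1.0, 1.0], [1.0, 1.0]]] + [[[0.0, 0.0], [0.0, 0.0]] for _ in range(3)]
y = ms_eca(x, KERNELS)
```

The attention adds only the kernel taps (3 + 5 + 7 = 15 here) as parameters, independent of the channel count. This is why ECA-style modules are attractive for a light-weighted backbone.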
In the neck, a bidirectional feature pyramid network (BiFPN) was introduced to facilitate efficient bidirectional information exchange between scales. By assigning normalized, trainable fusion weights, the BiFPN adaptively balanced contributions from low- and high-level feature maps, while a multi-stage semantic fusion strategy further enriched the integration of spatial details and contextual semantics, thereby improving the detection of small and fragmented lodged patches. At the output stage, a boundary refinement procedure was applied to the predicted masks, improving contour sharpness, enhancing boundary compactness, and mitigating false detections in complex visual environments. The experimental dataset comprised unmanned aerial vehicle (UAV) RGB imagery at a resolution of 4 032×2 268 pixels, acquired from flax fields in Dingxi, Gansu province. Lodged regions were manually annotated with polygonal masks. To increase robustness against variability in illumination, background complexity, and lodging morphology, data augmentation techniques, including random rotation, brightness and contrast adjustment, and blurring, were employed, expanding the dataset to 3 852 images. The dataset was divided into training, validation, and testing subsets at a ratio of 75%:15%:10%. Model training was conducted with 640×640 pixel inputs for 300 epochs using stochastic gradient descent (initial learning rate 0.01, momentum 0.937, weight decay 0.000 5) in PyTorch 2.0.0. Evaluation involved comparison with YOLACT, YOLOv7-seg, YOLOv8n-seg, and the original YOLOv11n-seg using precision (P), recall (R), mAP@0.5, mAP@0.5:0.95, parameter count, GFLOPs, and model size. [Results and Discussions] Ablation experiments demonstrated the incremental contributions of each architectural component. Substituting C3k2 with C3k2_SDW reduced parameters from 2.83 M to 2.14 M and computation from 10.2 to 8.1 GFLOPs, with slight performance improvements.
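The "normalized, trainable fusion weights" of a BiFPN node are usually implemented as the fast normalized fusion introduced with EfficientDet. Each input feature map receives a learnable scalar weight, ReLU keeps the weights non-negative, and the weights are divided by their sum plus a small epsilon. The sketch assumes the improved model uses this standard form, which the abstract does not confirm. Feature maps are flattened to lists and assumed to have already been resized to a common scale.

```python
from typing import List, Sequence


def fast_normalized_fusion(inputs: Sequence[Sequence[float]],
                           raw_weights: Sequence[float],
                           eps: float = 1e-4) -> List[float]:
    """Fuse same-shape feature maps (flattened, already resized to a common scale).

    out = sum_i (w_i / (eps + sum_j w_j)) * input_i,  with w_i = max(0, raw_w_i)
    """
    if len(inputs) != len(raw_weights):
        raise ValueError("need exactly one weight per input feature map")
    w = [max(0.0, rw) for rw in raw_weights]  # ReLU keeps the weights non-negative
    total = eps + sum(w)
    norm = [wi / total for wi in w]            # normalized weights sum to just under 1
    return [sum(n * feat[i] for n, feat in zip(norm, inputs))
            for i in range(len(inputs[0]))]


top_down = [1.0, 2.0]  # hypothetical upsampled high-level features
lateral = [3.0, 4.0]   # hypothetical same-level backbone features
print(fast_normalized_fusion([top_down, lateral], [1.0, 3.0]))  # close to [2.5, 3.5]
```

Compared with a softmax over the weights, this normalization avoids exponentials, which is the efficiency argument usually given for it.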
Incorporating BiFPN further lowered complexity to 1.68 M parameters and 7.7 GFLOPs, accompanied by notable gains in detection metrics. The addition of MS-ECA attention achieved the highest performance, delivering P of 92.6%, R of 92.0%, and mAP@0.5 of 95.2%, improvements of 3.7 percentage points in precision and 2.1 percentage points in mAP@0.5 over the YOLOv11n-seg baseline, without increasing model size. Qualitative Grad-CAM visualizations revealed more precise focus on lodging regions and reduced false activations in upright stems and non-lodged soil areas. Generalization capability was further validated on the public WE3DS agricultural segmentation dataset, where the proposed model achieved average improvements of 4.3, 1.9, and 2.6 percentage points in precision, recall, and mAP@0.5, respectively, compared to the baseline. [Conclusions] The improved YOLOv11n-seg architecture achieves a superior balance between accuracy and efficiency for flax lodging segmentation by combining the C3k2_SDW_MS-ECA backbone, BiFPN with multi-stage semantic fusion in the neck, and output boundary refinement. This combination of high accuracy, lightweight design, and robust boundary delineation renders the model highly applicable to real-time, in-field deployment for intelligent lodging monitoring and precision agriculture. The results further suggest that the approach is transferable to broader agricultural segmentation tasks, providing a practical and scalable solution for modern smart farming applications.

Optimal Sampling Strategy for Soil Organic Matter Based on Hippopotamus Optimization Algorithm and Machine Learning

LIAN Zhenxiang, FEI Xufeng, REN Zhouqiao

Smart Agriculture 2026, 8 (2): 48-58. DOI: 10.12133/j.smartag.SA202508027

[Objective] Soil quality is crucial for food security, ecosystem health, and sustainable development, but it is increasingly degraded by intensive land use. Accurate soil quality assessment is therefore essential for informed land management and ecological protection. Machine learning has enhanced digital soil mapping (DSM) by improving modeling accuracy through multi-source data integration. Within DSM, soil sampling design is a foundational step that directly influences prediction accuracy, cost, and efficiency. An ideal scheme must balance mapping precision with economic and operational feasibility. This study focuses on soil organic matter (SOM), a core indicator of soil quality affecting fertility, carbon sequestration, and environmental regulation. Precisely mapping its spatial variability is vital for sustainable soil management. To address the need for efficient sampling, this research aims to develop an optimal sampling design method for regional-scale SOM mapping that reduces sampling redundancy and cost while improving spatial prediction accuracy. [Methods] A sampling optimization framework was proposed that integrated intelligent optimization algorithms with a hybrid spatial interpolation model. The framework was built upon the hippopotamus optimization algorithm (HO) and incorporated the random forest residual kriging (RFRK) method to construct an optimal sampling strategy for the spatial prediction of SOM. At the initialization stage, a population of candidate solutions, referred to as "hippopotamuses", was randomly generated, with each individual representing a potential sampling layout. The HO was employed to select subsets of sampling points from the training sample pool, with each subset forming a candidate solution. Collectively, these solutions constituted the initial hippopotamus population. The study area was located in Lanxi city, Zhejiang province, where a total of 1 080 field-measured soil samples were collected.
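The candidate-solution encoding described above can be sketched as follows. Each "hippopotamus" is a sampling layout, stored as a set of distinct indices into the training sample pool. Only random initialization and selection of the best layout are shown. HO's own position-update phases are not reproduced. In the study, a layout's fitness would come from fitting RFRK on the selected points and scoring it on the validation set. Here `fitness` is a caller-supplied, hypothetical function.

```python
import random
from typing import Callable, List


def init_population(pool_size: int, n_points: int, pop_size: int,
                    seed: int = 0) -> List[List[int]]:
    """Random initial population: each 'hippopotamus' is one candidate sampling layout,
    encoded as n_points distinct indices into the training sample pool."""
    if n_points > pool_size:
        raise ValueError("cannot draw more points than the pool contains")
    rng = random.Random(seed)
    return [sorted(rng.sample(range(pool_size), n_points)) for _ in range(pop_size)]


def best_layout(population: List[List[int]],
                fitness: Callable[[List[int]], float]) -> List[int]:
    """Return the layout with the lowest fitness (e.g. validation RMSE; lower is better)."""
    return min(population, key=fitness)


# 756 training samples, 383 points per layout (the 1.4 points/km^2 scheme), 20 candidates.
population = init_population(pool_size=756, n_points=383, pop_size=20)
```

Encoding layouts as index subsets guarantees that every candidate uses only real, already measured sampling sites, so evaluating a layout never requires new fieldwork.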
These samples were partitioned into a training set (n=756), a validation set (n=108), and a test set (n=216) at a ratio of 7:1:2. Environmental covariates, including terrain attributes, vegetation indices, and climate factors, were extracted from multi-source remote sensing datasets. Using these covariates, the HO optimized sampling schemes across varying densities and spatial configurations. The resulting designs were then evaluated using the RFRK model to assess their SOM prediction performance. This process enabled the identification of the optimal sampling density and spatial layout that balanced accuracy and cost-efficiency. [Results and Discussions] When the HO-RFRK framework was applied, the prediction accuracy of SOM improved significantly as sampling density increased from 0.5 to 2.3 points/km² (136-629 points). The root mean square error (RMSE) on the test set decreased from 6.04 to 5.11 g/kg, representing a reduction of approximately 15.4%. The lowest prediction errors were observed at a sampling density of 2.3 points/km², with the RMSE and mean absolute error (MAE) reaching their minimum values of 5.11 and 3.79 g/kg, respectively, beyond which further increases yielded only marginal gains, indicating diminishing returns. To assess the effectiveness of HO, its performance was compared with three established methods: conditioned Latin hypercube sampling (cLHS), genetic algorithm (GA), and particle swarm optimization (PSO). At lower densities (0.5－1.3 points/km²), all methods showed limited predictive power. However, at 1.4 points/km² (383 points), the HO method was the first to exceed predefined accuracy thresholds (coefficient of determination, R²>0.40; Lin's concordance correlation coefficient, LCCC>0.55), achieving R²=0.41 and LCCC=0.57, outperforming cLHS (R²=0.38, LCCC=0.53), GA (R²=0.39, LCCC=0.52), and PSO (R²=0.38, LCCC=0.51). Across the range of 1.4－2.3 points/km², HO consistently delivered superior results. 
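The evaluation metrics used above can be written out directly. R² is taken here as 1 - SS_res/SS_tot, which is an assumption because some studies report the squared Pearson correlation instead. Lin's concordance correlation coefficient uses 1/n moment estimates. The final line checks the reported 15.4% RMSE reduction.

```python
import math
from typing import Sequence


def _mean(v: Sequence[float]) -> float:
    return sum(v) / len(v)


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    return math.sqrt(_mean([(p - o) ** 2 for o, p in zip(obs, pred)]))


def r2(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mo = _mean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mo) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def lccc(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient (1/n moment estimates)."""
    mo, mp = _mean(obs), _mean(pred)
    s_op = _mean([(o - mo) * (p - mp) for o, p in zip(obs, pred)])
    s_oo = _mean([(o - mo) ** 2 for o in obs])
    s_pp = _mean([(p - mp) ** 2 for p in pred])
    return 2.0 * s_op / (s_oo + s_pp + (mo - mp) ** 2)


# Arithmetic check of the reported RMSE reduction: 6.04 -> 5.11 g/kg.
print(round(100 * (6.04 - 5.11) / 6.04, 1))  # 15.4
```

Predictions shifted by a constant bias keep a perfect Pearson correlation, but their LCCC drops (to 2.5/3.5, about 0.71, for obs = [1, 2, 3, 4] shifted by 1). This penalty on systematic bias is why LCCC is a useful companion to R² when setting accuracy thresholds.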
At 2.3 points/km², the HO-RFRK combination achieved R²=0.49 and LCCC=0.63, surpassing cLHS, GA, and PSO in both metrics. [Conclusions] Using the cultivated land of Lanxi city as a test case, a novel sampling optimization strategy based on the HO was proposed. First, the strategy identified an optimal sampling density that maximizes prediction accuracy, as well as a lower, cost-effective density that maintains robust predictive performance at substantially reduced survey cost, together defining a practical density range that balances precision and economic feasibility. Second, the RFRK model consistently demonstrated superior prediction accuracy compared to the standard random forest (RF) model across all tested sampling schemes, validating the effectiveness of the integrated HO-RFRK approach. In summary, this optimized strategy achieves high mapping accuracy with greater sampling efficiency, offering a scientifically grounded and practical methodology for reducing long-term soil monitoring costs. It provides a valuable reference for optimizing soil surveys in Lanxi city and other regions with similar environmental settings.
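The structure of random forest residual kriging is "trend plus kriged residual". A regression model predicts SOM from environmental covariates, its residuals at the sampled sites are interpolated spatially, and the two parts are added. The sketch below uses NumPy and deliberately simplifies several steps, all of them assumptions. It applies simple kriging (zero-mean residuals) with an exponential covariance, uses hypothetical sill and range values without fitting a variogram, and computes residuals from in-sample trend predictions. The `trend` argument stands in for a fitted random forest's prediction function (for example, the `predict` method of a fitted scikit-learn `RandomForestRegressor`). The demo uses a linear stand-in.

```python
import numpy as np


def exp_cov(h, sill=1.0, range_=1000.0):
    """Exponential covariance C(h) = sill * exp(-h / range_) (hypothetical parameters, no nugget)."""
    return sill * np.exp(-h / range_)


def simple_krige(xy_obs, resid, xy_new, sill=1.0, range_=1000.0):
    """Simple kriging of zero-mean residuals at new locations."""
    d_oo = np.linalg.norm(xy_obs[:, None, :] - xy_obs[None, :, :], axis=-1)
    d_on = np.linalg.norm(xy_obs[:, None, :] - xy_new[None, :, :], axis=-1)
    weights = np.linalg.solve(exp_cov(d_oo, sill, range_), exp_cov(d_on, sill, range_))
    return weights.T @ resid  # one kriged residual per new location


def rfrk_predict(trend, X_obs, xy_obs, y_obs, X_new, xy_new, **cov):
    """Regression kriging: trend prediction at new sites plus kriged trend residuals."""
    resid = y_obs - trend(X_obs)
    return trend(X_new) + simple_krige(xy_obs, resid, xy_new, **cov)


# Hypothetical toy data: one covariate, four sites (coordinates in metres).
X_obs = np.array([[1.0], [2.0], [3.0], [4.0]])
xy_obs = np.array([[0.0, 0.0], [500.0, 0.0], [0.0, 500.0], [800.0, 800.0]])
y_obs = np.array([10.0, 13.0, 15.0, 19.0])


def linear_trend(X):
    """Stand-in for a fitted random forest's predict method."""
    return 3.0 * X[:, 0] + 7.0


pred = rfrk_predict(linear_trend, X_obs, xy_obs, y_obs, X_obs, xy_obs)
```

Without a nugget, the kriged surface reproduces the observations exactly at sampled sites. Far from any sample, the residual correction fades and the prediction falls back to the trend model. This is why RFRK can only improve on plain RF where residuals are spatially autocorrelated.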

YOLOv8n-SSND: An Improved Lightweight Model for Aerial Chenopodium quinoa Willd. Spike Target

WU Tingting, GUO Junrui, TAO Qiujie, CHEN Shihua, GUO Shanli

Smart Agriculture 2026, 8 (2): 59-71. DOI: 10.12133/j.smartag.SA202508021

[Objective] The Chenopodium quinoa panicle is a critical phenotypic indicator for estimating crop yield and evaluating the growth condition of Chenopodium quinoa plants. Accurate and efficient recognition of Chenopodium quinoa panicles in complex field environments is therefore of great significance for intelligent agriculture, yield prediction, and automatic crop management. However, field imagery acquired by unmanned aerial vehicles (UAVs) often exhibits diverse panicle morphology, uneven illumination, overlapping occlusion, and background interference, posing substantial challenges for conventional target detection algorithms. To address these issues, a lightweight target detection model named YOLOv8n-SSND (YOLOv8n with Switchable Atrous Convolution, Slim Neck, and Deformable Attention) was proposed and specifically optimized for UAV-based Chenopodium quinoa panicle identification. The model aims to improve detection accuracy and inference efficiency while maintaining the low computational cost and real-time performance required for embedded UAV deployment. [Methods] The proposed model was constructed based on the YOLOv8n and YOLOv11n frameworks and incorporated several improvements tailored for small-object agricultural detection tasks. To enhance the ability to capture multi-scale and high-dimensional semantic features, the switchable atrous convolution (SAC) module was embedded into the backbone network. This module dynamically adjusted its receptive field according to spatial context, enabling more precise extraction of local and global texture details of Chenopodium quinoa panicles.
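The way SAC "dynamically adjusts its receptive field" is easiest to see in one dimension. In DetectoRS, where SAC was introduced, the output blends a rate-1 convolution and a rate-3 convolution with weights w and w + Δw: y = S(x)·Conv(x, w, 1) + (1 - S(x))·Conv(x, w + Δw, 3). There, the switch S comes from a 5×5 average pooling followed by a 1×1 convolution. In this 1D sketch, a 5-sample moving average and a scalar affine map stand in for the switch. The sigmoid that keeps S in (0, 1) is an assumption and may differ from reference implementations. Parameter names `switch_a` and `switch_b` are hypothetical.

```python
import math
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], w: Sequence[float], dilation: int) -> List[float]:
    """Zero-padded, same-length 1D cross-correlation with the given dilation rate."""
    half = (len(w) // 2) * dilation
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, wj in enumerate(w):
            idx = i - half + j * dilation
            if 0 <= idx < len(x):
                acc += wj * x[idx]
        out.append(acc)
    return out


def local_mean(x: Sequence[float], window: int = 5) -> List[float]:
    """Context for the switch: moving average (edges use the available samples)."""
    half = window // 2
    out = []
    for i in range(len(x)):
        seg = x[max(0, i - half): i + half + 1]
        out.append(sum(seg) / len(seg))
    return out


def sac1d(x, w, delta_w, switch_a=0.0, switch_b=0.0):
    """Switchable atrous convolution, 1D sketch:
    y = s * conv(x, w, rate 1) + (1 - s) * conv(x, w + delta_w, rate 3),
    where s in (0, 1) depends on local context (sigmoid is an assumption here)."""
    context = local_mean(x)
    small = dilated_conv1d(x, w, dilation=1)
    large = dilated_conv1d(x, [a + b for a, b in zip(w, delta_w)], dilation=3)
    out = []
    for c, ys, yl in zip(context, small, large):
        s = 1.0 / (1.0 + math.exp(-(switch_a * c + switch_b)))
        out.append(s * ys + (1.0 - s) * yl)
    return out


impulse = [0.0] * 9
impulse[4] = 1.0
# A 3-tap kernel covers 3 samples at rate 1 but spans 7 samples at rate 3.
print([i for i, v in enumerate(dilated_conv1d(impulse, [1.0, 1.0, 1.0], 1)) if v])  # [3, 4, 5]
print([i for i, v in enumerate(dilated_conv1d(impulse, [1.0, 1.0, 1.0], 3)) if v])  # [1, 4, 7]
```

Because both branches share the same kernel (plus a learned offset Δw), the module gains a larger receptive field without doubling its parameter count.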
To reduce redundant parameters and maintain high computational efficiency, a slim-neck lightweight feature fusion layer was designed, which effectively strengthened the integration of shallow spatial information and deep semantic features, allowing the network to maintain high accuracy without increasing model complexity. Additionally, a deformable attention (DA) mechanism was introduced to enable adaptive focus on regions with rich panicle-related features while suppressing irrelevant background noise. This attention mechanism assigned dynamic weights across both spatial and channel dimensions, improving the model's robustness against occlusions, illumination variations, and complex field textures commonly encountered in UAV images. [Results and Discussions] Comprehensive field experiments were conducted using UAV images of Chenopodium quinoa plots collected under different environmental conditions and growth stages. The proposed YOLOv8n-SSND model achieved a mean average precision (mAP50) of 94.3%, outperforming multiple baseline and comparative models. Specifically, compared with YOLOv11n-SSND, YOLOv11n, YOLOv12n, YOLOv7, YOLOv5s, single shot multibox detector (SSD), fast region-based convolutional neural network (Fast R-CNN), and YOLOv8n, the proposed model improved mAP50 by 0.7, 0.9, 2.1, 1.4, 2.0, 23.1, 19.6, and 1.8 percentage points, respectively. In terms of computational efficiency, the inference speed reached 166.7 f/s, a 26.7% increase over the YOLOv8n baseline, which ensured real-time detection capability for UAV-mounted onboard processors. Moreover, the total operation count was reduced to 6.8 GFLOPs, a 16.0% reduction compared with the baseline model, demonstrating the improved efficiency of the proposed architecture.
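The efficiency figures above can be translated into per-frame latency and into the baseline values they imply. The snippet below back-calculates those baseline values. They are not reported in the abstract, and the calculation assumes that both percentages are relative changes with respect to YOLOv8n.

```python
def latency_ms(fps: float) -> float:
    """Per-frame latency implied by a throughput in frames per second."""
    return 1000.0 / fps


def baseline_from_increase(new_value: float, pct_increase: float) -> float:
    """Undo a relative increase: new = base * (1 + pct/100)."""
    return new_value / (1.0 + pct_increase / 100.0)


def baseline_from_reduction(new_value: float, pct_reduction: float) -> float:
    """Undo a relative reduction: new = base * (1 - pct/100)."""
    return new_value / (1.0 - pct_reduction / 100.0)


print(round(latency_ms(166.7), 2))                    # 6.0 ms per frame
print(round(baseline_from_increase(166.7, 26.7), 1))  # 131.6 f/s implied for YOLOv8n
print(round(baseline_from_reduction(6.8, 16.0), 1))   # 8.1 GFLOPs implied for YOLOv8n
```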
The experimental comparison also indicated that the integration of SAC enhanced the model's sensitivity to complex spatial patterns, while the DA module effectively improved feature selectivity and prevented overfitting to background textures. The Slim-Neck design contributed significantly to reducing parameter redundancy and facilitated smooth feature propagation across layers. [Conclusions] The YOLOv8n-SSND model effectively achieves a balance among detection accuracy, inference speed, and computational cost, making it well-suited for real-time UAV-based agricultural monitoring. The experimental outcomes confirm that the model not only provides high-precision detection of Chenopodium quinoa panicles but also offers superior inference efficiency with minimal computational resources. These characteristics make it a promising solution for UAV-deployed intelligent agricultural systems, where power and processing capacity are limited. Furthermore, the proposed method provides a technical foundation for large-scale and automated monitoring of Chenopodium quinoa growth, enabling accurate yield estimation, phenotypic analysis, and precision crop management.

Geographically Weighted Random Forest for County-scale Digital Mapping of Soil Organic Matter: A Case Study in the Central Shandong Mountains

ZHANG Shulin, CUI Liqin, LIU Jian, ZHANG Canting, WANG Hongjia, ZHANG Tingting, WANG Ailing

Smart Agriculture 2026, 8 (2): 72-85. DOI: 10.12133/j.smartag.SA202508020

[Objective] Soil organic matter (SOM) is a fundamental indicator for evaluating soil fertility and soil quality. In mountainous counties characterized by complex terrain and pronounced environmental heterogeneity, SOM exhibits strong spatial variability even over short distances, which often limits the prediction accuracy of conventional digital soil mapping (DSM) models. With the nationwide implementation of the Third National Soil Census, the demand for high-resolution and high-accuracy SOM mapping at the county scale has become increasingly urgent. Against this backdrop, Yiyuan county in Shandong province was selected as the study area to assess the applicability of the geographically weighted random forest (GWRF) model for SOM mapping in complex terrain and to systematically compare the predictive performance of GWRF with several commonly used models, thereby providing technical support for soil resource surveys, census result compilation, and county-level land management. [Methods] A dataset of 1 565 measured topsoil SOM samples was used, along with nineteen environmental variables in five categories: topography, climate, vegetation, soil properties, and land use. Through correlation analysis and collinearity diagnostics, twelve key variables were retained for model construction. The GWRF model, which integrates localized spatial modeling with nonlinear machine-learning capability, was developed to generate high-resolution SOM predictions across the study area. An adaptive bandwidth strategy was employed, yielding an optimal bandwidth of 500. Grid search combined with cross-validation was used to identify the optimal mtry value of 4 for the random forest component. In addition to GWRF, four reference models were constructed for comparison: ordinary kriging (OK), multiple linear regression (MLR), geographically weighted regression (GWR), and random forest (RF).
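The local part of GWRF is sketched below. For each prediction location, the model is trained only on the nearest sampling sites, so the covariate effects can change across the landscape. The sketch assumes that the "adaptive bandwidth of 500" denotes a neighbour count, as is usual for adaptive kernels. It also assumes a bisquare distance kernel, a common choice in geographically weighted models. In the study the local learner is a random forest (mtry = 4). Here `weighted_mean_model` is a hypothetical stand-in that keeps the example dependency-free.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def adaptive_neighbours(target: Point, sites: Sequence[Point], k: int) -> List[Tuple[int, float]]:
    """(index, distance) of the k nearest sampling sites: an adaptive bandwidth fixes the
    neighbour count, so the search radius shrinks where sampling is dense."""
    ranked = sorted(((i, math.dist(target, s)) for i, s in enumerate(sites)),
                    key=lambda pair: pair[1])
    return ranked[:k]


def bisquare_weights(neighbours: List[Tuple[int, float]]) -> List[float]:
    """Bisquare kernel w = (1 - (d / d_max)^2)^2; the k-th neighbour receives weight 0."""
    d_max = neighbours[-1][1]
    if d_max == 0.0:
        return [1.0] * len(neighbours)
    return [(1.0 - (d / d_max) ** 2) ** 2 for _, d in neighbours]


def weighted_mean_model(X_local, y_local, weights):
    """Hypothetical stand-in for a local random forest: returns a constant predictor."""
    total = sum(weights)
    if total == 0.0:
        value = sum(y_local) / len(y_local)
    else:
        value = sum(w * y for w, y in zip(weights, y_local)) / total
    return lambda x: value


def gwrf_predict(target_xy: Point, x_target, sites: Sequence[Point], X, y,
                 k: int, local_fit: Callable = weighted_mean_model) -> float:
    """Fit a model on the target's neighbourhood only, then predict at the target."""
    neighbours = adaptive_neighbours(target_xy, sites, k)
    idx = [i for i, _ in neighbours]
    model = local_fit([X[i] for i in idx], [y[i] for i in idx], bisquare_weights(neighbours))
    return model(x_target)


sites = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
X = [[0.2], [0.3], [0.4], [0.9]]  # hypothetical covariates
y = [1.0, 2.0, 3.0, 100.0]        # hypothetical SOM values
print(round(gwrf_predict((0.5, 0.0), [0.25], sites, X, y, k=3), 6))  # 1.5
```

The distant site with an extreme value never enters the local model, which is how GWRF keeps a local anomaly from distorting predictions elsewhere.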
Model performance was evaluated using two commonly adopted accuracy metrics: the coefficient of determination (R²) and root-mean-square error (RMSE). [Results and Discussions] Overall, SOM levels in Yiyuan county were relatively low, with a mean value of 15.62 g/kg. The spatial variation was moderate and exhibited a clear pattern: SOM values were higher in the central area and lower in the northeastern and southwestern areas. Considerable differences were observed in prediction accuracy among the five models. The GWRF model achieved the best overall performance, with an R² of 0.48 and an RMSE of 5.12 g/kg. This accuracy clearly surpassed that of RF (R²=0.41) and GWR (R²=0.35), and its advantage over MLR and OK was even more pronounced. A paired-sample t-test further confirmed that the accuracy improvements of GWRF over the other four models were statistically significant, supporting the robustness and reliability of the model's enhanced performance. According to the mapping results, the OK model produced an excessively smooth surface, making it difficult to reveal local details. While the MLR and GWR models could characterize certain environmental effects, they exhibited significant biases, such as underestimation of high values and overestimation of low values. In contrast, the GWRF model captured both global trends and subtle local variations well. The analysis of variable importance showed that soil type, annual evapotranspiration, slope, and sand content were the most influential factors governing SOM distribution in the study area, and that the importance of these factors varied notably across space. [Conclusions] This study demonstrated that the GWRF model possesses significant advantages in county-scale SOM digital mapping within mountainous areas.
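A paired-sample t-test compares two models evaluated on the same units, so it works on per-unit differences. The abstract does not state which quantity was paired. This sketch assumes per-site absolute errors at common validation points, and the error values are hypothetical. It returns the t statistic and degrees of freedom. The p-value can then be read from a t distribution, for example via `scipy.stats.ttest_rel`, which performs the same test directly.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Paired-sample t statistic and degrees of freedom for differences a_i - b_i."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    d = [x - y for x, y in zip(a, b)]
    sd = stdev(d)
    if sd == 0.0:
        raise ValueError("differences are constant; t is undefined")
    return mean(d) / (sd / math.sqrt(len(d))), len(d) - 1


# Hypothetical absolute errors (g/kg) of two models at the same four validation sites.
err_rf = [5.0, 6.0, 7.0, 8.0]
err_gwrf = [4.0, 4.0, 6.0, 6.0]
t, df = paired_t(err_rf, err_gwrf)
```

Pairing removes the site-to-site variation shared by both models, so even small but consistent improvements can reach significance.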
Its prediction accuracy markedly exceeded that of RF and conventional linear models, owing to its ability to simultaneously capture nonlinear environmental relationships and localized spatial variations. The enhanced mapping precision and improved representation of spatial details highlight the strong potential of GWRF for applications requiring high-accuracy soil information. GWRF is well-suited for SOM prediction under complex terrain conditions and can serve as an effective technical tool for county-level soil property estimation. Future research may incorporate human-activity-related variables, employ localized variable-selection strategies within the GWRF framework to further refine model performance, and explore the application potential of more advanced deep learning models in soil property mapping.

A Bi-LSTM Prediction Method for Apple First Flowering Date Based on Enhanced Time-Series Temperature Features

LIU Enqi, LIU Miao, WANG Tuo, ZHU Yaohui, CHEN Riqiang, XU Bo, GAO Meiling, ZHANG Jing, YANG Yun, YANG Guijun

Smart Agriculture 2026, 8 (2): 86-97. DOI: 10.12133/j.smartag.SA202510026

[Objective] The first flowering date of apples is a key phenological stage in the annual growth cycle of fruit trees. Its occurrence timing is directly associated with pollination efficiency, fruit set rate, and subsequent fruit development, and it also serves as an important basis for orchard management practices, including flower and fruit thinning, pest and disease control, as well as early risk warning and emergency management for low-temperature frost events during the flowering period. Existing studies still have room for improvement in the fine-scale extraction of temperature time-series information and in the representation of model adaptability across different spatial locations. Therefore, the purpose of this research is to develop a prediction method for the first flowering date of apples that can effectively characterize time-varying temperature patterns and achieve regional adaptability, thereby providing more reliable technical support for refined orchard management and disaster prevention. [Methods] A deep learning-based forecasting framework for predicting the first flowering date of apples was developed based on observation sites in Luochuan county, Shaanxi province. First, daily near-surface air temperature (NSAT) data from 2019 to 2021 were collected for the period from apple harvest to the subsequent flowering season in the study area, including daily maximum, mean, and minimum temperatures. In addition, elevation, latitude, and longitude were introduced as static geographic factors, forming a combined input composed of dynamic temperature sequences and static spatial attributes. Second, in terms of the model design, a bidirectional long short-term memory network (Bi-LSTM) was employed as the temporal encoder to learn bidirectional dependencies within the temperature time series. 
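The combined input described above pairs a dynamic daily temperature sequence with static geographic attributes. The sketch below assembles both. It adds one explicit cumulative channel, degree-days above a base temperature, to show what "cumulative temperature effects" means numerically. This channel and the 5 °C base are assumptions for illustration. In the study, cumulative effects are learned by a dedicated attention head rather than supplied as a feature. The static values in the demo are hypothetical.

```python
from typing import List, Sequence, Tuple


def build_inputs(tmax: Sequence[float], tmean: Sequence[float], tmin: Sequence[float],
                 elevation_m: float, lat: float, lon: float,
                 t_base: float = 5.0) -> Tuple[List[List[float]], List[float]]:
    """Return (sequence, static).

    sequence: one row per day, [tmax, tmean, tmin, cumulative degree-days above t_base]
    static:   [elevation_m, lat, lon], kept separate so it can be fused after the encoder
    """
    if not len(tmax) == len(tmean) == len(tmin):
        raise ValueError("temperature series must have equal length")
    sequence, cumulative = [], 0.0
    for hi, mid, lo in zip(tmax, tmean, tmin):
        cumulative += max(0.0, mid - t_base)  # degree-days accumulate only above the base
        sequence.append([hi, mid, lo, cumulative])
    return sequence, [elevation_m, lat, lon]


seq, static = build_inputs(tmax=[9.0, 12.0, 16.0], tmean=[4.0, 6.0, 10.0],
                           tmin=[-1.0, 1.0, 4.0], elevation_m=1000.0, lat=35.0, lon=109.0)
print([row[3] for row in seq])  # [0.0, 1.0, 6.0]
```

Keeping the static vector out of the time series mirrors the design described in the abstract, where geographic factors are fused with the attention outputs just before the regression layer.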
On this basis, a customized multi-head attention (MHA) mechanism was integrated, consisting of a local dependency head, a global trend head, and a cumulative feature head, which were designed to represent short-term pre-flowering temperature fluctuations, overall temperature trends, and cumulative temperature effects, respectively. This configuration enhanced the extraction of time-varying information across multiple temporal scales. The attention outputs were then fused with the static geographic factors, and the predicted first flowering date was generated through a regression layer, enabling regionally adaptive prediction. To ensure comparability of results, LSTM and Bi-LSTM models were simultaneously constructed as baseline models using identical data preprocessing and training procedures. Third, Bayesian optimization was applied for automatic hyperparameter tuning, during which key parameters, including learning rate, number of network layers, number of hidden units, regularization terms, and optimizers, were systematically searched, and the optimal configuration was selected based on validation performance. Finally, a cross-year validation strategy was adopted to evaluate model generalization ability: data from 2019 to 2021 were used as the modeling dataset (training and validation), while the observed first flowering date in 2022 served as an independent test dataset. The predictive performance of all models was evaluated using three widely recognized metrics: root mean square error (RMSE), mean absolute error (MAE), and correlation coefficient (R). [Results and Discussions] The proposed model achieved an RMSE of 1.34 d, an MAE of 1.13 d, and an R of 0.84 on the test dataset, with most prediction errors concentrated within a range of 0－2 d. Validation results indicated that the proposed approach was capable of providing stable predictions approximately 15－20 d in advance within the study area.
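The three evaluation metrics can be computed as follows for predicted and observed first flowering dates expressed as day of year. R is taken to be the Pearson correlation coefficient, which is an assumption. The day-of-year values in the demo are hypothetical.

```python
import math
from typing import Sequence, Tuple


def flowering_metrics(obs: Sequence[float], pred: Sequence[float]) -> Tuple[float, float, float]:
    """Return (RMSE, MAE, Pearson R) for observed vs predicted dates (days)."""
    n = len(obs)
    err = [p - o for o, p in zip(obs, pred)]
    rmse = math.sqrt(sum(e * e for e in err) / n)
    mae = sum(abs(e) for e in err) / n
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return rmse, mae, cov / (so * sp)


# Hypothetical observed and predicted first flowering dates (day of year).
rmse, mae, r = flowering_metrics([100, 102, 104, 106], [101, 101, 105, 107])
```

RMSE weights large misses more heavily than MAE, so an RMSE close to the MAE (1.34 d vs 1.13 d in the results) indicates that errors are fairly uniform, with few large outliers.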
Further comparative analysis demonstrated that the Bi-LSTM architecture more effectively exploited both forward and backward dependencies in the pre-flowering temperature time series, thereby offering a more stable temporal representation for regression-based prediction of the first flowering date. Building on this structure, the three attention heads (the local dependency head, the global trend head, and the cumulative feature head) enabled the model to distinguish and use short-term fluctuations, stage-wise trends, and cumulative temperature effects more explicitly. This targeted extraction of multi-scale time-varying information reduced prediction errors and improved overall prediction accuracy. Ablation experiments on the static geographic factors further verified the necessity of the spatial adaptability component. Removing elevation increased the RMSE from 1.34 d to 1.45 d. Removing latitude and longitude led to a larger increase, to 2.54 d, and excluding both elevation and geographic coordinates raised the RMSE further to 2.69 d, accompanied by a decrease in correlation. These results indicated that geographic factors provided effective spatial constraints that supported the learning of location-specific phenological responses across sampling sites. In addition, spatial prediction maps showed that the first flowering date in the study area partly followed an elevation gradient. This spatial pattern was consistent with the rationale for incorporating geographic factors into a unified prediction framework.
The proposed method achieves relatively high prediction accuracy in cross-year forecasting and enables spatially adaptive prediction of the first flowering date of apples. These findings provide a new data-driven technical pathway for refined prediction of apple flowering phenology and offer important technical support for orchard flowering management, frost damage prevention, and agricultural production decision-making.
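
The three evaluation metrics named in the Methods have standard definitions. As a minimal sketch, the plain-Python functions below compute RMSE, MAE and the Pearson correlation coefficient R; the day-of-year values are hypothetical illustrations, not data from the study.

```python
import math

def rmse(obs, pred):
    """Root mean square error, in the same unit as the inputs (days here)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def pearson_r(obs, pred):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)

# Hypothetical first-flowering dates expressed as day of year (illustrative only)
observed = [105, 108, 110, 112, 107]
predicted = [106, 107, 112, 111, 108]
print(round(rmse(observed, predicted), 2),
      round(mae(observed, predicted), 2),
      round(pearson_r(observed, predicted), 2))
```

RMSE penalizes occasional large misses more heavily than MAE, which is why the two are usually reported together for phenology forecasts.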

CAGE-YOLO: A Dense Small Object Detection Model for Aquaculture Net Cages Based on Remote Sensing Images

ZHANG Wenbo, JIANG Yijue, SONG Wei, HE Qi, ZHANG Wenbo

Smart Agriculture 2026, 8 (2): 98-117. DOI: 10.12133/j.smartag.SA202508023

[Objective] Detecting dense, small aquaculture net cages against complex backgrounds is difficult. The purpose of this study is to build a specialized dataset and design a targeted detection model that improves recognition accuracy and robustness for practical aquaculture management. [Methods] A dataset of aquaculture net cages was constructed from high-resolution remote sensing imagery collected in seven representative farming regions (Australia, Canada, Chile, Croatia, Greece, China, and the Faroe Islands), and Cage-YOLO, a deep learning model based on YOLOv5, was proposed for detecting dense and small aquaculture net cages. First, an adaptive dense perception algorithm was introduced, which automatically selects and generates feature maps that reflect the high-density distribution of small aquaculture net cages. Second, an enhanced module based on spatial pyramid pooling-fast (SPPF) was integrated to reduce background noise interference and improve global feature extraction. Finally, a mixed attention block was incorporated to further enhance the model's perception of dense and small objects. [Results and Discussions] Experimental results showed that, compared with the original YOLOv5, the proposed Cage-YOLO improved precision, recall, and mean average precision by 5.6, 21.8, and 17.4 percentage points, respectively. The model size was kept at 16.9 MB, demonstrating both strong performance and deployment advantages. [Conclusions] This study provides a new approach for dense and small object detection and offers technical support for the intelligent management of marine cage aquaculture.
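
The abstract does not detail how the enhanced module modifies spatial pyramid pooling-fast. For orientation only, the sketch below illustrates the property of the baseline SPPF block as used in YOLOv5: three chained 5×5 max pools (stride 1, same padding) reproduce the outputs of parallel 5×5, 9×9 and 13×13 pools at lower cost. It works on a single hypothetical 16×16 channel in plain Python and omits the surrounding convolutions and the final concatenation.

```python
import random

NEG_INF = float("-inf")

def max_pool_same(grid, k):
    """Max pooling with stride 1 and 'same' padding; out-of-range cells count as -inf."""
    h, w, r = len(grid), len(grid[0]), k // 2
    out = [[NEG_INF] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        out[i][j] = max(out[i][j], grid[y][x])
    return out

def spp_branches(grid):
    """SPP-style: three parallel pools with kernels 5, 9 and 13."""
    return [max_pool_same(grid, k) for k in (5, 9, 13)]

def sppf_branches(grid):
    """SPPF-style: the same 5x5 pool applied three times in sequence."""
    p1 = max_pool_same(grid, 5)
    p2 = max_pool_same(p1, 5)   # two chained 5x5 windows cover a 9x9 window
    p3 = max_pool_same(p2, 5)   # three chained 5x5 windows cover a 13x13 window
    return [p1, p2, p3]

random.seed(0)
feature_map = [[random.random() for _ in range(16)] for _ in range(16)]
print(spp_branches(feature_map) == sppf_branches(feature_map))
```

Because max pooling only selects existing values, the two branch sets match exactly, which is why SPPF can replace SPP without changing what the head sees.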

Cross-Modal Attention for Multi-Source Remote Sensing Crop Classification under Cloud Occlusion and Complex Field Scenarios

WU Chenxu, ZUO Haolong, LI Gang

Smart Agriculture 2026, 8 (2): 118-132. DOI: 10.12133/j.smartag.SA202510010

[Objective] Accurate and timely crop mapping is fundamental for agricultural management, yield forecasting, and food security assessment. However, in mountainous and hilly regions characterized by frequent cloud cover and highly fragmented farmland, crop classification methods relying solely on optical remote sensing data are severely constrained. Persistent cloud contamination introduces data gaps and temporal inconsistencies in optical image time series, significantly degrading classification accuracy and robustness. To address these limitations, a robust and adaptive deep learning framework capable of effectively integrating multi-modal remote sensing data is developed. The primary objective is to enhance crop classification accuracy and stability under complex conditions where optical observations are scarce or unreliable, thereby supporting reliable agricultural monitoring in cloudy and fragmented landscapes. [Methods] A novel deep neural network architecture, a 3D convolutional neural network based on an attention mechanism (Attention-3DCNN), was proposed to jointly exploit multi-temporal optical and synthetic aperture radar (SAR) observations. The model integrated Sentinel-2 multispectral time-series imagery with weather-insensitive Sentinel-1 SAR data through a dedicated cross-modal fusion strategy driven by a triple-attention mechanism. The network adopted a dual-branch feature extraction architecture. For the Sentinel-2 data, a hybrid module combining three-dimensional and two-dimensional convolutional neural networks (3D-CNN and 2D-CNN) was employed to capture discriminative spatiotemporal features and crop phenological dynamics across the growing season. This design enabled effective modeling of the spectral-temporal interactions inherent in crop development.
For the Sentinel-1 SAR data, depthwise separable convolutions were utilized to efficiently extract spatial and textural features related to crop structure and surface scattering characteristics while reducing computational complexity. Features extracted from both modalities were subsequently integrated using a custom-designed attention-based fusion module. This module consisted of three complementary attention mechanisms: channel attention, temporal attention, and spatial attention. Residual connections were incorporated throughout the network to facilitate stable training and effective gradient propagation. The proposed model was evaluated on two datasets to assess both its performance and generalizability. The first was the publicly available panoptic agricultural satellite time series (PASTIS) benchmark dataset from France, which contained dense time-series observations and multiple crop classes. The second was a real-world dataset constructed for Yishui county, Shandong province, China, which was characterized by high cloud frequency (approximately 33%), highly fragmented farmland (average parcel size < 0.5 hm²), and a relatively simple crop rotation system. Comparative experiments were conducted against several state-of-the-art models, including 3D-ConvSTAR, UNet++, Self-Attention 3D, CNN-LSTM dual-stream network, and TGF-Net. Ablation studies were also performed to quantify the contribution of each attention component. [Results and Discussions] Experimental results demonstrated that Attention-3DCNN consistently outperformed all baseline methods on both datasets. On the PASTIS benchmark, the model achieved an overall accuracy (OA) of 97.5%, confirming its strong classification capability under favorable observation conditions. On the more challenging Yishui county dataset, Attention-3DCNN attained an OA of 93%, outperforming the other comparison models. 
Ablation experiments confirmed the effectiveness of the proposed triple-attention mechanism, as removing any attention component resulted in a clear reduction in classification performance. Under heavy cloud coverage, Attention-3DCNN exhibited the smallest accuracy degradation, with an OA drop of only 3.6 percentage points, indicating its ability to adaptively rely on SAR information when optical data quality deteriorated. In regions with highly fragmented farmland, the proposed model also maintained the highest accuracy and the smallest performance decline (2.8 percentage points), benefiting from the spatial attention mechanism. Moreover, attention visualization provided meaningful interpretability. Temporal attention peaks aligned with key crop phenological stages, while channel attention highlighted spectrally and physically informative optical bands and SAR polarizations, which was consistent with established agronomic and remote sensing knowledge. [Conclusions] This study presents the Attention-3DCNN model for accurate and robust crop classification in regions affected by persistent cloud cover and fragmented agricultural landscapes. By fusing Sentinel-2 optical and Sentinel-1 SAR time-series data through a channel-temporal-spatial triple-attention mechanism, the proposed framework enables adaptive integration of complementary multi-modal information. The model achieves outstanding performance on both benchmark and real-world datasets, demonstrates strong robustness under adverse conditions, and offers enhanced interpretability. Overall, the proposed approach provides a reliable and practical solution for crop mapping in complex agricultural environments.
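
The abstract does not give equations for the temporal attention branch. One common form scores each acquisition date, normalizes the scores with a softmax, and takes a weighted sum, so dates with uninformative features receive little weight. The plain-Python sketch below shows only that generic pattern; the scoring vector `query` and the feature values are hypothetical, and a trained network would learn the scoring function.

```python
import math

def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def temporal_attention_pool(sequence, query):
    """Return (pooled feature vector, per-date weights).

    sequence: list of T feature vectors, one per acquisition date
    query:    hypothetical scoring vector (learned in a real model)
    """
    scores = [sum(f * q for f, q in zip(step, query)) for step in sequence]
    weights = softmax(scores)
    dim = len(sequence[0])
    pooled = [sum(w * step[d] for w, step in zip(weights, sequence)) for d in range(dim)]
    return pooled, weights

# Four hypothetical dates with three features each; date 0 mimics a weak, cloud-affected observation
seq = [[0.1, 0.2, 0.0], [0.8, 0.9, 0.7], [0.2, 0.1, 0.1], [0.5, 0.4, 0.6]]
pooled, weights = temporal_attention_pool(seq, query=[1.0, 1.0, 1.0])
print([round(w, 3) for w in weights])
```

Since the weights sum to one, the pooled vector stays within the range of the input features, and inspecting the weights gives the kind of per-date interpretability the abstract describes.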

An Improved YOLOv10-Based Tomato Ripeness Detection Algorithm with LAMP Channel Pruning

ZHAO Licheng, LU Xinyu, WU Qian, REN Ni, ZHOU Lingli, CHENG Yawen, HU Anqi, QI Chao

Smart Agriculture 2026, 8 (2): 133-146. DOI: 10.12133/j.smartag.SA202507045

[Objective] As a major crop in protected horticulture, cluster tomatoes grow in clusters with dense, overlapping fruits. In greenhouse environments, light conditions are complex and variable, and the fruit color changes continuously from green to red across ripening stages. Under these conditions, traditional manual recognition is inefficient and highly subjective. Meanwhile, deep learning-based detection models often suffer from decreased detection accuracy, large localization errors, and slow inference speed when facing complex backgrounds and color interference, making it difficult to meet the dual requirements of real-time performance and high precision in practical applications. Therefore, to meet the requirements of high accuracy, real-time performance, and strong robustness for cluster tomato ripeness detection, this paper proposes LampCT-YOLO (Cluster Tomato YOLO with LAMP pruning), a lightweight object detection model for cluster tomato ripeness based on an improved YOLOv10. Through structural optimization and lightweight transformation of the baseline model, detection accuracy, inference speed, and robustness are effectively improved, providing a novel technical solution for cluster tomato ripeness detection. [Methods] With YOLOv10 as the baseline model, the SegNeXt attention mechanism was first introduced into the backbone network to address insufficient feature extraction in complex scenarios. By adaptively adjusting attention weights and calculating the correlation matrix between feature channels, the mechanism automatically identified color channels strongly associated with the three ripeness levels of cluster tomatoes and assigned them higher attention weights, while suppressing feature responses from irrelevant background channels such as greenhouse frames, soil, and irrigation pipes.
To achieve lightweight deployment and meet the real-time detection requirements of edge devices, LAMP channel pruning, a gradient-based global channel importance method, was applied after model training. Its core principle was to evaluate the contribution of each channel to detection performance by calculating the gradient magnitude of channels in each network layer and then to eliminate redundant channels. This significantly reduced model size and computational complexity while maintaining high detection performance for the three-category ripeness classification of cluster tomatoes. [Results and Discussions] Experiments on an NVIDIA A100 GPU with the 240 cluster tomato images in the test set showed that the LampCT-YOLO model performed well. The mean average precision at an intersection over union (IoU) threshold of 0.5 (mAP50) for the early ripe, mid-ripe, and late ripe stages was 84.6%, 89.5%, and 88.4%, respectively, increases of 5.5, 7.7, and 0.9 percentage points over YOLOv10. The average mAP50 across the three ripeness categories reached 87.6%, a 4.7 percentage point improvement over YOLOv10, demonstrating strong detection accuracy and stability. In addition, the model maintained high recognition accuracy under variations in light intensity, fruit occlusion ratio, and background complexity, indicating good robustness and environmental adaptability. Regarding the lightweight effect, LAMP channel pruning reduced the number of model parameters and computational complexity by 63.07% and 50.06%, respectively, while inference speed improved by 23.1%.
This effectively met the requirements of edge computing devices for real-time detection and low power consumption, alleviating the trade-off between model accuracy and inference speed. To verify the practical application value of the LampCT-YOLO model, the model was deployed on a self-developed fruit and vegetable inspection robot, which conducted field tests on 456 clusters of tomatoes in a real greenhouse environment. The results showed that the inspection robot successfully identified 78, 61, and 248 clusters of early ripe, mid-ripe, and late ripe cluster tomatoes, respectively, with detection accuracies of 84.8%, 87.1%, and 84.4%, and an average accuracy of 85.4%. Meanwhile, there were 5, 7, and 10 false detections, as well as 9, 2, and 36 missed detections for the early ripe, mid-ripe, and late ripe stages respectively, which to a certain extent reflected the practical application potential of the model. [Conclusions] The optimized LampCT-YOLO model not only significantly improves the recognition accuracy of cluster tomatoes at different ripening stages but also greatly reduces the model complexity, successfully achieving efficient deployment in resource-constrained scenarios. This model effectively balances the dual requirements of detection accuracy and real-time performance for inspection robots, and further constructs a reusable technical framework for the ripeness detection of protected horticultural fruits and vegetables. It provides strong support for the transformation of protected agriculture from labor-intensive to technology-intensive, and injects key innovative impetus into the large-scale and diversified implementation of smart agriculture.
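
LAMP (layer-adaptive magnitude-based pruning), as published by Lee et al. (ICLR 2021), ranks weights by a magnitude-based score that is rescaled within each layer, so that a single global threshold yields layer-adaptive sparsity. The abstract describes a gradient-based channel criterion, so the sketch below should be read only as an illustration of the published LAMP score. Applying it to per-channel L2 norms, the function names `lamp_scores` and `global_prune_masks`, and the sample norms are all assumptions.

```python
def lamp_scores(magnitudes):
    """LAMP score of every entry in one layer.

    With entries sorted by ascending squared magnitude, entry u scores
    m_u**2 / (sum of m_v**2 over u and every entry ranked above it),
    so the largest entry of each layer always scores exactly 1.
    """
    order = sorted(range(len(magnitudes)), key=lambda i: magnitudes[i] ** 2)
    scores = [0.0] * len(magnitudes)
    tail = 0.0
    # Walk from the largest entry downwards, accumulating the tail sum
    for idx in reversed(order):
        sq = magnitudes[idx] ** 2
        tail += sq
        scores[idx] = sq / tail if tail > 0 else 0.0
    return scores

def global_prune_masks(layers, ratio):
    """Keep-masks after removing the `ratio` fraction of channels with the
    lowest LAMP scores, ranked globally across all layers."""
    ranked = sorted(
        (score, li, ci)
        for li, layer in enumerate(layers)
        for ci, score in enumerate(lamp_scores(layer))
    )
    pruned = {(li, ci) for _, li, ci in ranked[: int(len(ranked) * ratio)]}
    return [[(li, ci) not in pruned for ci in range(len(layer))]
            for li, layer in enumerate(layers)]

# Hypothetical per-channel L2 norms for two convolution layers
layers = [[0.9, 0.1, 0.5, 0.05], [3.0, 2.5, 0.2]]
print(global_prune_masks(layers, ratio=0.4))
```

The within-layer normalization keeps every layer's strongest channel at score 1, so a layer with uniformly small weights is not wiped out by a global threshold, which plain magnitude ranking would risk.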

A Lightweight Method for Pear Surface Defect Detection Based on Improved Mamba-YOLO Architecture

XIU Xianchao, FEI Shiqi, HUANG Wenqian, LI Nan, MIAO Zhonghua

Smart Agriculture 2026, 8 (2): 147-157. DOI: 10.12133/j.smartag.SA202508022

[Objective] Pears are a common fruit rich in vitamins and minerals. Traditional pear grading primarily relies on manual inspection, which is not only laborious but also susceptible to subjective factors, leading to unstable and inaccurate results. Furthermore, manual operations may cause varying degrees of physical damage to pears, affecting their appearance and market value. Therefore, developing an automated, efficient, and reliable pear grading technology has become an urgent demand in the industry. To address the current problem of poor detection accuracy caused by the small scale of surface defects in Dangshan pears, a lightweight high-precision model was proposed based on an improved Mamba-YOLO architecture, aiming to balance detection accuracy and efficiency. [Methods] The dataset comprised 1 000 images, which were partitioned into training, validation, and test sets in an 8:1:1 ratio. The following improvements were made to the network architecture. Firstly, a dynamic upsampling (Dysample) module was adopted. Compared to the existing upsampling module in Mamba-YOLO, the Dysample module featured fewer parameters and floating-point operations (FLOPs). Its design eliminated complex dynamic convolution kernels, requiring only a small number of linear layers and grouping operations, thereby preserving computational efficiency while enhancing the retention of defect details. Secondly, regarding pear surface defect detection, defects often exhibited high-frequency local features, whereas traditional convolutional neural networks (CNNs) suffer from insufficient feature capture and imbalanced frequency response. As the dilation rate increased, the frequency response of the convolution kernel decreased and its bandwidth narrowed, consequently limiting its ability to process high-frequency information. 
Therefore, a frequency-adaptive dilated convolution (FADC) module was proposed, which dynamically adjusted the convolution kernel size, enabling the network to adaptively select matching kernels based on local input features. Smaller kernels were used in high-frequency regions and larger kernels in low-frequency regions, thereby achieving collaborative optimization of multi-band features and enhancing the ability to extract defect features. Finally, considering that single-scale depthwise convolutions alone might perceive input features insufficiently, and that traditional gating mechanisms may model global context inadequately, the squeeze-and-excitation module was fused with a channel mixer based on the convolutional gated linear unit (CGLU). This combination was extended into a multi-scale version termed MS-CGLU, in which convolution kernels of different sizes extracted multi-scale features that were then combined by weighted fusion, yielding a stronger feature representation. [Results and Discussions] The proposed method was rigorously evaluated on the Dangshan pear test set. Ablation experiments demonstrated that introducing the CGLU, FADC, and Dysample enhanced detection performance, confirming the effectiveness of these modules. Compared to YOLOv8n, Gold-YOLO-N, and YOLOv12n, the mean average precision (mAP) was higher by 4.7, 5.3, and 6.3 percentage points, respectively. Compared to the baseline Mamba-YOLO-T, the mAP increased by 3.4 percentage points and the frames per second improved by 10.8 percentage points.
Furthermore, in comparative experiments with larger-scale models from the same Mamba-YOLO series, the proposed algorithm still demonstrated significant advantages: its parameter count was only 41.7% of Mamba-YOLO-B and 15.7% of Mamba-YOLO-L, and its FLOPs were only 57.1% and 18.1% of those models, yet it achieved increases in mAP@0.5 of 3.2% and 1.4%, and increases in mAP@0.5:0.95 of 3.1% and 2.6%, respectively. [Conclusions] This research developed a high-precision and lightweight algorithm for detecting surface defects on Dangshan pears. It achieved a superior balance between detection accuracy and inference speed, significantly outperforming relevant lightweight benchmarks and even larger models within its own family in terms of efficiency. This work can provide reliable algorithmic support for lightweight detection research of pear surface defects.
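
The frequency argument in the Methods rests on how dilation spreads a kernel's taps. The plain-Python sketch below computes the span of a dilated kernel and the input positions a 1-D kernel reads: with dilation d, the same k weights cover k + (k - 1)(d - 1) pixels but skip the d - 1 pixels between neighboring taps, so fine, high-frequency detail lying between taps is not read by that output. How FADC selects the kernel or dilation per location is not given in the abstract and is not modeled here; the function names are hypothetical.

```python
def effective_kernel_size(k, dilation):
    """Span in pixels covered by a k-tap kernel with dilation rate d: k + (k - 1)(d - 1)."""
    return k + (k - 1) * (dilation - 1)

def sampled_positions(center, k, dilation):
    """Input positions read by a 1-D k-tap kernel (k odd) centred at `center`."""
    r = k // 2
    return [center + dilation * t for t in range(-r, r + 1)]

for d in (1, 2, 3):
    # Same three weights each time, but the window widens and the gaps between taps grow
    print(d, effective_kernel_size(3, d), sampled_positions(10, 3, d))
```

A per-location choice between small and large effective kernels, as the abstract describes, trades this wider context against the detail lost in the gaps.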

CD-YOLO: A Method for Detecting Carrot Seedlings in Field Based on Improved YOLOv11s

LIU Haoran, WANG Yu, ZHAO Xueguan, WU Huarui, FU Hao, PANG Shujie, ZHAI Changyuan

Smart Agriculture 2026, 8 (2): 158-174. DOI: 10.12133/j.smartag.SA202511008

[Objective] In field environments under natural conditions, leaf occlusion and mutual plant shading pose significant challenges to the accurate identification of carrot seedlings. Furthermore, practical agricultural applications often rely on edge devices with limited computational power, necessitating a detection model that combines lightweight design, high accuracy, and robust anti-occlusion capability. The purpose of this research is to develop a robust recognition method for carrot seedlings suitable for complex field conditions, thereby enhancing the accuracy and efficiency of seedling emergence statistics in automated seedling raising processes and providing reliable technical support for precise farm management. [Methods] CD-YOLO (Carrot Detection-You Only Look Once), a lightweight detection model based on an improved YOLOv11s, was proposed. First, to reduce model complexity, several standard convolutions in the backbone network were replaced with depthwise separable convolutions (DWConv), reducing floating-point operations (FLOPs) and the number of parameters and laying a lightweight foundation for edge deployment. Second, the efficient multi-scale attention (EMA) mechanism was embedded into the critical feature extraction module C3k2, forming a C3k2_EMA module. Through its parallel multi-branch structure, this module strengthened dynamic perception of local key features and reconstructed cross-scale contextual dependencies broken by occlusion, effectively suppressing background and occlusion noise. Finally, the DynamicHead detection head was introduced. Its scale-aware and spatial-aware mechanisms dynamically fused multi-level features and adaptively adjusted their weights, further improving the model's decision-making robustness in complex scenes. To comprehensively evaluate model performance, a carrot seedling dataset covering various field scenarios was constructed in-house.
Through offline data augmentation, the original 1 274 images were expanded to 4 796, which were then split into training, validation, and test sets in an 8:1:1 ratio. Meanwhile, to systematically quantify the model's anti-occlusion performance, an occlusion severity assessment criterion based on the overlapping area of bounding boxes was proposed. Targets were categorized into three occlusion levels: mild, moderate, and severe. Based on this, a dedicated "Occlusion Test Subset" was separated from the main test set, providing an objective and reproducible benchmark for evaluating the model's anti-occlusion capability. [Results and Discussions] Experimental results on the custom dataset demonstrated that CD-YOLO comprehensively improved detection performance while maintaining its lightweight characteristics. Compared to the baseline model YOLOv11s, CD-YOLO reduced computational load by 6.2 GFLOPs (a 28.8% decrease), decreased model size by 4.8 MB (a 25.0% reduction), and reduced single-image inference time by 4.7 ms, to 9.6 ms. Concurrently, precision, recall, and mean average precision (mAP_0.5) increased by 3.0, 1.5, and 2.4 percentage points, respectively, reaching 81.2%, 76.4%, and 84.0%. Compared with other lightweight backbone networks such as MobileNetV3 and ShuffleNetV2, CD-YOLO consistently scored higher on the combined accuracy-speed metric, validating the effectiveness of its improvement strategies. In occlusion performance tests, the missed detection rate of CD-YOLO on the occlusion test subset was 13.4%, a 5.7 percentage point decrease compared to YOLOv11s. Its mAP_0.5 on the occlusion subset reached 80.6%, a 5.1 percentage point improvement over the baseline, whereas the improvement on the regular subset was 1.8 percentage points, indicating that the gains were largest in occlusion scenarios.
After deploying the model on an NVIDIA Jetson Orin NX edge device and accelerating it with TensorRT, the inference frame rate increased to 32.5 f/s. On random test images, CD-YOLO achieved missed detection and false detection rates of 5.1% and 2.7%, respectively, representing decreases of 7.7% and 2.6% compared to YOLOv11s, demonstrating promising practical application potential. Ablation studies and feature map visualizations further indicated that DWConv, C3k2_EMA, and DynamicHead formed a synergistic optimization loop: DWConv achieved computational compression, freeing up computational budget for subsequent modules; C3k2_EMA enhanced local perception and contextual reconstruction of occluded targets during the feature extraction stage; and DynamicHead performed dynamic fusion of multi-scale features at the decision-making end. Together, they ensured high-precision detection of incomplete targets under limited computational resources. [Conclusions] Through the synergistic design of "lightweighting, feature enhancement, and dynamic fusion", the CD-YOLO model achieved an excellent balance between computational efficiency, detection accuracy, and anti-occlusion capability. The model not only significantly reduced reliance on the computational power of edge devices but also effectively improved robustness and adaptability in complex field environments through structured attention and dynamic fusion mechanisms.
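
The savings from replacing standard convolutions with DWConv follow from a parameter count. Assuming the usual factorization into a k×k depthwise convolution (one filter per input channel) followed by a 1×1 pointwise convolution, and ignoring biases, the ratio to a standard k×k convolution is 1/C_out + 1/k². The layer sizes below are hypothetical, and a given framework's DWConv module may implement only the depthwise part, in which case the count differs.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k (one filter per input channel) plus pointwise 1 x 1 projection."""
    return k * k * c_in + c_in * c_out

k, c_in, c_out = 3, 128, 256  # hypothetical layer sizes
std = standard_conv_params(k, c_in, c_out)
dws = depthwise_separable_params(k, c_in, c_out)
print(std, dws, round(dws / std, 3))
```

FLOPs scale by the same ratio at a fixed output resolution, since each weight is applied once per output position.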

Field Maize Yield Prediction Model Based on Causal Inference and Machine Learning in Agricultural Fields

WANG Yi, CUI Xitong, WANG Chen, XIONG Baowei, SHAO Guomin, WANG Wanying, CAO Pei, HAN Wenting

Smart Agriculture 2026, 8 (2): 175-187. DOI: 10.12133/j.smartag.SA202506027

[Objective] Maize is one of the most important staple crops in the world and serves as a cornerstone of food security and agricultural sustainability. Accurate and timely prediction of maize yield is essential for optimizing agricultural management practices, supporting market regulation, and guiding policy decisions related to food supply and climate adaptation. In recent years, data-driven yield prediction methods based on machine learning and deep learning have achieved notable improvements in predictive accuracy. However, most existing approaches primarily rely on statistical correlations among variables and often treat influencing factors as independent predictors, without explicitly addressing the complex causal mechanisms and time-lagged interactions that govern crop growth processes. This limitation may lead to reduced model interpretability and compromised robustness under changing environmental conditions. To address these challenges, a novel maize yield prediction framework that integrates causal inference with a hybrid deep learning model was proposed, aiming to improve both predictive performance and mechanistic understanding. [Methods] Multi-source heterogeneous datasets collected across the maize growing season were utilized, including remote sensing-derived vegetation indices, meteorological variables (such as temperature and precipitation), soil profile moisture measurements at multiple depths, and crop observation data corresponding to key phenological stages. First, the Peter-Clark and momentary conditional independence (PCMCI) causal discovery algorithm was applied to systematically identify causal relationships between maize yield and its potential driving factors. The PCMCI method enables the detection of both contemporaneous and time-lagged causal links while effectively controlling for confounding effects in high-dimensional time series data. 
Through this process, the causal structure of yield formation was explicitly characterized, and key variables with statistically significant causal impacts were selected as inputs for the prediction model. Subsequently, a hybrid moving average, convolutional neural network-long short-term memory (MA-CNN-LSTM) model was constructed to capture the complex spatiotemporal patterns in the causally screened input variables. Specifically, a moving average module was employed as a preprocessing step to suppress high-frequency noise and enhance signal stability. A CNN was then used to extract latent correlation features among multiple variables, reflecting their joint influence on yield formation. Finally, an LSTM network was adopted to model temporal dependencies and cumulative effects across the growing season, enabling effective representation of dynamic yield responses. [Results and Discussions] The causal analysis revealed that soil moisture at depths of 10 cm and 50 cm exerted a significant positive influence on maize yield (P < 0.01), with deeper soil moisture showing a stronger and more persistent time-lagged effect. This finding highlighted the critical role of subsurface water availability in sustaining crop growth during later developmental stages. In addition, vegetation indices such as the modified chlorophyll absorption ratio index and the normalized difference vegetation index exhibited significant short-term causal relationships with yield during the mid-growth stage of maize, indicating their sensitivity to canopy structure and photosynthetic activity during this period. Comparative experiments conducted against traditional statistical models and conventional machine learning approaches demonstrated that the proposed PCMCI-MA-CNN-LSTM framework consistently achieved superior predictive performance.
On the test dataset, the coefficient of determination (R²) reached 0.955, while the mean absolute error (MAE) and root mean square error (RMSE) were reduced to 1.201 kg/mu and 1.474 kg/mu, respectively (1 hm² = 15 mu). These results indicated that incorporating causal variable selection effectively enhanced model accuracy and stability by reducing redundant and spurious correlations. [Conclusions] The results confirm that incorporating causal analysis into yield modeling provides a robust basis for identifying key driving variables and effectively enhances the accuracy and interpretability of maize yield prediction. The proposed framework offers a promising approach for precision agriculture and decision support in crop yield forecasting, particularly under complex and dynamic agro-environmental conditions.
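
The abstract states that the moving average module suppresses high-frequency noise before the CNN and LSTM stages, but it does not give the window length or whether the average is trailing or centred. As a minimal sketch under those assumptions, the function below computes a trailing simple moving average with a hypothetical window of 3 on hypothetical daily soil-moisture readings containing one spike.

```python
def moving_average(series, window):
    """Trailing simple moving average.

    The first window-1 outputs average only the samples seen so far,
    so the output has the same length as the input.
    """
    out, running = [], 0.0
    for i, x in enumerate(series):
        running += x
        if i >= window:
            running -= series[i - window]  # drop the sample that left the window
        out.append(running / min(i + 1, window))
    return out

# Hypothetical daily soil-moisture readings (%) with one noisy spike on day 3
soil_moisture = [21.0, 21.4, 27.0, 21.2, 20.8, 20.6, 20.9]
smoothed = moving_average(soil_moisture, window=3)
print([round(v, 2) for v in smoothed])
```

A trailing window uses only past observations, which avoids leaking future values into a forecasting model; a centred window would track peaks more closely but is not causal.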

CGG-Based Segmentation and Counting of Densely Distributed Rice Seeds in Seedling Trays

OUYANG Meng, ZOU Rong, CHEN Jin, LI Yaoming, CHEN Yuhang, YAN Hao

Smart Agriculture 2026, 8 (2): 188-199. DOI: 10.12133/j.smartag.SA202507030

[Objective] The precise quantification of rice seeds within individual cavities of seedling trays constitutes a critical operational parameter for optimizing seeding efficiency and fine-tuning the performance of air-vibration precision seeders. Achieving high accuracy in this task directly impacts resource utilization, seedling uniformity, and ultimately crop yield. However, the operational environment presents significant challenges, including complex backgrounds, seed overlap, variations in lighting and seed orientation, and the inherent difficulty of distinguishing individual seeds within dense clusters. These factors often lead to suboptimal performance in existing automated detection systems, manifesting as low detection accuracy and an inability to achieve robust, precise instance segmentation of individual rice seeds. To address these persistent limitations and advance the state of the art in precision seeding monitoring, an integrated framework for rice seed instance segmentation was proposed. The core innovation lies in the synergistic combination of a cross-modal grounding generation (CGG) network with a pretrained model, which is designed to leverage complementary information from visual and textual domains. [Methods] The proposed method aimed to bridge the gap between visual perception and semantic understanding in the specific context of rice seed detection. The CGG-pretrained model framework achieved this through deep joint alignment of visual features extracted from seedling tray images and textual features derived from contextual knowledge. This cross-modal grounding enabled collaborative learning, where the visual processing stream (handling object localization and pixel-level segmentation) was continuously informed and refined by the semantic understanding stream (interpreting context and relationships).
Specifically, the visual backbone network processed input images to generate feature maps, while the pretrained language model component used contextual embeddings to generate semantically rich textual representations. The CGG module acted as the fusion engine, establishing explicit correspondences between specific image regions (potential seeds or clusters) and relevant semantic concepts or descriptors provided by the pretrained model. This bidirectional interaction significantly enhanced the model's ability to disambiguate overlapping seeds, resolve occlusions, and accurately delineate individual seed boundaries under challenging conditions. Key technical innovations validated through ablation studies include: (1) the use of the bootstrapping language-image pre-training (BLIP) model to generate high-quality pseudo-labels from unlabeled or weakly labeled images, which facilitated semi-supervised learning and reduced the annotation burden; and (2) the application of bidirectional encoder representations from transformers (BERT)-based word embeddings to capture deep semantic relationships and contextual nuances in textual descriptors of seeds and seeding environments. [Results and Discussions] The ablation experiments demonstrated a pronounced synergistic effect when the core improvements were combined, improving segmentation accuracy by more than 3 percentage points over the baseline model without this integration. Comprehensive experimental evaluation demonstrated the superior performance of the proposed CGG model against established benchmarks. At the standard intersection over union (IoU) threshold of 0.5, the model achieved a mean average precision (mAP) of 90.7% for bounding-box detection (mAP50^bb) and 91.4% for instance segmentation (mAP50^seg).
These results represented a statistically significant improvement over leading contemporary models, including the mask region-based convolutional neural network (Mask R-CNN) and Mask2Former, highlighting the efficacy of the cross-modal grounding approach in accurately localizing and segmenting individual rice seeds. Further validation in realistic seeding trials, involving direct comparison with careful manual annotations, confirmed the model's practical robustness. The CGG model attained the highest accuracy in two critical operational metrics: (1) single-seed segmentation accuracy, i.e., precision in segmenting individual seed instances; and (2) per-cavity counting accuracy, i.e., accuracy in determining the exact number of seeds in each cavity, for which it achieved an average of 88%. The model also produced the lowest cavity seed-count estimation errors, with a root mean square error (RMSE) of 16.8 seeds, a mean absolute error (MAE) of 13.7 seeds, and a mean absolute percentage error (MAPE) of 2.46%, all markedly lower than those of the comparison models, underscoring its reliability in practical counting tasks. The discussion attributed these gains to the model's ability to leverage semantic context to resolve ambiguities inherent in visual-only approaches, particularly in the dense, overlapping seed arrangements common in precision seeding trays. [Conclusions] The developed CGG-pretrained model integration represents a significant advancement in automated monitoring for precision rice seeding. The model addresses the core challenges of low detection accuracy and imprecise instance segmentation of seeds in complex environments.
Its high accuracy in both individual seed segmentation and per-cavity seed count quantification, coupled with low error rates, demonstrates strong potential for practical deployment. Importantly, the model enables real-time detection of rice seeds during the image analysis stage. This functionality provides a quantifiable, data-driven basis for immediate operational decisions, most notably the targeted reseeding of empty or under-seeded cavities identified during the seeding process. By ensuring optimal seed placement and density from the outset, the technology contributes directly to improved resource efficiency (less seed waste), more uniform seedlings, and potentially higher crop yields. Future work will focus on further optimizing inference speed for higher-throughput seeding lines and on generalizing the approach to other crop types and seeding mechanisms.
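The detection and counting metrics cited above are standard, and a minimal Python sketch of how they are usually computed may help in reading the figures. This is not the authors' evaluation code: the `(x1, y1, x2, y2)` box format, the function names, and the exact-match definition of per-cavity accuracy are assumptions for illustration, and a full mAP50 computation would additionally rank predictions by confidence and integrate the precision-recall curve.

```python
import math
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2), x2 > x1, y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Box, gt: Box, threshold: float = 0.5) -> bool:
    """At mAP50, a prediction can match a ground-truth box only if IoU >= 0.5."""
    return iou(pred, gt) >= threshold


def count_errors(predicted: Sequence[float], actual: Sequence[float]) -> dict:
    """RMSE, MAE and MAPE (in %) between predicted and manually counted seeds."""
    n = len(actual)
    diffs = [p - a for p, a in zip(predicted, actual)]
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    mae = sum(abs(d) for d in diffs) / n
    mape = 100.0 * sum(abs(d) / a for d, a in zip(diffs, actual)) / n  # requires a > 0
    return {"rmse": rmse, "mae": mae, "mape": mape}


def cavity_accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Hypothetical definition: share of cavities whose count is exactly right."""
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)
```

For example, two 2 × 2 boxes offset by one unit in each direction have IoU = 1/7 ≈ 0.14, well below the 0.5 threshold, so such a prediction would count as a false positive at mAP50.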

DEMA-3D TSP: An Enhanced Reinforcement Learning with DEMA Attention in Sequence Optimization for Safflower Picking Robot

LI Menghao, WANG Xiaorong, LIU Zihe, DUAN Mengyu, JIN Zhengyang

Smart Agriculture 2026, 8 (2): 200-219. DOI: 10.12133/j.smartag.SA202506004

[Objective] Automated safflower harvesting faces several critical challenges, particularly inefficient path planning, suboptimal route quality, and limited decision-making capability in dynamic and complex environments. To address these issues, the picking problem was formulated as a three-dimensional traveling salesman problem (3D TSP), and an enhanced reinforcement learning model, the actor-critic reinforcement learning pointer network (AC-RL-PtrNet), was proposed for deployment on intelligent safflower picking robots in agricultural settings. [Methods] First, to address the limitations of conventional attention mechanisms in dynamic environments with complex spatial structures, an enhanced attention module was proposed based on the dynamic exponential moving average framework. By combining multi-head attention, spatial distance encoding, and adaptive exponential smoothing, the improved design allowed the model to better capture long-range dependencies and spatial context among safflowers. Second, to reduce computational cost while preserving inference quality, a structured pruning approach was adopted that selectively removed redundant connections in the long short-term memory (LSTM) gates and fully connected layers. In parallel, the critic network was redesigned to improve learning stability and accuracy by incorporating batch normalization, residual feature aggregation, and a multi-layer value estimation head, all of which contributed to tighter actor-critic synergy during policy training. [Results and Discussions] To quantitatively assess the impact of each component, ablation experiments were conducted across various configurations. The results confirmed that each module contributed distinct benefits and that their combination yielded the largest improvements in both planning precision and inference efficiency.
This coordinated actor-critic design effectively enhanced both trajectory quality and decision stability, which are critical in sequential robotic picking tasks. Compared with traditional swarm intelligence and evolutionary algorithms, namely particle swarm optimization (PSO), ant colony optimization (ACO), and the non-dominated sorting genetic algorithm, the proposed AC-RL-PtrNet model improved planning time by -2.63% to 61.87% on the 25-target dataset (a negative value meaning it was slightly slower than one baseline) and by 22.93% to 59.1% on the 31-target dataset. The optimized paths were also significantly shorter across different planning instances, indicating robust generalization under varied problem scales. Field experiments further validated the model's practical applicability: when deployed on a mobile picking robot in real safflower fields, AC-RL-PtrNet reduced path length by 9.56% and time by 5.43% for a 25-target picking task, and reduced path length by 20.17% and time by 29.70% for a 31-target task involving a different safflower variety. Overall, these results indicated that the proposed method offers significant advantages in path planning efficiency and path quality. [Conclusions] This study offers a practical solution for efficient and robust automatic picking by safflower picking robots and provides new insights into solving 3D combinatorial optimization problems.
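Because the picking sequence was cast as a 3D TSP, the quantity being optimized is the Euclidean length of the visiting order through the flower positions. The Python sketch below shows that objective, a greedy nearest-neighbor ordering (a common reference heuristic, not one of the baselines compared in the study), and a relative-improvement formula. Whether the robot returns to its start point, and exactly how the improvement percentages were defined, are not stated in the abstract; the `closed` flag and `relative_improvement` are assumptions. Under this definition, a negative value such as -2.63% simply means the proposed model was slower than that particular baseline on that dataset.

```python
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]  # assumed 3D flower position (x, y, z)


def tour_length(points: Sequence[Point3], order: Sequence[int], closed: bool = False) -> float:
    """Total Euclidean length of visiting `points` in `order`, optionally returning to start."""
    total = sum(math.dist(points[order[i]], points[order[i + 1]]) for i in range(len(order) - 1))
    if closed and len(order) > 1:
        total += math.dist(points[order[-1]], points[order[0]])
    return total


def nearest_neighbor_order(points: Sequence[Point3], start: int = 0) -> List[int]:
    """Greedy reference heuristic: always move to the closest unvisited flower."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        last = points[order[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, points[j]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


def relative_improvement(baseline: float, proposed: float) -> float:
    """Percentage reduction versus a baseline; negative if the proposed value is worse."""
    return 100.0 * (baseline - proposed) / baseline
```

A learned policy such as a pointer network outputs an `order` directly; its quality is judged by the same `tour_length` objective.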

AgriAgent: End-to-End Large Model Agent System Architecture for Agricultural Environment Control

QIU Jiaying, LIU Yingchang, GAO Xingjie, HUANG Yuan, ZHANG Hongyu, TIAN Fang, LI Wanli, FENG Zaiwen

Smart Agriculture 2026, 8 (2): 220-236. DOI: 10.12133/j.smartag.SA202507042

[Objective] Large language models (LLMs) have demonstrated strong capabilities in natural language understanding, knowledge integration, and complex reasoning, offering new opportunities for intelligent decision-making in agriculture. However, their direct application in agricultural production and facility environment control remains challenging due to strong physical constraints and high operational risks. The lack of real-world interaction and executable decision grounding limits the practical effectiveness of conventional LLMs in such scenarios. To address these challenges, a tool-augmented LLM-based agricultural intelligent agent system, termed AgriAgent, was proposed, and a digital-twin-based evaluation platform for agricultural decision-making was developed. By integrating a high-fidelity digital twin environment with an end-to-end agent architecture, the decision-making performance of agricultural intelligent agents with different parameter scales was systematically evaluated across multiple crops and climate scenarios. [Methods] A high-fidelity agricultural digital twin evaluation platform was constructed using the decision support system for agrotechnology transfer (DSSAT) v4.8 crop growth model as the core simulation engine to model crop growth under diverse environmental conditions and management strategies. Meteorological driving data were obtained from the Seoul Historical Weather Data dataset. Through data cleaning, missing-value imputation, unit normalization, and time-series reconstruction, the raw meteorological data were transformed into standardized inputs compatible with DSSAT. Three climate scenarios representing different environmental complexities were designed, including a regular scenario, a perturbed scenario, and an extreme scenario. 
The regular scenario employed historical observations, the perturbed scenario introduced stochastic disturbances to simulate short-term climate variability, and the extreme scenario incorporated multi-factor coupled stresses such as high temperatures and excessive precipitation during sensitive growth stages. In total, 90 annual climate driving sequences were generated. Fixed soil profile parameters calibrated by domain experts were applied across all simulations to minimize confounding effects. Within this digital twin environment, a tool-augmented agricultural intelligent agent, AgriAgent, was implemented using a modular architecture consisting of a sensor module, a memory module, a retriever, a large language model, and a tool executor, forming a closed-loop decision-making framework. In each decision cycle, the agent perceived environmental and crop state information, including soil moisture and nutrient status, meteorological conditions, crop growth stages, and stress indicators. State summaries and historical decisions were stored in memory, while agronomic knowledge was retrieved through a retrieval-augmented generation mechanism. Based on the integrated information, the LLM generated structured environmental control commands in JSON format, which were validated and constrained by the tool executor before the DSSAT environment was updated. The system supported irrigation, supplementary lighting, ventilation, heating, fertilization, and CO₂ enrichment. Five representative crops (maize, millet, sugar beet, tomato, and cabbage) were simulated under the three climate scenarios over complete growing seasons, resulting in 450 crop-scenario combinations (5 crops × 90 climate sequences). An unmanaged DSSAT simulation served as the baseline. AgriAgent models at three parameter scales (1.5B, 3B, and 7B), built on the Qwen2.5 series, were evaluated. Crop economic yield, expressed as dry matter at physiological maturity, was adopted as the evaluation metric.
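The tool executor's role of validating and constraining LLM-generated JSON commands before they reach the simulator can be sketched as below. The command schema (`commands`, `action`, `value`), the action names, units, and numeric limits are all hypothetical, since the abstract does not give AgriAgent's actual format; the point is the pattern of parsing, rejecting unknown actions, and clamping values into a safe range rather than executing the model's output directly.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical action schema: action -> (unit, min, max). AgriAgent's real
# command format and safety limits are not given in the abstract.
ACTION_LIMITS: Dict[str, Tuple[str, float, float]] = {
    "irrigation": ("mm", 0.0, 50.0),
    "supplementary_lighting": ("h", 0.0, 12.0),
    "ventilation": ("fraction_open", 0.0, 1.0),
    "heating": ("degC_setpoint", 5.0, 30.0),
    "fertilization": ("kg_N_per_ha", 0.0, 60.0),
    "co2_enrichment": ("ppm", 400.0, 1200.0),
}


def validate_commands(raw: str) -> Tuple[List[dict], List[str]]:
    """Parse LLM output, drop unknown or malformed commands, clamp values.

    Returns (accepted_commands, rejection_reasons)."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [], [f"invalid JSON: {exc.msg}"]
    if not isinstance(payload, dict) or not isinstance(payload.get("commands"), list):
        return [], ["missing 'commands' list"]

    accepted: List[dict] = []
    rejected: List[str] = []
    for cmd in payload["commands"]:
        if not isinstance(cmd, dict):
            rejected.append("command is not a JSON object")
            continue
        action, value = cmd.get("action"), cmd.get("value")
        if not isinstance(action, str) or action not in ACTION_LIMITS:
            rejected.append(f"unknown action: {action!r}")
            continue
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            rejected.append(f"non-numeric value for {action}")
            continue
        unit, lo, hi = ACTION_LIMITS[action]
        clamped = min(max(float(value), lo), hi)  # constrain instead of trusting the LLM
        accepted.append({"action": action, "value": clamped, "unit": unit})
    return accepted, rejected
```

Clamping, rather than rejecting, out-of-range values is one design choice; a stricter executor could instead reject the command and ask the LLM to regenerate it.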
[Results and Discussions] The results showed that AgriAgent consistently outperformed the baseline across all crops and climate scenarios, with model scale exerting a significant influence on decision-making performance. AgriAgent-7B achieved the best overall performance under regular, perturbed, and extreme scenarios, demonstrating strong generalization ability and environmental adaptability. By dynamically adjusting water, nutrient, light, and thermal management strategies, the agent effectively mitigated environmental stresses even under multi-factor coupled extreme climate conditions. Under extreme scenarios, AgriAgent-7B increased yields by 463.60% for maize, 351.20% for millet, 125.40% for sugar beet, 1 537.46% for tomato, and 1 185.14% for cabbage compared with the baseline. Particularly large gains were observed for high-value crops such as tomato and cabbage, highlighting the advantages of the proposed framework for precision-controlled facility agriculture. In contrast, AgriAgent-1.5B exhibited performance comparable to the baseline, while AgriAgent-3B achieved moderate improvements but remained inferior to the 7B model. These findings indicate a clear scaling effect, suggesting that larger models possess stronger capabilities in multi-source information integration, long-term temporal reasoning, and adaptation to complex environments. [Conclusions] This study developed a digital-twin-based agricultural decision evaluation platform and proposed a tool-augmented, end-to-end agricultural intelligent agent named AgriAgent. Experiments across multiple crops and climate scenarios verified the effectiveness and robustness of the proposed framework for dynamic agricultural decision-making. 
The results demonstrate that integrating knowledge retrieval, reasoning, and tool execution within a closed-loop LLM-based agent enables stable, reliable, and adaptive environmental control, providing a feasible technical pathway and standardized evaluation paradigm for intelligent agriculture.
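The experimental grid and the reported gains reduce to simple arithmetic. In the sketch below, the 450 combinations are read as 5 crops × 90 annual climate sequences, and the yield gain is assumed to be the usual relative change against the unmanaged DSSAT baseline; the abstract does not state the formula explicitly. Under that assumption, a gain of 1 537.46% for tomato means the managed yield was roughly 16.4 times the baseline yield.

```python
from itertools import product

# Experimental grid as read from the abstract: 5 crops x 90 annual climate sequences.
CROPS = ["maize", "millet", "sugar beet", "tomato", "cabbage"]
N_CLIMATE_SEQUENCES = 90  # regular, perturbed and extreme scenarios combined

runs = list(product(CROPS, range(N_CLIMATE_SEQUENCES)))  # (crop, sequence index) pairs


def yield_gain_percent(agent_yield: float, baseline_yield: float) -> float:
    """Relative yield gain over the unmanaged baseline, in percent (assumed definition)."""
    return 100.0 * (agent_yield - baseline_yield) / baseline_yield


def fold_change(gain_percent: float) -> float:
    """Convert a percentage gain into a multiple of the baseline yield."""
    return 1.0 + gain_percent / 100.0
```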

Vegetable IoT Blockchain Anti-Counterfeiting Traceability System Based on PQ-ECIES

QI Peiyang, SUN Chuanheng, TAN Changwei, WANG Jun, LUO Na, XING Bin

Smart Agriculture 2026, 8 (2): 237-250. DOI: 10.12133/j.smartag.SA202507019

[Objective] The vegetable supply chain is characterized by multiple production entities, diverse product varieties, and complex circulation processes, which often result in low data accuracy, label forgery, data tampering, and difficulties in cross-enterprise collaboration in traditional traceability systems. Furthermore, the rapid development of quantum computing poses significant threats to existing cryptographic foundations by enabling efficient factorization or discrete logarithm attacks. This study aimed to design and implement a vegetable supply chain anti-counterfeiting and traceability system that integrates the Internet of Things (IoT), blockchain technology, and a post-quantum enhanced elliptic curve integrated encryption scheme (PQ-ECIES). The system seeks to enhance the trustworthiness, privacy protection, and collaborative efficiency of supply chain data management, while maintaining practical performance for IoT devices and high-frequency data uploading scenarios. [Methods] The proposed system was constructed on an IoT framework incorporating nine categories of devices. A registration and admission mechanism was developed to establish a trusted mapping between "device–enterprise–data", effectively preventing unauthorized entities from uploading forged data. At the data layer, collected information was divided into public and private categories: Public data were uploaded directly to the blockchain, while private data were encrypted using PQ-ECIES before being stored on-chain. Smart contracts automated processes such as data classification, permission verification, and encrypted data querying, thus reducing human intervention and ensuring compliance. PQ-ECIES was designed by combining elliptic curve cryptography (ECC) and the Kyber algorithm from lattice-based post-quantum cryptography. 
A dual-key mechanism was employed to generate session keys: an ECC-derived shared secret was combined with a Kyber-derived shared secret through SHA3-256 hashing, followed by key derivation for encryption and authentication. This design provided resilience against Shor's algorithm and other quantum attacks while maintaining efficiency compatible with IoT devices. The blockchain system was implemented using Hyperledger Fabric 1.4.4, with seven organizational nodes and the Raft consensus mechanism. Performance testing included evaluations of data collection accuracy, on-chain latency, query latency, and encryption performance across RSA, advanced encryption standard (AES), and PQ-ECIES. [Results and Discussions] The IoT-based data collection achieved significantly higher accuracy than manual input, particularly in large-scale sample scenarios such as pesticide residue testing. The average latency for uploading data to the blockchain was 2 879 ms, while data query latency averaged 122 ms, both of which met the practical requirements of vegetable supply chain applications. In cryptographic performance testing, PQ-ECIES encrypted and decrypted a 128 B plaintext in approximately 10–30 ms, outperforming RSA (50–80 ms) and only slightly slower than AES (<10 ms). This result indicates that PQ-ECIES achieved a favorable trade-off between efficiency and security, offering the benefits of asymmetric encryption, such as key distribution and identity verification, along with strong post-quantum resistance. Simulation under quantum attack models confirmed that traditional ECC and AES could be compromised within hours using Shor's and Grover's algorithms, whereas PQ-ECIES remained resilient owing to the lattice-based hardness assumptions of Kyber. From a system-level perspective, three major contributions were identified.
First, trustworthiness was enhanced by binding IoT devices to enterprises through Bluetooth-based verification and blockchain's immutable ledger, ensuring data authenticity at the source. Second, privacy protection was achieved by adopting graded visibility: Consumers accessed only public data such as testing results and logistics status, while regulators could decrypt private information (e.g., production location and batch details) via authorized keys, balancing transparency with confidentiality. Third, collaboration across enterprises was improved through the consortium blockchain structure and Fabric channel mechanisms, which eliminated information silos and enabled selective data sharing in real time, reducing inter-organizational access time from weeks to minutes. Experimental validation confirmed that IoT-based collection significantly improved accuracy, blockchain integration achieved acceptable on-chain and query latency, and PQ-ECIES outperformed RSA while offering post-quantum resistance not available in AES. [Conclusions] This study proposed and implemented a vegetable supply chain traceability system that integrates IoT, blockchain, and PQ-ECIES. By deploying nine categories of IoT devices, establishing trusted device-enterprise mappings, and incorporating blockchain's decentralized and tamper-proof ledger, the system ensured reliable data collection and storage. The integration of PQ-ECIES provided dual cryptographic protection, balancing efficiency with long-term quantum security. Beyond technical performance, the system enhanced trust, privacy, and collaboration across the vegetable supply chain, effectively addressing common issues of data forgery, tampering, and cross-enterprise coordination. Overall, the proposed framework demonstrates high potential for real-world deployment in agricultural supply chains, offering a secure, efficient, and future-proof solution to ensure authenticity, reliability, and transparency in vegetable traceability.
The study also provides a reference model for extending post-quantum blockchain-based traceability to other agri-food sectors facing similar challenges.
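The dual-key construction described above can be illustrated with Python's standard library. Python ships neither ECDH nor Kyber, so random 32-byte strings stand in for the two shared secrets; in a real system they would come from an ECC key agreement and a Kyber decapsulation. The abstract specifies only SHA3-256 combination "followed by key derivation for encryption and authentication", so the use of HKDF (RFC 5869) over HMAC-SHA3-256, the `info` label, and the 32 + 32 byte split into an encryption key and a MAC key are assumptions, and the symmetric cipher itself is omitted.

```python
import hashlib
import hmac
import os


def combine_secrets(ecc_secret: bytes, kyber_secret: bytes) -> bytes:
    """Bind both shared secrets with SHA3-256, so both must be known to derive keys."""
    return hashlib.sha3_256(ecc_secret + kyber_secret).digest()


def hkdf_sha3(ikm: bytes, info: bytes, length: int = 64, salt: bytes = b"") -> bytes:
    """HKDF (RFC 5869) instantiated with HMAC-SHA3-256 (assumed KDF choice)."""
    hash_len = hashlib.sha3_256().digest_size  # 32 bytes
    prk = hmac.new(salt or b"\x00" * hash_len, ikm, hashlib.sha3_256).digest()  # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand: T(i) = HMAC(PRK, T(i-1) | info | i)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha3_256).digest()
        okm += block
        counter += 1
    return okm[:length]


def session_keys(ecc_secret: bytes, kyber_secret: bytes) -> tuple:
    """Split the derived material into a 32-byte encryption key and a 32-byte MAC key."""
    okm = hkdf_sha3(combine_secrets(ecc_secret, kyber_secret), b"pq-ecies session", 64)
    return okm[:32], okm[32:]


def tag(mac_key: bytes, ciphertext: bytes) -> bytes:
    """Authentication tag over the (already encrypted) private record."""
    return hmac.new(mac_key, ciphertext, hashlib.sha3_256).digest()


# Placeholders for the real ECC and Kyber shared secrets (hypothetical).
ecc_shared, kyber_shared = os.urandom(32), os.urandom(32)
enc_key, mac_key = session_keys(ecc_shared, kyber_shared)
```

Because both secrets feed the same hash, an attacker must recover both to reconstruct the session keys: breaking the ECC part with Shor's algorithm alone leaves the Kyber secret unknown. Concatenating fixed-length secrets (32 bytes each here) also keeps the hash input unambiguous.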

Path Planning Algorithm for an Eel Feeding Robotic Arm Based on Improved BI-RRT

MA Mengxian, XU Zhen, YUAN Quan, ZHOU Wenzong, ZHANG Chunyan

Smart Agriculture 2026, 8 (2): 251-264. DOI: 10.12133/j.smartag.SA202509020

[Objective] Research on robotic-arm feed distribution in eel (Monopterus albus) farming systems faces several challenges: slow path planning, excessive trajectory redundancy, and low obstacle-avoidance success rates in confined operational spaces. To mitigate these issues, an improved path planning algorithm based on the bidirectional rapidly-exploring random tree star (BI-RRT*) algorithm was proposed, with the primary aim of significantly enhancing the motion efficiency and task success rate of robotic arms operating in complex, constrained environments. [Methods] The proposed improved BI-RRT* algorithm integrated an adaptive goal-biased strategy with an enhanced artificial potential field (APF) method. Its framework comprised three core components: a high-quality sampling strategy, an efficient search strategy, and a path optimization algorithm. For the sampling strategy, an adaptive goal-biased approach was introduced to overcome the inefficient random sampling and slow convergence of traditional BI-RRT algorithms in complex environments. Rather than selecting sampling points purely at random, this strategy dynamically prioritized regions near the target, guided by the target direction and a predefined bias probability. This mechanism substantially increased the tendency of the search tree to grow toward the target area, reducing the stochasticity of random sampling and thereby accelerating the path search. To improve search efficiency and prevent the algorithm from converging to local optima, an improved APF was incorporated into the node expansion process and refined for tighter integration with the BI-RRT framework.
During each new node expansion, a directional attractive field was superimposed on the inherent random exploration of BI-RRT. This field accounted not only for the final target point but also for the current growth direction of the search tree and local environmental information. Specifically, a composite attractive function was devised that combined the attraction exerted by the target point on the current node with the attraction from potential "guide points". The computation of the repulsive field was also optimized to represent obstacle geometry and proximity more precisely, avoiding the "oscillation" and "deadlock" problems common in traditional APF. In this way, the algorithm steered the search tree around obstacles and toward the target region more effectively, making the search more direct and preventing it from becoming trapped in suboptimal local solutions. For path optimization, once an initial feasible path was generated, a greedy strategy was used for path pruning and smoothing, yielding a shorter, smoother path that better conformed to the kinematic properties of the robotic arm. Path pruning was applied first to eliminate redundant nodes: if a collision-free direct connection existed between two non-adjacent nodes, the intermediate nodes were removed, substantially shortening the path. Path smoothing techniques, such as B-spline curves or cubic spline interpolation, were then applied so that the robotic arm could move more stably and efficiently in actual operation, with less impact and vibration.
This two-stage optimization procedure ensured that the final path was not merely feasible but also optimized for length, smoothness, and motion efficiency. [Results and Discussions] To comprehensively validate the proposed algorithm, a two-stage experimental verification was conducted. First, comparative simulations were performed in both two-dimensional (2D) and three-dimensional (3D) environments on the MATLAB platform. The simulation scenarios covered three representative environments (simple, complex, and narrow passages) to emulate the diverse obstacle configurations that may be encountered in industrialized eel aquaculture. In terms of both path planning speed and quality, the improved BI-RRT* algorithm significantly outperformed the RRT, APF-RRT*, and traditional BI-RRT* algorithms in all tested environments, supporting its theoretical advantages and robustness across environments of varying complexity. Second, to assess the engineering applicability of the algorithm, an eel feeding robotic arm simulation system was built on the Robot Operating System (ROS) and MoveIt frameworks, emulating the kinematics, dynamics, and obstacle distribution of an industrialized eel aquaculture environment. In simulated continuous feeding tasks, the improved BI-RRT* algorithm performed consistently well: its average running time was only 2.1 s, a 41.6% reduction compared with the traditional BI-RRT*, and the planned paths had an average length of only 1 680 mm with an average of 180 nodes, indicating a significant reduction in path redundancy.
Furthermore, the algorithm achieved an obstacle avoidance success rate of 96% in complex confined spaces. These findings validated the algorithm's effectiveness and underscored its potential for practical engineering applications. [Conclusions] The experimental results demonstrated that the improved BI-RRT* algorithm significantly enhanced the path planning efficiency and trajectory quality of robotic arms operating in confined spaces. It also exhibited high reliability in obstacle avoidance, effectively addressing the automated feeding requirements of industrialized eel aquaculture. The algorithmic framework is broadly applicable, offering theoretical insights and technical precedents for analogous robotic arm path planning problems in other agricultural automation contexts.
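Two of the mechanisms above, goal-biased sampling combined with potential-field steering and greedy path pruning, can be sketched in a simplified 2D setting. This is not the authors' improved BI-RRT*: the bias probability is fixed rather than adaptive (the adaptation rule is not given in the abstract), obstacles are modeled as circles, the repulsive term is the classic potential-field gradient rather than the refined version described, and guide points and the bidirectional tree bookkeeping are omitted. All function names are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Circle = Tuple[Point, float]  # (center, radius): a simplified 2D obstacle model


def _unit(dx: float, dy: float) -> Point:
    """Normalize a 2D vector; the zero vector stays zero."""
    n = math.hypot(dx, dy)
    return (dx / n, dy / n) if n > 0 else (0.0, 0.0)


def goal_biased_sample(goal: Point, bounds: Tuple[float, float, float, float],
                       p_goal: float, rng: random.Random) -> Point:
    """Return the goal with probability p_goal, otherwise a uniform random point."""
    if rng.random() < p_goal:
        return goal
    xmin, xmax, ymin, ymax = bounds
    return (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))


def apf_steer(near: Point, sample: Point, goal: Point, obstacles: Sequence[Circle],
              step: float, k_att: float = 1.0, k_rep: float = 1.0,
              influence: float = 1.0) -> Point:
    """Extend `near` by `step` along (random direction + attraction + repulsion)."""
    rx, ry = _unit(sample[0] - near[0], sample[1] - near[1])  # random exploration
    ax, ay = _unit(goal[0] - near[0], goal[1] - near[1])  # attraction toward the goal
    fx, fy = rx + k_att * ax, ry + k_att * ay
    for (cx, cy), radius in obstacles:
        d = math.hypot(near[0] - cx, near[1] - cy) - radius  # gap to obstacle surface
        if 0.0 < d < influence:
            ox, oy = _unit(near[0] - cx, near[1] - cy)
            # Gradient magnitude of the classic potential 0.5*k*(1/d - 1/d0)^2.
            mag = k_rep * (1.0 / d - 1.0 / influence) / d ** 2
            fx, fy = fx + mag * ox, fy + mag * oy
    dx, dy = _unit(fx, fy)
    return (near[0] + step * dx, near[1] + step * dy)


def greedy_prune(path: List[Point],
                 collision_free: Callable[[Point, Point], bool]) -> List[Point]:
    """From each kept node, jump to the farthest later node reachable by a straight,
    collision-free segment; adjacent nodes are assumed connected (a feasible path)."""
    pruned = [path[0]]
    i = 0
    while i < len(path) - 1:
        j = len(path) - 1
        while j > i + 1 and not collision_free(path[i], path[j]):
            j -= 1
        pruned.append(path[j])
        i = j
    return pruned
```

The pruning loop keeps a node only when no straight, collision-free segment can skip it, which matches the rule described for removing intermediate nodes; spline smoothing would then be applied to the pruned waypoints.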

Key Factor Extraction Method of Agricultural User Demand Based on Large Language Models

LI Runteng, WANG Yiqun, LI Hongda, LI Jingchen, CHEN Wenbai

Smart Agriculture 2026, 8 (2): 265-278. DOI: 10.12133/j.smartag.SA202509011

[Objective] In the agricultural domain, user demand texts serve as essential primary sources for agricultural extension, production management, and policy services. However, these texts typically contain highly specialized terminology, exhibit non-standard, colloquial, and diverse linguistic expressions, present fragmented semantics, and rely heavily on contextual reasoning. Such characteristics make them difficult to parse accurately using traditional rule-based approaches or shallow machine learning models. Consequently, these limitations often lead to biased demand classification and incomplete extraction of key factors, thereby constraining the quality of data available for intelligent agricultural decision-making. To address these challenges, this research aimed to develop a robust, domain-adapted, and highly interpretable structured analysis method for agricultural user demands. [Methods] Agri-NeedAgent, an agricultural user demand analysis framework, was proposed based on a "three-stage training + multi-agent collaboration" paradigm. First, during the domain knowledge pretraining stage, 80 000 agriculture-related texts, including crop cultivation manuals, pest and disease control guides, agricultural policy documents, and farmer consultation records, were used to build domain-specific semantic understanding, thereby enhancing the model's capability to interpret agricultural terminology, dialectal expressions, contextual logic, and implicit semantics. Second, in the instruction fine-tuning stage, 6 320 annotated samples in an "instruction-input-output" format were employed to establish an explicit mapping from raw demand texts to structured outputs. Third, in the agricultural knowledge low-rank adaptation stage, low-rank adaptation (LoRA) was applied to perform lightweight parameter tuning on task-specific agents, enabling targeted adaptation for demand classification and key-factor extraction tasks.
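For reference, the standard LoRA formulation freezes a pretrained weight matrix W_0 and learns only a low-rank update BA scaled by α/r, so an adapted layer computes:

```latex
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x,
\qquad W_0 \in \mathbb{R}^{d \times k},\;
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)
```

Only A and B are trained, i.e. r(d + k) parameters instead of dk. With hypothetical sizes d = k = 4 096 and r = 8, that is 65 536 trainable parameters per adapted matrix versus 16 777 216, about 0.39%. The abstract does not report the rank, the scaling factor α, or which layers of the task agents were adapted.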
Built upon the above training process, a multi-agent collaborative framework was constructed, in which the manager agent was responsible for task scheduling and quality control, while task agents were designed to perform demand classification, key-factor extraction, and explanation generation, respectively. Through this division of labor and collaborative mechanism, the framework achieved efficient and structured analysis of agricultural user demands. [Results and Discussions] Experimental results demonstrated that the proposed Agri-NeedAgent achieved a demand classification accuracy of 84.6%, a key-factor extraction F₁-Score of 85.2%, a structured interface compliance rate of 94.2%, and an interpretability score of 90.2. These results showed clear improvements over traditional deep learning models such as Bidirectional Encoder Representations from Transformers (BERT) as well as general-purpose large language models (LLMs) without domain adaptation. The findings confirmed the critical role of domain knowledge injection, explicit task alignment, and multi-agent specialization in enhancing semantic understanding and structured analysis of agricultural texts. Ablation experiments further validated the effectiveness of each component. Removing domain pretraining or LoRA fine-tuning resulted in substantial performance degradation in classification and key-factor extraction, indicating the necessity of domain adaptation and task-specific optimization for handling non-standard agricultural expressions. Moreover, eliminating the manager agent or the Reasoning and Acting (ReAct) mechanism significantly reduced structured interface compliance and interpretability, highlighting the importance of task coordination, intermediate verification, and multi-step reasoning for ensuring logical consistency and output completeness.
Additionally, removing the external knowledge base reduced the interpretability score from 90.2 to 77.6, underscoring its essential role in providing theoretical grounding, reasoning support, and professional explanations. Although the multi-agent collaboration introduced an additional inference overhead of approximately 140 ms, the overall per-sample inference time remained within 225 ms, meeting the real-time requirements of agricultural consultation scenarios. [Conclusions] Supported by a "three-stage training + multi-agent collaboration" framework, LLMs can effectively address challenges posed by non-standard expressions, semantic fragmentation, and multi-factor reasoning in agricultural user demand texts. The proposed method demonstrated significant improvements in demand classification, key-factor extraction, structured output compliance, and interpretability, providing high-quality and traceable structured data for intelligent agricultural decision-making. After domain adaptation and task-specific tuning, the model not only gains enhanced capability for deep semantic analysis of agricultural user demands but also ensures the completeness and interpretability of outputs through multi-agent coordination. Although the current workflow still requires optimization in terms of data preparation, staged training, and knowledge-base updating, future work will focus on expanding region-specific and emerging-technology-related demand data, developing a dynamically updated agricultural knowledge system, improving multi-agent coordination efficiency, and exploring cross-lingual agricultural demand analysis to further promote the application and deployment of agricultural large models across broader scenarios.
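The key-factor F₁-Score and the structured interface compliance rate can be illustrated with a short Python sketch. The abstract does not define either metric precisely, so set-level exact matching of key factors, the required output fields (`demand_type`, `key_factors`, `explanation`), and the treatment of compliance as "a valid JSON object containing all required fields" are assumptions; corpus-level results could equally be micro- or macro-averaged.

```python
import json
from typing import Iterable, Set

# Hypothetical output schema; Agri-NeedAgent's real interface fields are not given.
REQUIRED_FIELDS = {"demand_type", "key_factors", "explanation"}


def f1_score(predicted: Set[str], gold: Set[str]) -> float:
    """Set-level F1 between extracted and annotated key factors (exact match)."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def is_compliant(raw_output: str) -> bool:
    """An output complies if it parses as a JSON object with every required field."""
    try:
        obj = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and REQUIRED_FIELDS.issubset(obj)


def compliance_rate(outputs: Iterable[str]) -> float:
    """Share of agent outputs that satisfy the structured interface."""
    outputs = list(outputs)
    return sum(is_compliant(o) for o in outputs) / len(outputs)
```

In a multi-agent pipeline, a check like `is_compliant` is the kind of intermediate verification a manager agent could run before accepting a task agent's result.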
