Smart Agriculture

Earth Observation-Driven Digital (Smart) Agriculture: Research Frontiers and Application Cases

WU Bingfang, MA Hui, ZHANG Miao, PAN Qingcheng, ZHANG Xiang, CHEN Shuisen, QIU Bingwen, XU Xingang, LIU Jianhong, FAN Jinlong, HUANG Jianxi, JIANG Jiale, HE Changchui

2026, 8(2): 1-17. doi:10.12133/j.smartag.SA202512027

[Significance] Digital agriculture is the core driving force of modern agricultural transformation. It aims to achieve full-process digital mapping and intelligent management of production through the deep integration of advanced information technologies such as the Internet of Things, big data, artificial intelligence (AI), and remote sensing, with earth observation (EO) technology serving as the data engine that provides essential spatial information for this systemic shift. However, digital agriculture is still developing unevenly: it tends to be "heavy on transactions and light on production", and digitalization penetration remains low in core production links. Furthermore, the knowledge embedded in the vast corpus of EO data has yet to be fully extracted and interpreted, so many established algorithms lack the robustness and universality needed to cope with the complexity and diversity of global cropping systems, which limits their practical efficacy. Crucially, relying on technology to optimize production efficiency alone, without ecological guidance, can create secondary environmental risks, such as exacerbating regional groundwater depletion or reducing biodiversity through the simplification of agricultural landscapes. An approach is therefore needed that deeply couples EO technology with agronomic principles and local ecological practices, in order to build a resilient smart agricultural system that balances productivity, resource efficiency, and ecological integrity. [Progress] Current research frontiers of EO-driven digital agriculture converge on three critical domains: intelligent crop condition monitoring, digital twin farming systems, and the enhancement of agricultural system resilience. 
Intelligent monitoring fuses high-resolution remote sensing imagery with machine learning frameworks to enable large-scale, comprehensive crop mapping and fine-grained identification of crop types at the field scale; next-generation yield prediction models integrate advanced deep learning techniques to improve accuracy significantly, and remote sensing is also used effectively for agricultural disaster monitoring. The digital twin farming system represents an advanced stage of precision agriculture. It digitally models all agricultural production elements to construct a highly consistent virtual replica of the physical environment and operates through a real-time closed loop of perception, simulation and analysis, and decision-making support to guide optimal interventions. Successful applications include intelligent water resource scheduling in Chinese irrigation districts and the use of AI vision algorithms to manage complex biological processes such as crab farming, although the field must still overcome the problem of "pseudo-twins" that focus on mere visualization rather than driving concrete operational decisions. Research on agricultural system resilience is supported by the crucial spatial data that digital agriculture provides on global crop yields, cultivated land distribution, and practices such as terracing. To illustrate the practical efficacy of these technologies, this paper analyzes two representative application cases. First, the CropWatch system represents a paradigm shift in agricultural monitoring by constructing a "Cloud-Edge" collaborative ecosystem. It integrates machine learning with a "Pre-training, Prompting, and Fine-tuning" large language model (LLM) framework to automate remote sensing-based crop monitoring and report generation and to enhance decision-support intelligence. 
Through open application programming interfaces (APIs) and multi-scale capabilities, CropWatch provides cross-scale information and decision support ranging from macro-level policymaking to micro-level farm management, serving as a global public good that helps bridge the digital divide in developing nations. Second, in the domain of agricultural water management, the ETWatch technical system demonstrates a robust solution for the precise governance of water resources. By achieving high-resolution evapotranspiration (ET) monitoring from basin to field scales, it enables the accurate assessment of water productivity and the optimization of irrigation schedules. Crucially, this technology has been successfully embedded into institutional mechanisms, such as water rights allocation and tiered pricing based on actual consumption, thereby realizing a transformation from empirical water use to data-driven, precise regulation. [Conclusions and Prospects] In sum, digital (smart) agriculture is rapidly transcending its role as a mere extension of agricultural informatization to become the "new-quality productivity" driving high-quality agricultural development. It achieves this by fundamentally restructuring production factors, enhancing resource efficiency, strengthening risk response capabilities, and promoting value chain upgrading, thereby offering critical momentum for constructing a more efficient, greener, and sustainable modern agricultural system. Given China's pronounced global advantages in the digital economy, information technology, remote sensing, and intelligent equipment, the nation is well-positioned to integrate these strengths into comprehensive, full-chain smart agricultural solutions whose mature systemic models and business paradigms can ultimately form a "China Card" in the global agricultural revolution, contributing Chinese wisdom and solutions towards global food security and the zero-hunger goal.
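ETWatch's role in water governance rests on two simple calculations: water productivity (yield per unit of evapotranspired water) and billing actual consumption against an allocated water right. The sketch below illustrates both under stated assumptions; it is not ETWatch's implementation. The tier ceilings and prices in the hypothetical `TIERS` schedule are placeholders, and the unit conversion assumes ET in millimetres over an area in hectares (1 mm over 1 ha equals 10 m³).

```python
from dataclasses import dataclass

MM_HA_TO_M3 = 10.0  # 1 mm of water depth over 1 ha equals 10 m^3


def water_productivity(yield_kg_per_ha: float, et_mm: float) -> float:
    """Crop water productivity in kg of yield per m^3 of actual evapotranspiration."""
    if et_mm <= 0:
        raise ValueError("ET must be positive")
    return yield_kg_per_ha / (et_mm * MM_HA_TO_M3)


@dataclass(frozen=True)
class Tier:
    upper_fraction: float  # tier ceiling as a fraction of the allocated water right
    price: float           # price per m^3 inside this tier (hypothetical units)


# Hypothetical schedule, ordered by ceiling: use within the water right is billed
# at the base price, a 20% overrun at 1.5x, and anything beyond that at 3x.
TIERS = (Tier(1.0, 0.10), Tier(1.2, 0.15), Tier(float("inf"), 0.30))


def tiered_charge(et_mm: float, area_ha: float, quota_m3: float,
                  tiers: tuple = TIERS) -> float:
    """Bill the ET-derived consumptive volume of a field against its water right."""
    used_m3 = et_mm * MM_HA_TO_M3 * area_ha
    charge, lower = 0.0, 0.0
    for tier in tiers:
        if used_m3 <= lower:
            break  # consumption already fully billed in lower tiers
        upper = tier.upper_fraction * quota_m3
        charge += (min(used_m3, upper) - lower) * tier.price
        lower = upper
    return charge


if __name__ == "__main__":
    # A field yielding 9 t/ha with 450 mm of seasonal ET
    print(water_productivity(9000, 450))
    # 500 mm of ET on 1 ha billed against a 4 000 m^3 water right
    print(tiered_charge(500, 1.0, 4000))
```

Billing on ET rather than on diverted water charges only the consumptive part of use, which is the idea behind pricing "based on actual consumption"; return flows that recharge groundwater are not counted in this simplified sketch.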

High Spatiotemporal Resolution Remote Sensing for Precision Agricultural Disaster Early Warning: Progress, Bottlenecks, and Integrative Pathways

XU Xiaobin, ZHU Hongchun, LI Feng, HE Wei, YANG Jiaming, LI Zhenhai

2026, 8(2): 18-34. doi:10.12133/j.smartag.SA202512002

[Significance] Under climate change, the frequency and intensity of extreme weather events have increased markedly, posing persistent threats to global food security. Agricultural meteorological disasters, including droughts, floods, heat stress, frost damage, and events that cause mechanical damage such as lodging and hail, are increasingly characterized by rapid onset, strong spatial heterogeneity, and compound interactions. Conventional management strategies that rely mainly on post-event assessment are insufficient for timely warning and precision intervention. The development of high spatiotemporal resolution remote sensing and of integrated observation systems combining satellite, unmanned aerial vehicle (UAV), and ground-based sensing has substantially advanced agricultural disaster monitoring. These technologies enable field-scale characterization of spatial variability and detection of short-duration disaster processes at hourly to daily timescales. This review synthesizes recent progress in sky-air-ground integrated remote sensing for agricultural meteorological disaster management and establishes a unified framework linking monitoring, early warning, and decision-making, with emphasis on hydrological stress, thermal stress, and structural damage. [Progress] At the observation level, a multi-tier sensing architecture has emerged. Satellite remote sensing provides broad coverage and regular revisit cycles, forming the backbone of regional monitoring. Optical sensors support retrieval of crop structural and biochemical parameters, thermal infrared data enable canopy temperature and evapotranspiration estimation, and synthetic aperture radar (SAR) offers all-weather capability for soil moisture and flood detection. Solar-induced chlorophyll fluorescence (SIF) provides direct information on crop photosynthetic function and enables early identification of physiological stress. 
UAV platforms complement satellites through flexible deployment and centimeter-scale resolution, allowing detailed mapping of canopy temperature and three-dimensional crop structure using multispectral, thermal, and light detection and ranging (LiDAR) sensors. Ground-based meteorological stations and sensor networks provide continuous measurements for calibration and validation, although scaling point observations to spatially continuous products remains challenging. Consequently, multi-sensor integration is evolving from data stacking toward physically complementary constraint frameworks. Methodologically, two approaches dominate: physically based inversion and data-driven recognition. Radiative transfer models, surface energy balance methods, and SAR scattering models offer strong physical interpretability but depend on prior information and data quality. Machine learning and deep learning methods effectively capture nonlinear relationships and complex spatial patterns for disaster identification, yet remain limited in interpretability and cross-regional generalization. At the early-warning stage, crop growth models, hydrological models, and spatiotemporal prediction networks are applied to simulate disaster evolution. Hybrid models embedding physical constraints into data-driven frameworks have become a key research direction to enhance predictive robustness. Decision-support systems have expanded from threshold-based rule engines toward optimization algorithms and multi-objective frameworks, enabling warning information to be translated into actionable irrigation scheduling, protective measures, and emergency responses. Regarding specific hazards, drought monitoring has shifted from vegetation indices toward coupling root-zone soil moisture with crop physiological responses, with SIF-based indicators showing strong potential for early stress detection. 
Flood studies rely primarily on SAR-based inundation mapping and extend toward quantitative damage assessment. Heat and frost stress research emphasizes growth-stage-dependent dynamic thresholds. Lodging monitoring integrates structural parameters derived from optical, LiDAR, and SAR data, while hail-related studies focus on rapid post-event damage mapping. Compound and cascading disasters have become an important research frontier. [Conclusions and Prospects] High spatiotemporal resolution remote sensing has greatly enhanced the observability and early-warning potential of agricultural meteorological disasters. Nevertheless, key challenges remain, including heterogeneous data integration, scale inconsistency, uncertainty propagation, and insufficient coupling among monitoring, warning, and decision-making components. Future progress requires a systems-engineering perspective. Physically guided machine learning can bridge mechanistic understanding and data adaptability, while agricultural disaster digital twins provide a framework for dynamic interaction among observation, simulation, and decision optimization. In parallel, multi-factor time-series risk modeling and multi-agent learning are needed to better represent compound disaster processes and support intelligent, adaptive, and precision-oriented agricultural disaster management systems.
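The growth-stage-dependent dynamic thresholds used for heat and frost stress, combined with the kind of threshold-based rule engine that decision-support systems started from, can be illustrated with a small warning function. This is a minimal sketch, not a system from the review: the stage names and values in `HEAT_THRESHOLDS_C` and `FROST_THRESHOLDS_C` are hypothetical, and the `min_run` persistence rule is an illustrative assumption.

```python
from typing import List, Optional, Tuple

# Hypothetical stage-specific thresholds (deg C); real values depend on crop and cultivar.
HEAT_THRESHOLDS_C = {"jointing": 35.0, "flowering": 32.0, "grain_filling": 34.0}
FROST_THRESHOLDS_C = {"jointing": -2.0, "flowering": 0.0, "grain_filling": -1.0}


def classify_day(stage: str, tmax: float, tmin: float) -> Optional[str]:
    """Return 'heat', 'frost' or None using the thresholds of the current growth stage."""
    if tmax >= HEAT_THRESHOLDS_C[stage]:
        return "heat"
    if tmin <= FROST_THRESHOLDS_C[stage]:
        return "frost"
    return None


def stress_warnings(forecast: List[Tuple[int, str, float, float]],
                    min_run: int = 2) -> List[Tuple[int, int, str]]:
    """Scan consecutive (day, stage, tmax, tmin) records and return
    (first_day, last_day, kind) for runs of at least `min_run` days
    with the same stress kind. Assumes the records have no gaps."""
    events: List[Tuple[int, int, str]] = []
    current, start, prev, length = None, None, None, 0
    for day, stage, tmax, tmin in forecast:
        kind = classify_day(stage, tmax, tmin)
        if kind is not None and kind == current:
            length += 1
        else:
            # Close the previous run before starting a new one
            if current is not None and length >= min_run:
                events.append((start, prev, current))
            current, start, length = kind, day, 1
        prev = day
    if current is not None and length >= min_run:
        events.append((start, prev, current))
    return events


if __name__ == "__main__":
    forecast = [
        (1, "jointing", 30.0, 5.0),
        (2, "flowering", 33.0, 15.0),
        (3, "flowering", 34.0, 16.0),
        (4, "flowering", 31.0, 12.0),
        (5, "flowering", 25.0, -1.0),
    ]
    print(stress_warnings(forecast))
    print(stress_warnings(forecast, min_run=1))
```

Because the threshold is looked up per record, the same temperature can trigger a warning at flowering but not at jointing, which is the point of stage-dependent thresholds; the fixed dictionaries stand in for the dynamic, data-calibrated thresholds discussed in the review.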

Lodging Region Detection Method in Flax Based on Lightweight Improved YOLOv11n-seg Model

SU Yujie, LI Yue, WEI Linjing, WU Bing, GUO Linhai, YAN Bin, ZHOU Hui, GAO Yuhong, KANG Lianghe, LIU Huan, SU Shunchang

2026, 8(2): 35-47. doi:10.12133/j.smartag.SA202508013

[Objective] Lodging is a major agronomic constraint that adversely affects both yield and quality in field crops, with flax (Linum usitatissimum L.) being especially vulnerable due to its slender stems and susceptibility to wind and rainfall. Precise delineation of lodged areas from field imagery remains a significant challenge owing to the complex and heterogeneous morphology of lodging patterns, irregular and blurred boundaries, and substantial background interference from upright plants, weeds, and soil textures. These factors necessitate a segmentation framework that combines high precision and strong boundary adherence with computational efficiency, enabling deployment on resource-constrained agricultural monitoring platforms. In response to this need, a lightweight and accurate lodging segmentation approach based on an improved YOLOv11n-seg architecture was proposed to enhance fine-grained feature sensitivity, multi-scale representation capability, and boundary precision, while markedly reducing parameter count, giga floating-point operations (GFLOPs), and model size. [Methods] The proposed architecture integrated targeted modifications across the backbone, neck, and output stages. In the backbone, standard C3k2 modules were replaced with C3k2_SDW blocks, which combined a StarBlock structure with depthwise separable convolutions to reduce redundancy and computation without sacrificing spatial and contextual representational capacity. To counteract potential reductions in channel discrimination resulting from lightweighting, a multi-scale efficient channel attention (MS-ECA) mechanism was embedded within selected backbone layers, yielding C3k2_SDW_MS-ECA modules. These modules incorporated parallel convolution branches with varying kernel sizes to capture channel-wise dependencies across multiple receptive fields, thereby adaptively recalibrating lodging-related features with minimal computational overhead. 
In the neck, a bidirectional feature pyramid network (BiFPN) was introduced to facilitate efficient bidirectional information exchange between scales. By assigning normalized, trainable fusion weights, the BiFPN adaptively balanced contributions from low- and high-level feature maps, while a multi-stage semantic fusion strategy further enriched the integration of spatial details and contextual semantics, thereby improving the detection of small and fragmented lodged patches. At the output stage, a boundary refinement procedure was applied to the predicted masks, improving contour sharpness, enhancing boundary compactness, and mitigating false detections in complex visual environments. The experimental dataset comprised unmanned aerial vehicle (UAV) RGB imagery at a resolution of 4 032×2 268 pixels, acquired from flax fields in Dingxi, Gansu province. Lodged regions were manually annotated with polygonal masks. To increase robustness against variability in illumination, background complexity, and lodging morphology, data augmentation techniques, including random rotation, brightness and contrast adjustment, and blurring, were employed, expanding the dataset to 3 852 images. The dataset was divided into training, validation, and testing subsets in a 75%, 15%, and 10% split. Model training was conducted with 640×640 pixel inputs for 300 epochs using stochastic gradient descent (initial learning rate 0.01, momentum 0.937, weight decay 0.000 5) in PyTorch 2.0.0. Evaluation involved comparison with YOLACT, YOLOv7-seg, YOLOv8n-seg, and the original YOLOv11n-seg using precision (P), recall (R), mAP@0.5, mAP@0.5:0.95, parameter count, GFLOPs, and model size. [Results and Discussions] Ablation experiments demonstrated the incremental contributions of each architectural component. Substituting C3k2 with C3k2_SDW reduced parameters from 2.83 M to 2.14 M and computation from 10.2 to 8.1 GFLOPs, with slight performance improvements. 
Incorporating BiFPN further lowered complexity to 1.68 M parameters and 7.7 GFLOPs, accompanied by notable gains in detection metrics. Adding MS-ECA attention yielded the highest performance, with P of 92.6%, R of 92.0%, and mAP@0.5 of 95.2%, corresponding to improvements of 3.7 percentage points in precision and 2.1 percentage points in mAP@0.5 over the YOLOv11n-seg baseline, without increasing model size. Qualitative Grad-CAM visualizations revealed more precise focus on lodging regions and fewer false activations on upright stems and non-lodged soil areas. Generalization capability was further validated on the public WE3DS agricultural segmentation dataset, where the proposed model achieved average improvements of 4.3, 1.9, and 2.6 percentage points in precision, recall, and mAP@0.5, respectively, compared to the baseline. [Conclusions] The improved YOLOv11n-seg architecture achieves a superior balance between accuracy and efficiency for flax lodging segmentation by combining the C3k2_SDW_MS-ECA backbone, BiFPN with multi-stage semantic fusion in the neck, and output boundary refinement. This combination of high accuracy, lightweight design, and robust boundary delineation makes the model well suited to real-time, in-field deployment for intelligent lodging monitoring and precision agriculture. The results further suggest that the approach is transferable to broader agricultural segmentation tasks, providing a practical and scalable solution for modern smart farming applications.
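The "normalized, trainable fusion weights" of BiFPN are commonly implemented as the fast normalized fusion introduced with EfficientDet: each weight passes through a ReLU and the weighted sum is divided by the sum of all weights plus a small ε, giving O = Σᵢ wᵢ·Iᵢ / (ε + Σⱼ wⱼ). The sketch below shows only this weighting step on flattened, already-resized feature maps. Whether the study's BiFPN uses exactly this normalization is an assumption; in a real network the weights are learned tensors and the fused map is followed by a convolution.

```python
from typing import List, Sequence


def fast_normalized_fusion(features: Sequence[Sequence[float]],
                           weights: Sequence[float],
                           eps: float = 1e-4) -> List[float]:
    """Fuse same-shaped (flattened) feature maps as sum(w_i * I_i) / (eps + sum(w_j)).

    Weights pass through ReLU so they stay non-negative; eps avoids division by zero.
    """
    if not features or len(features) != len(weights):
        raise ValueError("need one weight per input feature map")
    size = len(features[0])
    if any(len(f) != size for f in features):
        raise ValueError("inputs must be resized to the same shape before fusion")
    w = [max(0.0, wi) for wi in weights]  # ReLU on the learnable scalar weights
    norm = eps + sum(w)
    return [sum(wi * f[k] for wi, f in zip(w, features)) / norm for k in range(size)]


if __name__ == "__main__":
    top_down = [1.0, 2.0, 3.0]  # e.g. an upsampled higher-level feature (toy values)
    lateral = [3.0, 4.0, 5.0]   # a same-resolution backbone feature (toy values)
    print(fast_normalized_fusion([top_down, lateral], [1.0, 1.0]))
    print(fast_normalized_fusion([top_down, lateral], [2.0, -0.5]))
```

Compared with a softmax over the weights, this normalization avoids exponentials while still keeping each input's contribution between 0 and 1, which suits the lightweight design goal; a negative raw weight simply switches that input off.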

Optimal Sampling Strategy for Soil Organic Matter Based on Hippopotamus Optimization Algorithm and Machine Learning

LIAN Zhenxiang, FEI Xufeng, REN Zhouqiao

2026, 8(2): 48-58. doi:10.12133/j.smartag.SA202508027

[Objective] Soil quality is crucial for food security, ecosystem health, and sustainable development, but it is threatened by degradation due to intensive land use. Accurate soil quality assessment is therefore essential for informed land management and ecological protection. Machine learning has enhanced digital soil mapping (DSM) by improving modeling accuracy through multi-source data integration. Within DSM, soil sampling design is a foundational step that directly influences prediction accuracy, cost, and efficiency. An ideal scheme must balance mapping precision with economic and operational feasibility. This study focuses on soil organic matter (SOM), a core indicator of soil quality affecting fertility, carbon sequestration, and environmental regulation. Precisely mapping its spatial variability is vital for sustainable soil management. To address the need for efficient sampling, this research aims to develop an optimal sampling design method for regional-scale SOM mapping that reduces sampling redundancy and cost while improving spatial prediction accuracy. [Methods] A sampling optimization framework was proposed that integrated intelligent optimization algorithms with a hybrid spatial interpolation model. The framework was built upon the hippopotamus optimization algorithm (HO) and incorporated the random forest residual kriging (RFRK) method to construct an optimal sampling strategy for the spatial prediction of SOM. At the initialization stage, a population of candidate solutions, referred to as "hippopotamuses", was randomly generated, with each individual representing a potential sampling layout. The HO was employed to select subsets of sampling points from the training sample pool, with each subset forming a candidate solution. Collectively, these solutions constituted the initial hippopotamus population. The study area was located in Lanxi city, Zhejiang province, where a total of 1 080 field-measured soil samples were collected. 
These samples were partitioned into a training set (n=756), a validation set (n=108), and a test set (n=216) at a ratio of 7:1:2. Environmental covariates, including terrain attributes, vegetation indices, and climate factors, were extracted from multi-source remote sensing datasets. Using these covariates, the HO optimized sampling schemes across varying densities and spatial configurations. The resulting designs were then evaluated using the RFRK model to assess their SOM prediction performance. This process enabled the identification of the optimal sampling density and spatial layout that balanced accuracy and cost-efficiency. [Results and Discussions] When the HO-RFRK framework was applied, the prediction accuracy of SOM improved significantly as sampling density increased from 0.5 to 2.3 points/km² (136-629 points). The root mean square error (RMSE) on the test set decreased from 6.04 to 5.11 g/kg, representing a reduction of approximately 15.4%. The lowest prediction errors were observed at a sampling density of 2.3 points/km², with the RMSE and mean absolute error (MAE) reaching their minimum values of 5.11 and 3.79 g/kg, respectively, beyond which further increases yielded only marginal gains, indicating diminishing returns. To assess the effectiveness of HO, its performance was compared with three established methods: conditioned Latin hypercube sampling (cLHS), genetic algorithm (GA), and particle swarm optimization (PSO). At lower densities (0.5-1.3 points/km²), all methods showed limited predictive power. However, at 1.4 points/km² (383 points), the HO method was the first to exceed predefined accuracy thresholds (coefficient of determination, R²>0.40; Lin's concordance correlation coefficient, LCCC>0.55), achieving R²=0.41 and LCCC=0.57, outperforming cLHS (R²=0.38, LCCC=0.53), GA (R²=0.39, LCCC=0.52), and PSO (R²=0.38, LCCC=0.51). Across the range of 1.4-2.3 points/km², HO consistently delivered superior results. 
At 2.3 points/km², the HO-RFRK combination achieved R²=0.49 and LCCC=0.63, surpassing cLHS, GA, and PSO in both metrics. [Conclusions] Using the cultivated land of Lanxi city as a test case, a novel sampling optimization strategy based on HO was proposed. First, the strategy identified an optimal sampling density that maximizes prediction accuracy, as well as a lower, cost-effective density that maintains robust predictive performance with substantially reduced survey costs, defining a practical density range that balances precision and economic feasibility. Second, the RFRK model consistently achieved higher prediction accuracy than the standard random forest (RF) model across all tested sampling schemes, validating the effectiveness of the integrated HO-RFRK approach. In summary, this optimized strategy achieves high mapping accuracy with greater sampling efficiency, offering a scientifically grounded and practical methodology for reducing long-term soil monitoring costs. It provides a valuable reference for optimizing soil surveys in Lanxi city and other regions with similar environmental settings.
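The accuracy criteria used above (RMSE, MAE, R², and Lin's concordance correlation coefficient, with the acceptance thresholds R²>0.40 and LCCC>0.55) can be computed as follows. This is a minimal sketch: R² is taken as 1 − SS_res/SS_tot and LCCC uses population (1/n) moments, which are common conventions but may not match the exact definitions used in the study.

```python
import math
from typing import Sequence


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r2(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination as 1 - SS_res / SS_tot."""
    mean_o = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def lccc(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient with population (1/n) moments."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    var_o = sum((o - mo) ** 2 for o in obs) / n
    var_p = sum((p - mp) ** 2 for p in pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred)) / n
    # Penalizes both scatter (cov vs. variances) and bias (difference of means)
    return 2.0 * cov / (var_o + var_p + (mo - mp) ** 2)


def meets_thresholds(obs: Sequence[float], pred: Sequence[float],
                     r2_min: float = 0.40, lccc_min: float = 0.55) -> bool:
    """Acceptance rule stated in the study: R^2 > 0.40 and LCCC > 0.55."""
    return r2(obs, pred) > r2_min and lccc(obs, pred) > lccc_min
```

Unlike Pearson's r, LCCC drops when predictions are systematically shifted, so a map with a constant bias can correlate perfectly with the observations and still fail the LCCC criterion. As a quick arithmetic check, `100 * (6.04 - 5.11) / 6.04` evaluates to about 15.4, matching the RMSE reduction reported above.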

YOLOv8n-SSND: An Improved Lightweight Model for Aerial Chenopodium quinoa Willd. Spike Target

WU Tingting, GUO Junrui, TAO Qiujie, CHEN Shihua, GUO Shanli

2026, 8(2): 59-71. doi:10.12133/j.smartag.SA202508021

[Objective] The Chenopodium quinoa panicle is a critical phenotypic indicator for estimating crop yield and evaluating the growth condition of Chenopodium quinoa plants. Accurate and efficient recognition of Chenopodium quinoa panicles in complex field environments is therefore of great significance for intelligent agriculture, yield prediction, and automatic crop management. However, field imagery acquired by unmanned aerial vehicles (UAVs) often exhibits diverse panicle morphology, uneven illumination, overlapping occlusion, and background interference, posing substantial challenges for conventional target detection algorithms. To address these issues, a lightweight target detection model named YOLOv8n-SSND (YOLOv8n with Switchable Atrous Convolution, Slim Neck, and Deformable Attention) is proposed. It is optimized specifically for UAV-based Chenopodium quinoa panicle identification, improving detection accuracy and inference efficiency while maintaining the low computational cost and real-time performance required for embedded UAV deployment. [Methods] The proposed model was constructed based on the YOLOv8n and YOLOv11n frameworks and incorporated several improvements tailored to small-object agricultural detection tasks. To enhance the ability to capture multi-scale and high-dimensional semantic features, the switchable atrous convolution (SAC) module was embedded into the backbone network. This module dynamically adjusted its receptive field according to spatial context, enabling more precise extraction of local and global texture details of Chenopodium quinoa panicles. 
To reduce redundant parameters and maintain high computational efficiency, a slim-neck lightweight feature fusion layer was designed, which effectively strengthened the integration of shallow spatial information and deep semantic features, allowing the network to maintain high accuracy without increasing model complexity. Additionally, a deformable attention (DA) mechanism was introduced to enable adaptive focus on regions with rich panicle-related features while suppressing irrelevant background noise. This attention mechanism assigned dynamic weights across both spatial and channel dimensions, improving the model's robustness against occlusions, illumination variations, and complex field textures commonly encountered in UAV images. [Results and Discussions] Comprehensive field experiments were conducted using UAV images of Chenopodium quinoa plots collected under different environmental conditions and growth stages. The proposed YOLOv8n-SSND model achieved a mean average precision (mAP50) of 94.3%, outperforming multiple baseline and comparative models. Specifically, compared with YOLOv11n-SSND, YOLOv11n, YOLOv12n, YOLOv7, YOLOv5s, single shot multibox detector (SSD), fast region-based convolutional neural network (Fast R-CNN), and YOLOv8n, the proposed model improved mAP50 by 0.7, 0.9, 2.1, 1.4, 2.0, 23.1, 19.6, and 1.8 percentage points, respectively. In terms of computational efficiency, the inference speed reached 166.7 f/s, a 26.7% increase over the YOLOv8n baseline, which ensured real-time detection capability on UAV-mounted onboard processors. Moreover, the total operation count was reduced to 6.8 GFLOPs, a 16.0% reduction compared with the baseline model, demonstrating the improved efficiency of the proposed architecture. 
The experimental comparison also indicated that the integration of SAC enhanced the model's sensitivity to complex spatial patterns, while the DA module effectively improved feature selectivity and prevented overfitting to background textures. The Slim-Neck design contributed significantly to reducing parameter redundancy and facilitated smooth feature propagation across layers. [Conclusions] The YOLOv8n-SSND model effectively achieves a balance among detection accuracy, inference speed, and computational cost, making it well-suited for real-time UAV-based agricultural monitoring. The experimental outcomes confirm that the model not only provides high-precision detection of Chenopodium quinoa panicles but also offers superior inference efficiency with minimal computational resources. These characteristics make it a promising solution for UAV-deployed intelligent agricultural systems, where power and processing capacity are limited. Furthermore, the proposed method provides a technical foundation for large-scale and automated monitoring of Chenopodium quinoa growth, enabling accurate yield estimation, phenotypic analysis, and precision crop management.
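For a single class such as quinoa panicles, mAP50 reduces to the average precision (AP) at an intersection-over-union (IoU) threshold of 0.5. The sketch below computes it with VOC-style greedy matching and all-point interpolation. It is a simplified illustration: evaluators such as the one bundled with Ultralytics YOLO use their own interpolation scheme, so values can differ slightly from those reported above, and the box format `(x1, y1, x2, y2)` is an assumption.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections: Sequence[Tuple[str, float, Box]],
                      ground_truths: Dict[str, List[Box]],
                      iou_thr: float = 0.5) -> float:
    """Single-class AP at `iou_thr` (VOC-style matching, all-point interpolation).

    detections: (image_id, confidence, box); ground_truths: image_id -> boxes.
    """
    n_gt = sum(len(boxes) for boxes in ground_truths.values())
    if n_gt == 0:
        return 0.0
    used = {img: [False] * len(boxes) for img, boxes in ground_truths.items()}
    tp_flags = []
    # Highest-confidence detections claim ground-truth boxes first
    for img, _score, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truths.get(img, [])):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j >= 0 and best_iou >= iou_thr and not used[img][best_j]:
            used[img][best_j] = True
            tp_flags.append(1)
        else:
            tp_flags.append(0)  # low overlap or duplicate detection: false positive
    precisions, recalls, tp = [], [], 0
    for rank, flag in enumerate(tp_flags, start=1):
        tp += flag
        precisions.append(tp / rank)
        recalls.append(tp / n_gt)
    # Area under the precision envelope (best precision at any equal or higher recall)
    ap, prev_recall = 0.0, 0.0
    for i, r in enumerate(recalls):
        ap += (r - prev_recall) * max(precisions[i:])
        prev_recall = r
    return ap
```

Duplicate boxes on the same panicle count as false positives, so overlapping, occluded panicles penalize a detector that fires twice; this is one reason the attention and feature-fusion changes described above can raise mAP50 rather than just recall.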

Geographically Weighted Random Forest for County-scale Digital Mapping of Soil Organic Matter: A Case Study in the Central Shandong Mountains

ZHANG Shulin, CUI Liqin, LIU Jian, ZHANG Canting, WANG Hongjia, ZHANG Tingting, WANG Ailing

2026, 8(2): 72-85. doi:10.12133/j.smartag.SA202508020

[Objective] Soil organic matter (SOM) is a fundamental indicator for evaluating soil fertility and soil quality. In mountainous counties characterized by complex terrain and pronounced environmental heterogeneity, SOM exhibits strong spatial variability even over short distances, which often limits the prediction accuracy of conventional digital soil mapping (DSM) models. With the nationwide implementation of the Third National Soil Census, the demand for high-resolution and high-accuracy SOM mapping at the county scale has become increasingly urgent. Against this backdrop, Yiyuan county in Shandong province was selected as the study area to assess the applicability of the geographically weighted random forest (GWRF) model to SOM mapping in complex terrain. The study further sought to systematically compare the predictive performance of GWRF with several commonly used models, thereby providing technical support for soil resource surveys, census result compilation, and county-level land management. [Methods] A dataset of 1 565 measured topsoil SOM samples was used, along with nineteen environmental variables representing five categories: topography, climate, vegetation, soil properties, and land use. Through correlation analysis and collinearity diagnostics, twelve key variables were retained for model construction. The GWRF model, which integrates localized spatial modeling with nonlinear machine-learning capability, was developed to generate high-resolution SOM predictions across the study area. An adaptive bandwidth strategy was employed, and an optimal bandwidth of 500 was determined. Grid search combined with cross-validation was used to identify the optimal mtry value of 4 for the random forest component. In addition to GWRF, four reference models were constructed for comparison: ordinary kriging (OK), multiple linear regression (MLR), geographically weighted regression (GWR), and random forest (RF). 
Model performance was evaluated using two commonly adopted accuracy metrics: the coefficient of determination (R²) and root-mean-square error (RMSE). [Results and Discussions] Overall, SOM levels in Yiyuan county were relatively low, with a mean value of 15.62 g/kg. The spatial variation was moderate and exhibited a clear pattern: SOM values were higher in the central area and lower in the northeastern and southwestern areas. Prediction accuracy differed considerably among the five models. The GWRF model achieved the best overall performance, with an R² of 0.48 and an RMSE of 5.12 g/kg. This accuracy clearly surpassed that of RF (R²=0.41) and GWR (R²=0.35), and its advantage over MLR and OK was even more pronounced. A paired-sample t-test further confirmed that the accuracy improvements of GWRF over the other four models were statistically significant, supporting the robustness and reliability of the model's enhanced performance. In the mapping results, the OK model produced an excessively smooth surface that obscured local details. While the MLR and GWR models could characterize certain environmental effects, they exhibited significant biases, underestimating high values and overestimating low values. In contrast, the GWRF model captured both global trends and subtle local variations well. The analysis of variable importance showed that soil type, annual evapotranspiration, slope, and sand content were the most influential factors governing SOM distribution in the study area. Moreover, the importance of these factors varied notably across space. [Conclusions] This study demonstrated that the GWRF model possesses significant advantages in county-scale SOM digital mapping within mountainous areas. 
Its prediction accuracy markedly exceeded that of RF and conventional linear models, owing to its ability to simultaneously capture nonlinear environmental relationships and localized spatial variations. The enhanced mapping precision and improved representation of spatial details highlight the strong potential of GWRF for applications requiring high-accuracy soil information. GWRF is well-suited for SOM prediction under complex terrain conditions and can serve as an effective technical tool for county-level soil property estimation. Future research may incorporate human-activity-related variables, employ localized variable-selection strategies within the GWRF framework to further refine model performance, and explore the application potential of more advanced deep learning models in soil property mapping.
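The localisation step that distinguishes GWRF from a single global random forest can be sketched with the Python standard library. This is a minimal sketch, not the authors' code: it assumes the common SpatialML-style design in which an adaptive bandwidth is a neighbour count (`bw`), so the reported bandwidth of 500 would mean each local model is calibrated on the 500 nearest samples. The bisquare weighting is shown as a typical GWR-family alternative. The helper names `nearest_neighbours`, `bisquare_weights`, `r2`, `rmse`, and `paired_t` are hypothetical, and `paired_t` assumes the models are compared on matched error values (for example, per cross-validation fold), which the abstract does not specify.

```python
import math
import statistics


def nearest_neighbours(coords, target, bw):
    """Return indices of the `bw` samples nearest to `target` (Euclidean).

    With an adaptive kernel the bandwidth is a neighbour count, so every
    local model is calibrated on the same number of samples whether the
    sampling around `target` is dense or sparse.
    """
    ranked = sorted((math.dist(c, target), i) for i, c in enumerate(coords))
    return [i for _, i in ranked[:bw]]


def bisquare_weights(coords, target, bw):
    """Adaptive bisquare kernel: w = (1 - (d / d_bw)^2)^2 for d < d_bw, else 0,
    where d_bw is the distance to the bw-th nearest sample."""
    dists = [math.dist(c, target) for c in coords]
    d_bw = sorted(dists)[bw - 1]
    return [(1 - (d / d_bw) ** 2) ** 2 if d < d_bw else 0.0 for d in dists]


def rmse(obs, pred):
    """Root-mean-square error, in the units of the target (g/kg for SOM)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r2(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - ss_res / ss_tot


def paired_t(errors_a, errors_b):
    """Paired-sample t statistic on matched error values (df = n - 1)."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


if __name__ == "__main__":
    coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
    print(nearest_neighbours(coords, (0.0, 0.0), 2))  # [0, 1]
    print(bisquare_weights(coords, (0.0, 0.0), 3))    # [1.0, 0.5625, 0.0, 0.0]
```

In a full GWRF, each neighbourhood would train its own forest, for instance scikit-learn's `RandomForestRegressor(max_features=4)` to mirror mtry = 4, and predict at its centre location. That per-location loop is omitted here to keep the sketch dependency-free.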

A Bi-LSTM Prediction Method for Apple First Flowering Date Based on Enhanced Time-Series Temperature Features |

LIU Enqi, LIU Miao, WANG Tuo, ZHU Yaohui, CHEN Riqiang, XU Bo, GAO Meiling, ZHANG Jing, YANG Yun, YANG Guijun

2026, 8(2): 86-97. doi:10.12133/j.smartag.SA202510026

[Objective] The first flowering date of apples is a key phenological stage in the annual growth cycle of fruit trees. Its timing is directly associated with pollination efficiency, fruit set rate, and subsequent fruit development, and it also serves as an important basis for orchard management practices, including flower and fruit thinning, pest and disease control, and early warning and emergency management of low-temperature frost events during the flowering period. Existing studies leave room for improvement in extracting fine-scale information from temperature time series and in adapting models to different spatial locations. Therefore, this research aims to develop a prediction method for the first flowering date of apples that effectively characterizes time-varying temperature patterns and adapts to regional differences, thereby providing more reliable technical support for refined orchard management and disaster prevention. [Methods] A deep learning-based framework for forecasting the first flowering date of apples was developed using observation sites in Luochuan county, Shaanxi province. First, daily near-surface air temperature (NSAT) data from 2019 to 2021, including daily maximum, mean, and minimum temperatures, were collected for the period from apple harvest to the following flowering season in the study area. In addition, elevation, latitude, and longitude were introduced as static geographic factors, so that the model input combined dynamic temperature sequences with static spatial attributes. Second, for the model design, a bidirectional long short-term memory network (Bi-LSTM) was employed as the temporal encoder to learn bidirectional dependencies within the temperature time series. 
On this basis, a customized multi-head attention (MHA) mechanism was integrated, consisting of a local dependency head, a global trend head, and a cumulative feature head, designed to represent short-term pre-flowering temperature fluctuations, overall temperature trends, and cumulative temperature effects, respectively. This configuration enhanced the extraction of time-varying information across multiple temporal scales. The attention outputs were then fused with the static geographic factors, and a regression layer produced the predicted first flowering date, enabling regionally adaptive prediction. To ensure comparability, LSTM and Bi-LSTM baseline models were constructed using identical data preprocessing and training procedures. Third, Bayesian optimization was applied for automatic hyperparameter tuning: key parameters, including learning rate, number of network layers, number of hidden units, regularization terms, and optimizer, were searched systematically, and the optimal configuration was selected based on validation performance. Finally, a cross-year validation strategy was adopted to evaluate generalization ability: data from 2019 to 2021 were used as the modeling dataset (training and validation), while the observed first flowering date in 2022 served as an independent test dataset. The predictive performance of all models was evaluated using three widely used metrics: root mean square error (RMSE), mean absolute error (MAE), and correlation coefficient (R). [Results and Discussions] On the test dataset, the proposed model achieved an RMSE of 1.34 d, an MAE of 1.13 d, and an R of 0.84, with most prediction errors falling within 0–2 d. Validation results indicated that the proposed approach could provide stable predictions approximately 15–20 d in advance within the study area. 
Further comparative analysis demonstrated that the Bi-LSTM architecture exploited both forward and backward dependencies in the pre-flowering temperature time series more effectively, offering a more stable temporal representation for regression-based prediction of the first flowering date. Building on this structure, the three attention heads (local dependency, global trend, and cumulative feature) enabled the model to distinguish and use short-term fluctuations, stage-wise trends, and cumulative temperature effects more explicitly. This targeted extraction of multi-scale time-varying information reduced prediction errors and improved overall accuracy. Ablation experiments on the static geographic factors further verified the necessity of the spatial adaptability component. Removing elevation increased the RMSE from 1.34 d to 1.45 d. Removing latitude and longitude produced a larger increase, to 2.54 d, and excluding both elevation and geographic coordinates raised the RMSE further to 2.69 d, accompanied by a decrease in correlation. These results indicate that geographic factors provided effective spatial constraints that supported the learning of location-specific phenological responses across sampling sites. In addition, spatial prediction maps showed that the first flowering date in the study area followed, to some extent, a gradient with elevation, a pattern consistent with the rationale for incorporating geographic factors into a unified prediction framework. [Conclusions] This study proposes a deep learning-based method for predicting the first flowering date of apples that integrates multi-dimensional temperature features, a multi-head attention mechanism, and geographic factors. 
The proposed method achieves relatively high prediction accuracy in cross-year forecasting and enables spatially adaptive prediction of the first flowering date of apples. These findings provide a new data-driven technical pathway for refined prediction of apple flowering phenology and offer important technical support for orchard flowering management, frost damage prevention, and agricultural production decision-making.
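One way to picture the three attention heads is as the same self-attention operation under three different masks. The sketch below is an illustrative assumption, not the authors' implementation. It uses raw standardised temperatures as query, key, and value (a real head would learn projections). A window mask stands in for the local dependency head, an unrestricted mask for the global trend head, and a causal mask, in which each day attends only to itself and earlier days, as a stand-in for the cumulative feature head. The names `masked_attention`, `local_mask`, `global_mask`, and `cumulative_mask` are hypothetical. The `rmse`, `mae`, and `pearson_r` helpers follow the standard definitions of the three evaluation metrics.

```python
import math


def softmax(scores):
    """Stable softmax; entries equal to -inf (masked positions) get weight 0."""
    m = max(s for s in scores if s != -math.inf)
    exps = [math.exp(s - m) if s != -math.inf else 0.0 for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_attention(x, mask):
    """Toy single-head self-attention over a 1-D series x.

    Query, key and value are the raw values, so the score for (i, j) is x[i] * x[j].
    mask(i, j) returns False to forbid day i from attending to day j.
    """
    n = len(x)
    out = []
    for i in range(n):
        scores = [x[i] * x[j] if mask(i, j) else -math.inf for j in range(n)]
        weights = softmax(scores)
        out.append(sum(w * v for w, v in zip(weights, x)))
    return out


def local_mask(window):
    """Local dependency head: attend only to days within +/- window."""
    return lambda i, j: abs(i - j) <= window


def global_mask(i, j):
    """Global trend head: attend to the whole pre-flowering series."""
    return True


def cumulative_mask(i, j):
    """Cumulative head (assumed causal): attend to days up to and including i."""
    return j <= i


def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def pearson_r(obs, pred):
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


if __name__ == "__main__":
    temps = [-0.5, 0.2, 1.1, 0.8, 1.6]  # hypothetical standardised daily temperatures
    heads = [
        masked_attention(temps, local_mask(1)),
        masked_attention(temps, global_mask),
        masked_attention(temps, cumulative_mask),
    ]
    # Stack the three head outputs per day, as input to a fusion/regression layer.
    per_day_features = list(zip(*heads))
    print(len(per_day_features), len(per_day_features[0]))  # 5 3
```

Expressing the heads as masks keeps them comparable: each sees the same values and differs only in which days it may attend to. That mirrors the short-term, whole-season, and accumulated views described in the abstract.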

CAGE-YOLO: A Dense Small Object Detection Model for Aquaculture Net Cages Based on Remote Sensing Images |

ZHANG Wenbo, JIANG Yijue, SONG Wei, HE Qi, ZHANG Wenbo

2026, 8(2): 98-117. doi:10.12133/j.smartag.SA202508023

[Objective] Detecting dense, small aquaculture net cages against complex backgrounds is difficult. This study aims to build a specialized dataset and design a targeted detection model that improves recognition accuracy and robustness for practical aquaculture management. [Methods] A dataset of aquaculture net cages was constructed from high-resolution remote sensing imagery collected in seven representative farming regions (Australia, Canada, Chile, Croatia, Greece, China, and the Faroe Islands), and Cage-YOLO, a deep learning model based on YOLOv5, was proposed for detecting dense, small aquaculture net cages. First, an adaptive dense perception algorithm was introduced that automatically selects and generates feature maps reflecting the high-density distribution of small net cages. Second, an enhanced module based on spatial pyramid pooling-fast (SPPF) was integrated to reduce background noise interference and improve global feature extraction. Finally, a mixed attention block was incorporated to further enhance the model's perception of dense, small objects. [Results and Discussions] Experimental results showed that, compared with the original YOLOv5, Cage-YOLO improved precision, recall, and mean average precision by 5.6, 21.8, and 17.4 percentage points, respectively. The model size was kept at 16.9 MB, combining strong performance with ease of deployment. [Conclusions] This study provides a new approach to dense, small object detection and offers technical support for the intelligent management of marine cage aquaculture.
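Intersection over union (IoU) is the overlap measure behind precision, recall, and mean average precision, and it shows why dense, small targets are hard to score. The sketch below is a generic illustration, not the paper's evaluation code. It computes IoU for axis-aligned boxes `(x1, y1, x2, y2)` and greedily matches score-sorted predictions to ground truth at IoU ≥ 0.5. The function names and the 0.5 threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(0.0, iw) * max(0.0, ih)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds, gts, thr=0.5):
    """Greedy matching: each prediction (score, box), highest score first,
    claims the unmatched ground-truth box with the largest IoU >= thr."""
    matched = set()
    tp = 0
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_j, best_iou = None, thr
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    return precision, recall


if __name__ == "__main__":
    # The same 2-pixel shift costs a small cage far more IoU than a large one.
    print(round(iou((0, 0, 4, 4), (2, 0, 6, 4)), 3))      # 0.333
    print(round(iou((0, 0, 40, 40), (2, 0, 42, 40)), 3))  # 0.905
```

A 2-pixel localisation error drops a 4×4 box below the 0.5 threshold (IoU of about 0.33) but barely affects a 40×40 box (about 0.90). This is one reason recall on tightly packed, small cages is so sensitive to feature quality.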

Cross-Modal Attention for Multi-Source Remote Sensing Crop Classification under Cloud Occlusion and Complex Field Scenarios |

WU Chenxu, ZUO Haolong, LI Gang

2026, 8(2): 118-132. doi:10.12133/j.smartag.SA202510010

[Objective] Accurate and timely crop mapping is fundamental for agricultural management, yield forecasting, and food security assessment. However, in mountainous and hilly regions characterized by frequent cloud cover and highly fragmented farmland, crop classification methods relying solely on optical remote sensing data are severely constrained. Persistent cloud contamination introduces data gaps and temporal inconsistencies into optical image time series, significantly degrading classification accuracy and robustness. To address these limitations, this study develops a robust, adaptive deep learning framework capable of effectively integrating multi-modal remote sensing data. The primary objective is to improve crop classification accuracy and stability under complex conditions where optical observations are scarce or unreliable, thereby supporting reliable agricultural monitoring in cloudy and fragmented landscapes. [Methods] A novel deep neural network architecture, the attention-based 3D convolutional neural network (Attention-3DCNN), was proposed to jointly exploit multi-temporal optical and synthetic aperture radar (SAR) observations. The model integrated Sentinel-2 multispectral time-series imagery with weather-insensitive Sentinel-1 SAR data through a dedicated cross-modal fusion strategy driven by a triple-attention mechanism. The network adopted a dual-branch feature extraction architecture. For the Sentinel-2 data, a hybrid module combining three-dimensional and two-dimensional convolutional neural networks (3D-CNN and 2D-CNN) was employed to capture discriminative spatiotemporal features and crop phenological dynamics across the growing season. This design enabled effective modeling of the spectral-temporal interactions inherent in crop development. 
For the Sentinel-1 SAR data, depthwise separable convolutions were utilized to efficiently extract spatial and textural features related to crop structure and surface scattering characteristics while reducing computational complexity. Features extracted from both modalities were subsequently integrated using a custom-designed attention-based fusion module. This module consisted of three complementary attention mechanisms: channel attention, temporal attention, and spatial attention. Residual connections were incorporated throughout the network to facilitate stable training and effective gradient propagation. The proposed model was evaluated on two datasets to assess both its performance and generalizability. The first was the publicly available panoptic agricultural satellite time series (PASTIS) benchmark dataset from France, which contained dense time-series observations and multiple crop classes. The second was a real-world dataset constructed for Yishui county, Shandong province, China, which was characterized by high cloud frequency (approximately 33%), highly fragmented farmland (average parcel size < 0.5 hm²), and a relatively simple crop rotation system. Comparative experiments were conducted against several state-of-the-art models, including 3D-ConvSTAR, UNet++, Self-Attention 3D, CNN-LSTM dual-stream network, and TGF-Net. Ablation studies were also performed to quantify the contribution of each attention component. [Results and Discussions] Experimental results demonstrated that Attention-3DCNN consistently outperformed all baseline methods on both datasets. On the PASTIS benchmark, the model achieved an overall accuracy (OA) of 97.5%, confirming its strong classification capability under favorable observation conditions. On the more challenging Yishui county dataset, Attention-3DCNN attained an OA of 93%, outperforming the other comparison models. 
Ablation experiments confirmed the effectiveness of the proposed triple-attention mechanism, as removing any attention component resulted in a clear reduction in classification performance. Under heavy cloud coverage, Attention-3DCNN exhibited the smallest accuracy degradation, with an OA drop of only 3.6 percentage points, indicating its ability to adaptively rely on SAR information when optical data quality deteriorated. In regions with highly fragmented farmland, the proposed model also maintained the highest accuracy and the smallest performance decline (2.8 percentage points), benefiting from the spatial attention mechanism. Moreover, attention visualization provided meaningful interpretability. Temporal attention peaks aligned with key crop phenological stages, while channel attention highlighted spectrally and physically informative optical bands and SAR polarizations, which was consistent with established agronomic and remote sensing knowledge. [Conclusions] This study presents the Attention-3DCNN model for accurate and robust crop classification in regions affected by persistent cloud cover and fragmented agricultural landscapes. By fusing Sentinel-2 optical and Sentinel-1 SAR time-series data through a channel-temporal-spatial triple-attention mechanism, the proposed framework enables adaptive integration of complementary multi-modal information. The model achieves outstanding performance on both benchmark and real-world datasets, demonstrates strong robustness under adverse conditions, and offers enhanced interpretability. Overall, the proposed approach provides a reliable and practical solution for crop mapping in complex agricultural environments.
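Temporal attention lets informative dates dominate while unreliable ones contribute little. This can be sketched as softmax-weighted pooling over a time series of feature vectors. The formulation is generic, not the paper's triple-attention module. The relevance scores are supplied by hand here, whereas a network would learn them. The optional `valid` flag, for example to mark cloud-covered Sentinel-2 dates, is a hypothetical way to force a date's weight to zero. `overall_accuracy` computes OA from a confusion matrix, the metric reported for both datasets.

```python
import math


def temporal_attention_pool(series, scores, valid=None):
    """Softmax-weighted pooling over T dates.

    series: list of T feature vectors (equal length)
    scores: list of T relevance scores (learned in a real model)
    valid:  optional list of T booleans; invalid dates receive zero weight
    """
    if valid is None:
        valid = [True] * len(series)
    kept = [s for s, ok in zip(scores, valid) if ok]
    if not kept:
        raise ValueError("at least one date must be valid")
    m = max(kept)
    exps = [math.exp(s - m) if ok else 0.0 for s, ok in zip(scores, valid)]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(series[0])
    pooled = [sum(w * vec[k] for w, vec in zip(weights, series)) for k in range(dim)]
    return weights, pooled


def overall_accuracy(confusion):
    """OA = correctly classified samples / all samples (rows: reference classes)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


if __name__ == "__main__":
    # Three dates of a 2-band feature; the middle (hypothetically cloudy) date is dropped.
    feats = [[0.2, 0.1], [0.9, 0.8], [0.4, 0.3]]
    w, pooled = temporal_attention_pool(feats, [0.5, 2.0, 1.0], valid=[True, False, True])
    print([round(x, 3) for x in w])  # [0.378, 0.0, 0.622]
```

Because the weights are renormalised over the remaining dates, losing an optical observation shifts emphasis to the others instead of injecting corrupted values. A fusion module can rely more on SAR features in the same spirit.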

An Improved YOLOv10-Based Tomato Ripeness Detection Algorithm with LAMP Channel Pruning |

ZHAO Licheng, LU Xinyu, WU Qian, REN Ni, ZHOU Lingli, CHENG Yawen, HU Anqi, QI Chao

2026, 8(2): 133-146. doi:10.12133/j.smartag.SA202507045

[Objective] Cluster tomatoes, a major crop in protected horticulture, grow in clusters with densely overlapping fruits. In greenhouse environments, light conditions are complex and variable, and fruit color changes gradually from green to red across ripening stages. Under these conditions, traditional manual recognition is inefficient and highly subjective. Meanwhile, deep learning-based detection models often suffer from reduced detection accuracy, large localization errors, and slow inference when facing complex backgrounds and color interference, making it difficult to meet the dual requirements of real-time performance and high precision in practical applications. To meet the requirements of high accuracy, real-time performance, and strong robustness in cluster tomato ripeness detection, this paper proposes LampCT-YOLO (Cluster Tomato YOLO with LAMP pruning), a lightweight ripeness detection model based on an improved YOLOv10. Structural optimization and lightweight transformation of the baseline model effectively improve detection accuracy, inference speed, and robustness, providing a novel technical solution for cluster tomato ripeness detection. [Methods] With YOLOv10 as the baseline model, the insufficient feature extraction capability in complex scenarios was first addressed by introducing the SegNeXt attention mechanism into the backbone network. By adaptively adjusting attention weights and calculating the correlation matrix between feature channels, the mechanism automatically identified color channels strongly associated with the three ripeness levels of cluster tomatoes and assigned them higher attention weights, while suppressing feature responses from irrelevant background channels such as greenhouse frames, soil, and irrigation pipes. 
To enable lightweight deployment and meet the real-time detection requirements of edge devices, LAMP channel pruning, a gradient-based global channel importance method, was applied after model training. Its core principle was to evaluate each channel's contribution to detection performance by calculating the gradient magnitude of the channels in each network layer and then to eliminate redundant channels. This significantly reduced model size and computational complexity while maintaining high detection performance on the three-category ripeness classification of cluster tomatoes. [Results and Discussions] Experiments on an NVIDIA A100 GPU showed that LampCT-YOLO performed well on the 240 cluster tomato images in the test set. The mean average precision at an intersection-over-union threshold of 0.5 (mAP50) for the early ripe, mid-ripe, and late ripe stages was 84.6%, 89.5%, and 88.4%, respectively, increases of 5.5, 7.7, and 0.9 percentage points over YOLOv10. The average mAP50 across the three ripeness categories reached 87.6%, a 4.7 percentage point improvement over YOLOv10, demonstrating high detection accuracy and stability. In addition, the model maintained high recognition accuracy under variations in light intensity, fruit occlusion ratio, and background complexity, indicating good robustness and environmental adaptability. In terms of lightweighting, LAMP channel pruning reduced the number of model parameters and the computational complexity by 63.07% and 50.06%, respectively, while improving inference speed by 23.1%. 
This met the real-time, low-power requirements of edge computing devices and alleviated the trade-off between model accuracy and inference speed. To verify its practical value, LampCT-YOLO was deployed on a self-developed fruit and vegetable inspection robot, which was field-tested on 456 tomato clusters in a real greenhouse. The inspection robot correctly identified 78, 61, and 248 clusters of early ripe, mid-ripe, and late ripe cluster tomatoes, respectively, with detection accuracies of 84.8%, 87.1%, and 84.4% and an average accuracy of 85.4%. There were 5, 7, and 10 false detections and 9, 2, and 36 missed detections for the early ripe, mid-ripe, and late ripe stages, respectively, results that indicate the model's practical application potential. [Conclusions] The optimized LampCT-YOLO model not only significantly improves the recognition accuracy of cluster tomatoes at different ripening stages but also greatly reduces model complexity, enabling efficient deployment in resource-constrained scenarios. The model balances the detection accuracy and real-time performance required by inspection robots and provides a reusable technical framework for ripeness detection of protected horticultural fruits and vegetables. It supports the transformation of protected agriculture from labor-intensive to technology-intensive production and contributes to the large-scale, diversified implementation of smart agriculture.
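The field-test figures are internally consistent with one particular definition of accuracy: correct detections divided by the sum of correct, false, and missed detections. The abstract does not state this definition, so treat it as an inference from the numbers. The check below uses only the reported counts and reproduces the per-stage accuracies (84.8%, 87.1%, 84.4%), the 85.4% average, and the total of 456 clusters.

```python
# Field-test counts from the abstract: (correct, false, missed) detections per stage.
FIELD_COUNTS = {
    "early ripe": (78, 5, 9),
    "mid-ripe": (61, 7, 2),
    "late ripe": (248, 10, 36),
}


def detection_accuracy(correct, false_det, missed):
    """Inferred definition: correct / (correct + false + missed)."""
    return correct / (correct + false_det + missed)


per_stage = {stage: detection_accuracy(*c) for stage, c in FIELD_COUNTS.items()}
mean_accuracy = sum(per_stage.values()) / len(per_stage)
total_clusters = sum(sum(c) for c in FIELD_COUNTS.values())

for stage, acc in per_stage.items():
    print(f"{stage}: {acc:.1%}")
print(f"average: {mean_accuracy:.1%}, clusters: {total_clusters}")
```

The 85.4% figure is the unweighted mean of the three stage accuracies. Pooling all counts instead (387 correct out of 456) would weight the large late ripe group more heavily.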

A Lightweight Method for Pear Surface Defect Detection Based on Improved Mamba-YOLO Architecture |

XIU Xianchao, FEI Shiqi, HUANG Wenqian, LI Nan, MIAO Zhonghua

2026, 8(2): 147-157. doi:10.12133/j.smartag.SA202508022

[Objective] Pears are a common fruit rich in vitamins and minerals. Traditional pear grading relies primarily on manual inspection, which is laborious and susceptible to subjective factors, leading to unstable and inaccurate results. Furthermore, manual handling may cause varying degrees of physical damage to pears, affecting their appearance and market value. Developing automated, efficient, and reliable pear grading technology has therefore become an urgent industry need. To address the poor detection accuracy caused by the small scale of surface defects on Dangshan pears, a lightweight, high-precision model based on an improved Mamba-YOLO architecture was proposed to balance detection accuracy and efficiency. [Methods] The dataset comprised 1 000 images, partitioned into training, validation, and test sets in an 8:1:1 ratio. Three improvements were made to the network architecture. First, a dynamic upsampling (Dysample) module was adopted. Compared with the existing upsampling module in Mamba-YOLO, Dysample has fewer parameters and floating-point operations (FLOPs). Its design eliminates complex dynamic convolution kernels and requires only a small number of linear layers and grouping operations, preserving computational efficiency while better retaining defect details. Second, pear surface defects often exhibit high-frequency local features, whereas traditional convolutional neural networks (CNNs) suffer from insufficient feature capture and an imbalanced frequency response. As the dilation rate increases, the frequency response of the convolution kernel decreases and its bandwidth narrows, limiting its ability to process high-frequency information. 
Therefore, a frequency-adaptive dilated convolution (FADC) module was proposed, which dynamically adjusted the convolution kernel size so that the network could adaptively select matching kernels based on local input features. Smaller kernels were used in high-frequency regions and larger kernels in low-frequency regions, achieving collaborative optimization of multi-band features and enhancing defect feature extraction. Finally, because single-scale depthwise convolutions alone may perceive input features insufficiently, and traditional gating mechanisms may model global context inadequately, the squeeze-and-excitation module was fused with a channel mixer based on the convolutional gated linear unit (CGLU), and this combination was extended into a multi-scale version termed MS-CGLU. Convolutional kernels of different sizes extract multi-scale features, which are then combined by weighted fusion, yielding stronger feature representations. [Results and Discussions] The proposed method was evaluated on the Dangshan pear test set. Ablation experiments showed that introducing CGLU, FADC, and Dysample improved detection performance, confirming the effectiveness of these modules. Compared with YOLOv8n, Gold-YOLO-N, and YOLOv12n, the mean average precision (mAP) was higher by 4.7, 5.3, and 6.3 percentage points, respectively. Compared with the baseline Mamba-YOLO-T, mAP increased by 3.4 percentage points and frames per second (FPS) improved by 10.8%. 
Furthermore, in comparisons with larger models from the same Mamba-YOLO series, the proposed algorithm still showed clear advantages: its parameter count was only 41.7% of that of Mamba-YOLO-B and 15.7% of that of Mamba-YOLO-L, and its FLOPs were only 57.1% and 18.1% of theirs, yet it achieved increases in mAP@0.5 of 3.2% and 1.4% and in mAP@0.5:0.95 of 3.1% and 2.6%, respectively. [Conclusions] This research developed a high-precision, lightweight algorithm for detecting surface defects on Dangshan pears. It achieved a superior balance between detection accuracy and inference speed, significantly outperforming relevant lightweight benchmarks, and even larger models in its own family, in terms of efficiency. This work can provide reliable algorithmic support for research on lightweight detection of pear surface defects.
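The claim that a larger dilation rate narrows a kernel's effective bandwidth can be checked numerically. For a kernel whose taps are spaced d samples apart, the frequency response equals the undilated response evaluated at d·ω. Its main lobe therefore shrinks by a factor of d, and the response repeats with period 2π/d. The sketch uses a 3-tap averaging kernel, chosen purely for illustration (it is not an FADC component). Its magnitude response is |1 + 2cos(dω)|/3, with the first zero at ω = 2π/(3d). The helper names are hypothetical.

```python
import cmath
import math


def dilated_response(kernel, dilation, omega):
    """Magnitude response |H(omega)| of a 1-D kernel with taps `dilation` samples apart."""
    return abs(sum(h * cmath.exp(-1j * omega * k * dilation) for k, h in enumerate(kernel)))


def first_null(dilation):
    """First zero of the 3-tap averaging kernel's response: omega = 2*pi / (3*d)."""
    return 2 * math.pi / (3 * dilation)


if __name__ == "__main__":
    box = [1 / 3, 1 / 3, 1 / 3]
    for d in (1, 2, 4):
        # dilation, main-lobe edge, and response to the fastest (alternating) pattern
        print(d, round(first_null(d), 3), round(dilated_response(box, d, math.pi), 3))
```

With d = 2, the kernel passes the alternating pattern at ω = π as strongly as a flat signal. It cannot tell the finest texture from a uniform surface, which illustrates why large fixed dilations handle small, high-frequency defect detail poorly.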

CD-YOLO: A Method for Detecting Carrot Seedlings in Field Based on Improved YOLOv11s |

LIU Haoran, WANG Yu, ZHAO Xueguan, WU Huarui, FU Hao, PANG Shujie, ZHAI Changyuan

2026, 8(2): 158-174. doi:10.12133/j.smartag.SA202511008

[Objective] In field environments under natural conditions, leaf occlusion and mutual plant shading make accurate identification of carrot seedlings difficult. Furthermore, practical agricultural applications often rely on edge devices with limited computational power, requiring a detection model that is lightweight, accurate, and robust to occlusion. This research aims to develop a robust carrot seedling recognition method for complex field conditions, improving the accuracy and efficiency of seedling emergence statistics in automated seedling raising and providing reliable technical support for precise farm management. [Methods] CD-YOLO (Carrot Detection-You Only Look Once), a lightweight detection model based on an improved YOLOv11s, was proposed. First, to reduce model complexity, several standard convolutions in the backbone network were replaced with depthwise separable convolutions (DWConv), decreasing floating-point operations (FLOPs) and the number of parameters and establishing a lightweight foundation for edge deployment. Second, the efficient multi-scale attention (EMA) mechanism was embedded into the critical feature extraction module C3k2 to construct a C3k2_EMA module. Through its parallel multi-branch structure, this module enhanced dynamic perception of local key features and reconstructed cross-scale contextual dependencies broken by occlusion, effectively suppressing background and occlusion noise. Finally, the DynamicHead detection head was introduced. Its scale-aware and spatial-aware mechanisms enabled dynamic fusion of multi-level features and adaptive weight adjustment, further improving the robustness of the model's decisions in complex scenes. To evaluate model performance comprehensively, a carrot seedling dataset covering various field scenarios was built for this study. 
Offline data augmentation expanded the original 1 274 images to 4 796, which were then split into training, validation, and test sets in an 8:1:1 ratio. To quantify the model's anti-occlusion performance systematically, an occlusion severity criterion based on the overlapping area of bounding boxes was proposed, and targets were categorized into three occlusion levels: mild, moderate, and severe. On this basis, a dedicated "Occlusion Test Subset" was separated from the main test set, providing an objective and reproducible benchmark for evaluating anti-occlusion capability. [Results and Discussions] Experimental results on the custom dataset demonstrated that CD-YOLO improved detection performance across the board while remaining lightweight. Compared with the baseline YOLOv11s, CD-YOLO reduced computational load by 6.2 GFLOPs (28.8%), reduced model size by 4.8 MB (25.0%), and shortened single-image inference time by 4.7 ms, to 9.6 ms. Precision, recall, and mean average precision (mAP_0.5) increased by 3.0, 1.5, and 2.4 percentage points, respectively, reaching 81.2%, 76.4%, and 84.0%. Compared with other lightweight backbone networks such as MobileNetv3 and ShuffleNetv2, CD-YOLO consistently performed better on the combined accuracy-speed metric, validating its improvement strategies. In occlusion tests, the missed detection rate of CD-YOLO on the occlusion test subset was 13.4%, 5.7 percentage points lower than that of YOLOv11s. Its mAP_0.5 on the occlusion subset reached 80.6%, a 5.1 percentage point improvement over the baseline, compared with a 1.8 percentage point improvement on the regular subset, showing that the gains were largest in occlusion scenarios. 
After deploying the model on an NVIDIA Jetson Orin NX edge device and accelerating it with TensorRT, the inference frame rate increased to 32.5 f/s. On random test images, CD-YOLO achieved missed detection and false detection rates of 5.1% and 2.7%, respectively, representing decreases of 7.7% and 2.6% compared to YOLOv11s, demonstrating promising practical application potential. Ablation studies and feature map visualizations further indicated that DWConv, C3k2_EMA, and DynamicHead formed a synergistic optimization loop: DWConv achieved computational compression, freeing up computational budget for subsequent modules; C3k2_EMA enhanced local perception and contextual reconstruction of occluded targets during the feature extraction stage; and DynamicHead performed dynamic fusion of multi-scale features at the decision-making end. Together, they ensured high-precision detection of incomplete targets under limited computational resources. [Conclusions] Through the synergistic design of "lightweighting, feature enhancement, and dynamic fusion", the CD-YOLO model achieved an excellent balance between computational efficiency, detection accuracy, and anti-occlusion capability. The model not only significantly reduced reliance on the computational power of edge devices but also effectively improved robustness and adaptability in complex field environments through structured attention and dynamic fusion mechanisms.
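Two elements of the method lend themselves to a quick sketch. First, the saving from depthwise separable convolution follows from a standard parameter count. A k×k standard convolution needs k²·C_in·C_out weights, whereas its depthwise separable version needs k²·C_in (depthwise) plus C_in·C_out (pointwise). Biases are ignored, and the channel sizes in the example are arbitrary. Second, the abstract describes the occlusion criterion only as based on bounding-box overlap, so `occlusion_level` below is a hypothetical reading. It takes the largest overlap with any other box as a fraction of the target box's area, and its 0.25 and 0.5 thresholds are placeholders, not the paper's values.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k standard convolution (bias ignored)."""
    return k * k * c_in * c_out


def dwsep_conv_params(k, c_in, c_out):
    """Depthwise k x k filter per input channel plus a 1 x 1 pointwise projection."""
    return k * k * c_in + c_in * c_out


def overlap_area(a, b):
    """Area shared by two boxes (x1, y1, x2, y2)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def occlusion_level(box, others, mild_max=0.25, moderate_max=0.5):
    """Hypothetical severity rule: largest overlap with another box / own area.

    The thresholds are placeholders; the paper's criterion may differ.
    """
    area = (box[2] - box[0]) * (box[3] - box[1])
    ratio = max((overlap_area(box, o) for o in others), default=0.0) / area
    if ratio == 0:
        return "none"
    if ratio <= mild_max:
        return "mild"
    if ratio <= moderate_max:
        return "moderate"
    return "severe"


if __name__ == "__main__":
    std, dws = standard_conv_params(3, 64, 128), dwsep_conv_params(3, 64, 128)
    print(std, dws, round(dws / std, 3))  # 73728 8768 0.119
```

For a 3×3 layer mapping 64 to 128 channels, the separable form needs about 12% of the weights. Savings of this kind free the compute budget that the EMA and DynamicHead modules then spend.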

Field Maize Yield Prediction Model Based on Causal Inference and Machine Learning in Agricultural Fields |

WANG Yi, CUI Xitong, WANG Chen, XIONG Baowei, SHAO Guomin, WANG Wanying, CAO Pei, HAN Wenting

2026, 8(2): 175-187. doi:10.12133/j.smartag.SA202506027

[Objective] Maize is one of the most important staple crops in the world and serves as a cornerstone of food security and agricultural sustainability. Accurate and timely prediction of maize yield is essential for optimizing agricultural management practices, supporting market regulation, and guiding policy decisions related to food supply and climate adaptation. In recent years, data-driven yield prediction methods based on machine learning and deep learning have achieved notable improvements in predictive accuracy. However, most existing approaches primarily rely on statistical correlations among variables and often treat influencing factors as independent predictors, without explicitly addressing the complex causal mechanisms and time-lagged interactions that govern crop growth processes. This limitation may lead to reduced model interpretability and compromised robustness under changing environmental conditions. To address these challenges, a novel maize yield prediction framework that integrates causal inference with a hybrid deep learning model was proposed, aiming to improve both predictive performance and mechanistic understanding. [Methods] Multi-source heterogeneous datasets collected across the maize growing season were utilized, including remote sensing-derived vegetation indices, meteorological variables (such as temperature and precipitation), soil profile moisture measurements at multiple depths, and crop observation data corresponding to key phenological stages. First, the Peter-Clark and momentary conditional independence (PCMCI) causal discovery algorithm was applied to systematically identify causal relationships between maize yield and its potential driving factors. The PCMCI method enables the detection of both contemporaneous and time-lagged causal links while effectively controlling for confounding effects in high-dimensional time series data. 
Through this process, the causal structure of yield formation was explicitly characterized, and key variables with statistically significant causal impacts were selected as inputs for the prediction model. Subsequently, a hybrid moving average, convolutional neural network-long short-term memory (MA-CNN-LSTM) model was constructed to capture the complex spatiotemporal patterns in the causally screened input variables. Specifically, a moving average module was employed as a preprocessing step to suppress high-frequency noise and enhance signal stability. A CNN was then used to extract latent correlation features among multiple variables, reflecting their joint influence on yield formation. Finally, an LSTM network was adopted to model temporal dependencies and cumulative effects across the growing season, enabling effective representation of dynamic yield responses. [Results and Discussions] The causal analysis revealed that soil moisture at depths of 10 cm and 50 cm exerted a significant positive influence on maize yield (P < 0.01), with deeper soil moisture showing a stronger and more persistent time-lagged effect. This finding highlighted the critical role of subsurface water availability in sustaining crop growth during later developmental stages. In addition, vegetation indices such as the modified chlorophyll absorption ratio index and the normalized difference vegetation index exhibited significant short-term causal relationships with yield during the mid-growth stage of maize, indicating their sensitivity to canopy structure and photosynthetic activity during this period. Comparative experiments conducted against traditional statistical models and conventional machine learning approaches demonstrated that the proposed PCMCI-MA-CNN-LSTM framework consistently achieved superior predictive performance. 
On the test dataset, the coefficient of determination (R²) reached 0.955, while the mean absolute error (MAE) and root mean square error (RMSE) were reduced to 1.201 kg/mu and 1.474 kg/mu, respectively (1 hm²=15 mu). These results indicated that incorporating causal variable selection effectively enhanced model accuracy and stability by reducing redundant and spurious correlations. [Conclusions] The results confirm that incorporating causal analysis into yield modeling provides a robust basis for identifying key driving variables and effectively enhances the accuracy and interpretability of maize yield prediction. The proposed framework offers a promising approach for precision agriculture and decision support in crop yield forecasting, particularly under complex and dynamic agro-environmental conditions.
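As a rough illustration of two well-defined pieces of this pipeline, the Python sketch below implements a trailing moving-average smoother of the kind the MA module applies, together with the three evaluation metrics reported above (R², MAE, RMSE). The window length and the trailing alignment are assumptions, since the abstract specifies neither; PCMCI and the CNN-LSTM itself are not reproduced here.

```python
import math


def moving_average(series: list[float], window: int) -> list[float]:
    """Trailing moving average; the first window-1 outputs average over the points seen so far.

    The trailing alignment and edge handling are illustrative assumptions.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    smoothed, running = [], 0.0
    for i, value in enumerate(series):
        running += value
        if i >= window:
            running -= series[i - window]  # drop the point that has left the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


def regression_metrics(y_true: list[float], y_pred: list[float]) -> dict[str, float]:
    """Coefficient of determination (R²), MAE and RMSE."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observed values are equal")
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
    }


MU_PER_HECTARE = 15  # 1 hm² = 15 mu, as stated in the abstract


def kg_per_mu_to_kg_per_hm2(value: float) -> float:
    """Convert a per-mu quantity to a per-hectare quantity."""
    return value * MU_PER_HECTARE
```

Under the stated conversion, the reported MAE of 1.201 kg/mu and RMSE of 1.474 kg/mu correspond to roughly 18.0 kg/hm² and 22.1 kg/hm².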

CGG-Based Segmentation and Counting of Densely Distributed Rice Seeds in Seedling Trays |

OUYANG Meng, ZOU Rong, CHEN Jin, LI Yaoming, CHEN Yuhang, YAN Hao

2026, 8(2): 188-199. doi:10.12133/j.smartag.SA202507030


[Objective] Precisely counting the rice seeds in each cavity of a seedling tray is critical for optimizing seeding efficiency and fine-tuning the performance of air-vibration precision seeders. Achieving high accuracy in this task directly affects resource utilization, seedling uniformity, and ultimately crop yield. However, the operational environment presents significant challenges, including complex backgrounds, seed overlap, variations in lighting and seed orientation, and the inherent difficulty of distinguishing individual seeds within dense clusters. These factors often degrade the performance of existing automated detection systems, resulting in low detection accuracy and an inability to achieve robust, precise instance segmentation of individual rice seeds. To address these persistent limitations and advance the state of the art in precision seeding monitoring, an integrated framework for rice seed instance segmentation was proposed. The core innovation lies in combining a cross-modal grounding generation (CGG) network with a pretrained model, designed to leverage complementary information from the visual and textual domains. [Methods] The proposed methodology aimed to bridge the gap between visual perception and semantic understanding in the specific context of rice seed detection. The CGG-pretrained model framework achieved this through deep joint alignment of visual features extracted from seedling tray images and textual features derived from contextual knowledge. This cross-modal grounding enabled collaborative learning, in which the visual processing stream (handling object localization and pixel-level segmentation) was continuously informed and refined by the semantic understanding stream (interpreting context and relationships). 
Specifically, the visual backbone network processed input imagery to generate feature maps, while the pretrained language model component used contextual embeddings to generate semantically rich textual representations. The CGG module acted as the fusion engine, establishing explicit correspondences between specific regions in the image (potential seeds or clusters) and relevant semantic concepts or descriptors provided by the pretrained model. This bidirectional interaction significantly enhanced the model's ability to disambiguate overlapping seeds, resolve occlusions, and accurately delineate individual seed boundaries under challenging conditions. Key technical innovations validated through ablation studies include: (1) the use of the bootstrapping language-image pre-training (BLIP) model to generate high-quality pseudo-labels from unlabeled or weakly labeled image data, facilitating more effective semi-supervised learning and reducing the annotation burden; and (2) the application of bidirectional encoder representations from transformers (BERT)-based word embeddings to capture deep semantic relationships and contextual nuances within textual descriptors related to seeds and seeding environments. [Results and Discussions] The ablation experiments demonstrated a pronounced synergistic effect when the core improvements were combined, yielding a segmentation accuracy improvement of more than 3 percentage points over the baseline model without this integration. Comprehensive experimental evaluation demonstrated the superior performance of the proposed CGG model against established benchmarks. Under the standard intersection over union (IoU) threshold of 0.5, the model achieved a mean average precision (mAP) of 90.7% for bounding box detection (denoted as mAP50^bb) and 91.4% for instance segmentation (denoted as mAP50^seg). 
These results represented a statistically significant improvement over leading contemporary models, including the mask region-based convolutional neural network (Mask R-CNN) and Mask2Former, highlighting the efficacy of the cross-modal grounding approach in accurately localizing and segmenting individual rice seeds. Further validation in realistic seeding trial scenarios, through direct comparison with careful manual annotations, confirmed the model's practical robustness. The CGG model attained the highest accuracy in two critical operational metrics: (1) precision in segmenting individual seed instances (single-seed segmentation accuracy), and (2) accuracy in determining the exact seed count per cavity, with an average per-cavity quantification accuracy of 88%. Moreover, the model produced the lowest estimation errors for cavity seed counts: a root mean square error (RMSE) of 16.8 seeds, a mean absolute error (MAE) of 13.7 seeds, and a mean absolute percentage error (MAPE) of 2.46%. These error values were markedly lower than those of the comparison models, underscoring the CGG model's reliability in practical counting tasks. The discussion attributed these performance gains to the model's ability to leverage semantic context to resolve ambiguities inherent in visual-only approaches, particularly in the dense and overlapping seed scenarios common in precision seeding trays. [Conclusions] The developed CGG-pretrained model integration presents a significant advancement in automated monitoring for precision rice seeding. The model successfully addresses the core challenges of low detection accuracy and imprecise instance segmentation for seeds in complex environments. 
Its high accuracy in both individual seed segmentation and per-cavity seed count quantification, coupled with low error rates, demonstrates strong potential for practical deployment. Importantly, the model enables real-time detection of rice seeds during the image analysis stage; this provides a quantifiable, data-driven basis for immediate operational decisions, most notably the targeted reseeding of empty or under-seeded cavities identified during the seeding process. By ensuring appropriate seed placement and density from the outset, the technology contributes directly to improved resource efficiency (reduced seed waste), enhanced seedling uniformity, and potentially higher crop yields. Future work will focus on further optimizing inference speed for higher-throughput seeding lines and exploring generalization to other crop types and seeding mechanisms.
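The mAP50 figures above count a detection or mask as correct only when its IoU with a matched ground-truth instance is at least 0.5. The sketch below computes box and mask IoU under that rule and adds a hypothetical region-to-concept grounding step, assuming region and text embeddings already share one embedding space. It illustrates the general idea only and is not the CGG network; the full mAP computation (ranking predictions by confidence and averaging precision over recall) is omitted.

```python
import numpy as np


def box_iou(a, b) -> float:
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mask_iou(m1: np.ndarray, m2: np.ndarray) -> float:
    """IoU of two boolean instance masks of the same shape."""
    union = np.logical_or(m1, m2).sum()
    return float(np.logical_and(m1, m2).sum() / union) if union > 0 else 0.0


def is_match(iou: float, threshold: float = 0.5) -> bool:
    """At the mAP50 threshold, a prediction counts as a true positive only if IoU >= 0.5."""
    return iou >= threshold


def ground_regions(region_feats: np.ndarray, concept_feats: np.ndarray):
    """Hypothetical grounding step: cosine similarity between region and concept embeddings.

    Assumes both inputs are already projected into a shared space.
    Returns the best-matching concept index per region and the full similarity matrix.
    """
    r = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
    c = concept_feats / np.linalg.norm(concept_feats, axis=1, keepdims=True)
    sim = r @ c.T  # shape: (num_regions, num_concepts)
    return sim.argmax(axis=1), sim
```

In a real system the concept embeddings would come from the text encoder (for example, embeddings of descriptors such as "single seed" or "overlapping seeds", which are illustrative here).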

DEMA-3D TSP: An Enhanced Reinforcement Learning with DEMA Attention in Sequence Optimization for Safflower Picking Robot |

LI Menghao, WANG Xiaorong, LIU Zihe, DUAN Mengyu, JIN Zhengyang

2026, 8(2): 200-219. doi:10.12133/j.smartag.SA202506004


[Objective] There are several critical challenges in automated safflower harvesting, particularly the inefficiencies in path planning, suboptimal route quality, and limited decision-making capability under dynamic and complex environments. To solve these issues, the problem was formulated as a three-dimensional traveling salesman problem and an enhanced reinforcement learning model named actor-critic reinforcement learning pointer network (AC-RL-PtrNet) was proposed, specifically designed for deployment on intelligent safflower picking robots in agricultural settings. [Methods] First, to address the inherent limitations of conventional attention mechanisms in dynamic environments with complex spatial structures, an enhanced attention module was proposed based on the dynamic exponential moving average framework. By combining multi-head attention, spatial distance encoding, and adaptive exponential smoothing, the improved design allowed the model to better capture long-range dependencies and spatial context among safflowers. Meanwhile, to minimize computational cost while preserving inference quality, a structured pruning approach was adopted, which selectively removed redundant connections in the long short-term memory gates and fully connected layers. In parallel, the critic network was redesigned to improve learning stability and accuracy. This was achieved through the inclusion of batch normalization, residual feature aggregation, and a multi-layer value estimation head, all of which contributed to a tighter actor-critic synergy during policy training. [Results and Discussions] To quantitatively assess the impact of each component, ablation experiments were conducted across various configurations. The results confirmed that each module contributed distinct benefits, while their combination yielded the highest improvements in both planning precision and inference efficiency. 
This coordinated actor-critic design effectively enhanced both trajectory quality and decision stability, which are critical in sequential robotic picking tasks. Experimental results also demonstrated that, compared with traditional swarm intelligence and evolutionary algorithms, namely particle swarm optimization (PSO), ant colony optimization (ACO), and the non-dominated sorting genetic algorithm, the proposed AC-RL-PtrNet model achieved planning time improvements ranging from -2.63% to 61.87% on the 25-target dataset and from 22.93% to 59.1% on the 31-target dataset. Meanwhile, the optimized paths were significantly shorter across different planning instances, indicating robust generalization under varied problem scales. Furthermore, field experiments validated the model's practical applicability. When deployed on a mobile picking robot in real safflower fields, AC-RL-PtrNet reduced path length by 9.56% and saved 5.43% of the time for a 25-target picking task, and reduced path length by 20.17% and saved 29.70% of the time for a 31-target scenario involving a different safflower variety. Overall, these results indicate that the proposed method offers significant advantages in path planning efficiency and path quality. [Conclusions] This study offers a practical solution for efficient and robust automatic picking by safflower picking robots and provides new insights into solving 3D combinatorial optimization problems.
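To make the optimization target concrete, the sketch below scores a picking sequence by its 3D Euclidean path length and finds the exact optimum by brute force for a toy instance. Treating the route as an open path from a fixed start pose is an assumption; a closed tour would add the return leg. Enumeration is only viable for a few targets: a 25-target instance already has 25! (about 1.6 × 10^25) visiting orders, which is why learned or heuristic planners are used instead. The `relative_reduction` helper shows one common convention for percentage improvements; the abstract does not state its formula, but under this convention a negative value such as -2.63% would mean the proposed model was slower than that baseline.

```python
import math
from itertools import permutations


def route_length(points, order, start=None) -> float:
    """Euclidean length of an open 3D path visiting points in `order`, optionally from `start`."""
    path = ([start] if start is not None else []) + [points[i] for i in order]
    return sum(math.dist(p, q) for p, q in zip(path, path[1:]))


def brute_force_order(points, start):
    """Exact optimum by enumerating every visiting order; only feasible for a handful of targets."""
    best = min(permutations(range(len(points))), key=lambda o: route_length(points, o, start))
    return list(best), route_length(points, best, start)


def relative_reduction(baseline: float, proposed: float) -> float:
    """Percentage reduction of a cost (time or length) relative to a baseline.

    Positive means the proposed method is cheaper; negative means it costs more.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - proposed) / baseline * 100.0
```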

AgriAgent: End-to-End Large Model Agent System Architecture for Agricultural Environment Control |

QIU Jiaying, LIU Yingchang, GAO Xingjie, HUANG Yuan, ZHANG Hongyu, TIAN Fang, LI Wanli, FENG Zaiwen

2026, 8(2): 220-236. doi:10.12133/j.smartag.SA202507042


[Objective] Large language models (LLMs) have demonstrated strong capabilities in natural language understanding, knowledge integration, and complex reasoning, offering new opportunities for intelligent decision-making in agriculture. However, their direct application in agricultural production and facility environment control remains challenging due to strong physical constraints and high operational risks. The lack of real-world interaction and executable decision grounding limits the practical effectiveness of conventional LLMs in such scenarios. To address these challenges, a tool-augmented LLM-based agricultural intelligent agent system, termed AgriAgent, was proposed, and a digital-twin-based evaluation platform for agricultural decision-making was developed. By integrating a high-fidelity digital twin environment with an end-to-end agent architecture, the decision-making performance of agricultural intelligent agents with different parameter scales was systematically evaluated across multiple crops and climate scenarios. [Methods] A high-fidelity agricultural digital twin evaluation platform was constructed using the decision support system for agrotechnology transfer (DSSAT) v4.8 crop growth model as the core simulation engine to model crop growth under diverse environmental conditions and management strategies. Meteorological driving data were obtained from the Seoul Historical Weather Data dataset. Through data cleaning, missing-value imputation, unit normalization, and time-series reconstruction, the raw meteorological data were transformed into standardized inputs compatible with DSSAT. Three climate scenarios representing different environmental complexities were designed, including a regular scenario, a perturbed scenario, and an extreme scenario. 
The regular scenario employed historical observations, the perturbed scenario introduced stochastic disturbances to simulate short-term climate variability, and the extreme scenario incorporated multi-factor coupled stresses such as high temperatures and excessive precipitation during sensitive growth stages. In total, 90 annual climate driving sequences were generated. Fixed soil profile parameters calibrated by domain experts were applied across all simulations to minimize confounding effects. Within this digital twin environment, a tool-augmented agricultural intelligent agent, AgriAgent, was implemented using a modular architecture consisting of a sensor module, memory module, retriever, large language model, and tool executor, forming a closed-loop decision-making framework. In each decision cycle, the agent perceived environmental and crop state information, including soil moisture and nutrient status, meteorological conditions, crop growth stages, and stress indicators. State summaries and historical decisions were stored in memory, while agronomic knowledge was retrieved through a retrieval-augmented generation mechanism. Based on the integrated information, the LLM generated structured environmental control commands in JSON format, which were validated and constrained by the tool executor before updating the DSSAT environment. The system supported irrigation, supplementary lighting, ventilation, heating, fertilization, and CO₂ enrichment. Five representative crops (maize, millet, sugar beet, tomato, and cabbage) were simulated under the three climate scenarios over complete growing seasons, resulting in 450 crop-scenario combinations. An unmanaged DSSAT simulation served as the baseline. AgriAgent models with three parameter scales (1.5B, 3B, and 7B), built on the Qwen2.5 series, were evaluated. Crop economic yield, expressed as dry matter at physiological maturity, was adopted as the evaluation metric. 
[Results and Discussions] The results showed that AgriAgent consistently outperformed the baseline across all crops and climate scenarios, with model scale exerting a significant influence on decision-making performance. AgriAgent-7B achieved the best overall performance under regular, perturbed, and extreme scenarios, demonstrating strong generalization ability and environmental adaptability. By dynamically adjusting water, nutrient, light, and thermal management strategies, the agent effectively mitigated environmental stresses even under multi-factor coupled extreme climate conditions. Under extreme scenarios, AgriAgent-7B increased yields by 463.60% for maize, 351.20% for millet, 125.40% for sugar beet, 1 537.46% for tomato, and 1 185.14% for cabbage compared with the baseline. Particularly large gains were observed for high-value crops such as tomato and cabbage, highlighting the advantages of the proposed framework for precision-controlled facility agriculture. In contrast, AgriAgent-1.5B exhibited performance comparable to the baseline, while AgriAgent-3B achieved moderate improvements but remained inferior to the 7B model. These findings indicate a clear scaling effect, suggesting that larger models possess stronger capabilities in multi-source information integration, long-term temporal reasoning, and adaptation to complex environments. [Conclusions] This study developed a digital-twin-based agricultural decision evaluation platform and proposed a tool-augmented, end-to-end agricultural intelligent agent named AgriAgent. Experiments across multiple crops and climate scenarios verified the effectiveness and robustness of the proposed framework for dynamic agricultural decision-making. 
The results demonstrate that integrating knowledge retrieval, reasoning, and tool execution within a closed-loop LLM-based agent enables stable, reliable, and adaptive environmental control, providing a feasible technical pathway and standardized evaluation paradigm for intelligent agriculture.
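The tool executor's role of validating and constraining LLM output before it reaches DSSAT can be sketched as follows. The six control types come from the abstract; the JSON layout (`{"commands": [...]}`), the parameter names, and the numeric limits are hypothetical placeholders. This sketch clamps out-of-range values rather than rejecting them, which is one possible design choice, and it does not call DSSAT.

```python
import json
import math

# Hypothetical schema: the six control types come from the abstract, but the
# parameter names and (low, high) limits below are illustrative placeholders only.
ALLOWED_ACTIONS = {
    "irrigation": ("amount_mm", 0.0, 50.0),
    "fertilization": ("n_kg_per_ha", 0.0, 100.0),
    "supplementary_lighting": ("hours", 0.0, 12.0),
    "ventilation": ("hours", 0.0, 24.0),
    "heating": ("target_temp_c", 5.0, 30.0),
    "co2_enrichment": ("ppm", 400.0, 1200.0),
}


def validate_commands(raw: str) -> list[dict]:
    """Parse LLM output and keep only known actions with finite numeric parameters,
    clamped into their allowed range. Malformed output yields no commands."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return []  # unparseable output: execute nothing
    commands = payload.get("commands") if isinstance(payload, dict) else None
    if not isinstance(commands, list):
        return []
    accepted = []
    for cmd in commands:
        action = cmd.get("action") if isinstance(cmd, dict) else None
        if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
            continue  # unknown or malformed action: drop it
        param, low, high = ALLOWED_ACTIONS[action]
        value = cmd.get(param)
        if isinstance(value, bool) or not isinstance(value, (int, float)) or not math.isfinite(value):
            continue  # missing, non-numeric, or non-finite parameter
        accepted.append({"action": action, param: min(max(float(value), low), high)})
    return accepted
```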

Vegetable IoT Blockchain Anti Counterfeiting Traceability System Based on PQ-ECIES |

QI Peiyang, SUN Chuanheng, TAN Changwei, WANG Jun, LUO Na, XING Bin

2026, 8(2): 237-250. doi:10.12133/j.smartag.SA202507019


[Objective] The vegetable supply chain is characterized by multiple production entities, diverse product varieties, and complex circulation processes, which often result in low data accuracy, label forgery, data tampering, and difficulties in cross-enterprise collaboration in traditional traceability systems. Furthermore, the rapid development of quantum computing poses significant threats to existing cryptographic foundations by enabling efficient factorization or discrete logarithm attacks. This study aimed to design and implement a vegetable supply chain anti-counterfeiting and traceability system that integrates the Internet of Things (IoT), blockchain technology, and a post-quantum enhanced elliptic curve integrated encryption scheme (PQ-ECIES). The system seeks to enhance the trustworthiness, privacy protection, and collaborative efficiency of supply chain data management, while maintaining practical performance for IoT devices and high-frequency data uploading scenarios. [Methods] The proposed system was constructed on an IoT framework incorporating nine categories of devices. A registration and admission mechanism was developed to establish a trusted mapping between "device–enterprise–data", effectively preventing unauthorized entities from uploading forged data. At the data layer, collected information was divided into public and private categories: Public data were uploaded directly to the blockchain, while private data were encrypted using PQ-ECIES before being stored on-chain. Smart contracts automated processes such as data classification, permission verification, and encrypted data querying, thus reducing human intervention and ensuring compliance. PQ-ECIES was designed by combining elliptic curve cryptography (ECC) and the Kyber algorithm from lattice-based post-quantum cryptography. 
A dual-key mechanism was employed to generate session keys, where an ECC-derived shared secret was combined with a Kyber-derived shared secret through SHA3-256 hashing, followed by key derivation for encryption and authentication. This design provided resilience against Shor's algorithm and other quantum attacks while maintaining efficiency compatible with IoT devices. The blockchain system was implemented using Hyperledger Fabric 1.4.4, with seven organizational nodes and the Raft consensus mechanism. Performance testing included evaluations of data collection accuracy, on-chain latency, query latency, and encryption performance across RSA, advanced encryption standard (AES), and PQ-ECIES. [Results and Discussions] The IoT-based data collection achieved significantly higher accuracy than manual input, particularly in large-scale sample scenarios such as pesticide residue testing. The average latency for data uploading to the blockchain was 2 879 ms, while data query latency averaged 122 ms, both of which met the practical requirements of vegetable supply chain applications. In cryptographic performance testing, PQ-ECIES encrypted and decrypted 128 B of plaintext in approximately 10 to 30 ms, outperforming RSA (50 to 80 ms) and only slightly slower than AES (<10 ms). This result indicates that PQ-ECIES achieved a favorable trade-off between efficiency and security, offering the benefits of asymmetric encryption, such as key distribution and identity verification, along with strong post-quantum resistance. Simulation under quantum attack models confirmed that traditional ECC and AES could be compromised within hours using Shor's and Grover's algorithms, whereas PQ-ECIES maintained resilience due to the lattice-based hardness assumptions of Kyber. From a system-level perspective, three major contributions were identified. 
First, trustworthiness was enhanced by binding IoT devices to enterprises through Bluetooth-based verification and blockchain's immutable ledger, ensuring data authenticity at the source. Second, privacy protection was achieved by adopting graded visibility: Consumers accessed only public data such as testing results and logistics status, while regulators could decrypt private information (e.g., production location and batch details) via authorized keys, balancing transparency with confidentiality. Third, collaboration across enterprises was improved through the consortium blockchain structure and Fabric channel mechanisms, which eliminated information silos and enabled selective data sharing in real time, reducing inter-organizational access time from weeks to minutes. Experimental validation confirmed that IoT-based collection significantly improved accuracy, blockchain integration achieved acceptable on-chain and query latency, and PQ-ECIES outperformed RSA while offering post-quantum resistance not available in AES. [Conclusions] This study proposed and implemented a vegetable supply chain traceability system that integrates IoT, blockchain, and PQ-ECIES. By deploying nine categories of IoT devices, establishing trusted device-enterprise mappings, and incorporating blockchain's decentralized and tamper-proof ledger, the system ensured reliable data collection and storage. The integration of PQ-ECIES provided dual cryptographic protection, balancing efficiency with long-term quantum security. Beyond technical performance, the system enhanced trust, privacy, and collaboration across the vegetable supply chain, effectively addressing common issues of data forgery, tampering, and cross-enterprise coordination. Overall, the proposed framework demonstrates high potential for real-world deployment in agricultural supply chains, offering a secure, efficient, and future-proof solution to ensure authenticity, reliability, and transparency in vegetable traceability. 
The study also provides a reference model for extending post-quantum blockchain-based traceability to other agri-food sectors facing similar challenges.
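The sketch below illustrates two mechanisms described in this abstract, under stated assumptions. `split_record` shows a default-deny public/private split using the example fields named above (the field names themselves are hypothetical). `derive_session_keys` shows one way to combine an ECC shared secret and a Kyber shared secret with SHA3-256 and then expand the result into separate encryption and authentication keys. Kyber is not in the Python standard library, so both secrets are taken as inputs; the length prefixing, the HKDF (RFC 5869) expansion, and the `info` label are assumptions rather than the paper's documented key derivation, and the encryption step itself is not shown.

```python
import hashlib
import hmac

# Hypothetical field names, based on the public-data examples given in the abstract.
PUBLIC_FIELDS = {"testing_result", "logistics_status"}


def split_record(record: dict) -> tuple[dict, dict]:
    """Split a record into (public, private). Anything not explicitly public stays private."""
    public = {k: v for k, v in record.items() if k in PUBLIC_FIELDS}
    private = {k: v for k, v in record.items() if k not in PUBLIC_FIELDS}
    return public, private


def combine_secrets(ecc_secret: bytes, kyber_secret: bytes) -> bytes:
    """SHA3-256 over both shared secrets, each length-prefixed so the split is unambiguous."""
    h = hashlib.sha3_256()
    for part in (ecc_secret, kyber_secret):
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return h.digest()


def hkdf_sha3_256(ikm: bytes, info: bytes, length: int, salt: bytes = b"") -> bytes:
    """HKDF (RFC 5869) instantiated with HMAC-SHA3-256."""
    hash_len = hashlib.sha3_256().digest_size  # 32 bytes
    if length > 255 * hash_len:
        raise ValueError("requested output too long for HKDF")
    prk = hmac.new(salt or bytes(hash_len), ikm, hashlib.sha3_256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand step: T(i) = HMAC(PRK, T(i-1) | info | i)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha3_256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_session_keys(ecc_secret: bytes, kyber_secret: bytes) -> tuple[bytes, bytes]:
    """Return (encryption_key, mac_key), 32 bytes each, from both shared secrets."""
    okm = hkdf_sha3_256(combine_secrets(ecc_secret, kyber_secret), b"PQ-ECIES session", 64)
    return okm[:32], okm[32:]
```

Because both secrets feed the hash, an attacker would need to recover both the ECC and the Kyber secret to reconstruct the session keys, which is the usual rationale for such hybrid constructions.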

Path Planning Algorithm for an Eel Feeding Robotic Arm Based on Improved BI-RRT |

MA Mengxian, XU Zhen, YUAN Quan, ZHOU Wenzong, ZHANG Chunyan

2026, 8(2): 251-264. doi:10.12133/j.smartag.SA202509020


[Objective] In robotic-arm feed distribution for eel (Monopterus albus) farming systems, the main challenges were slow path planning, excessive trajectory redundancy, and low obstacle avoidance success rates within confined operational spaces. To mitigate these issues, an improved path planning algorithm based on the bidirectional rapidly-exploring random tree star (BI-RRT*) algorithm was proposed. The primary aim was to significantly enhance the motion efficiency and task success rate of robotic arms operating in complex, constrained environments. [Methods] The proposed improved BI-RRT* algorithm integrated an adaptive goal-biased strategy with an enhanced artificial potential field (APF) method. The algorithm's framework comprised three core components: a high-quality sampling strategy, an efficient search strategy, and a path optimization algorithm. For the high-quality sampling strategy, an adaptive goal-biased approach was introduced to overcome the inefficient random sampling and slow convergence of traditional BI-RRT algorithms in complex environments. Rather than relying on purely random selection, this strategy dynamically adjusted the generation of sampling points, prioritizing regions near the target according to the target direction and a predefined bias probability. This mechanism strengthened the tendency of the search tree to grow toward the target area, reducing the randomness of sampling and thus accelerating the path search. To enhance search efficiency and prevent the algorithm from converging to local optima, an improved APF was incorporated into the node expansion process and refined for tighter integration with the BI-RRT framework. 
During each new node expansion, in addition to considering the inherent random exploration characteristics of BI-RRT, a directional attractive field was superimposed. This attractive field not only originated from the ultimate target point but also factored in the current growth orientation of the search tree and localized environmental information. Specifically, a composite attractive function was devised, which synergized the attractive force exerted by the target point on the current node with the attraction from potential "guide points". Concurrently, the computation of the repulsive field was optimized to more precisely delineate the geometry and proximity of obstacles, thereby circumventing common issues such as "oscillation" and "deadlock" prevalent in traditional APF. Through this methodology, the algorithm was able to more effectively steer the search tree to circumvent obstacles and rapidly converge towards the target region, significantly bolstering the directness of the search and successfully preventing the algorithm from becoming ensnared in suboptimal local solutions. For the path optimization algorithm, following the generation of an initial feasible path, a greedy optimization strategy was employed for path pruning and smoothing. This was executed to yield an optimal path characterized by reduced length, enhanced smoothness, and improved conformity with the kinematic properties of the robotic arm. Path pruning was initially applied to eliminate redundant nodes; if a collision-free direct connection existed between two non-adjacent nodes, intermediate nodes were excised, thereby substantially abbreviating the path length. Subsequently, path smoothing techniques, such as B-spline curves or cubic spline interpolation, were introduced to enable the robotic arm to execute movements with greater stability and efficiency during actual operation, mitigating impact and vibration. 
This two-stage optimization procedure ensured that the final path was not merely feasible but also improved in length, smoothness, and motion efficiency. [Results and Discussions] To comprehensively validate the proposed algorithm, a two-stage experimental verification was conducted. First, comparative simulations were performed in both two-dimensional (2D) and three-dimensional (3D) environments on the MATLAB platform. The simulation scenarios covered three typical environments (simple, complex, and narrow passages) to emulate the diverse obstacle configurations potentially encountered in industrialized eel aquaculture. The results demonstrated that the improved BI-RRT* algorithm significantly surpassed the RRT, APF-RRT*, and traditional BI-RRT* algorithms in both path planning speed and quality across all tested environments, supporting the superiority and robustness of the proposed algorithm in varying complex environments. To further assess the engineering applicability of the algorithm, an eel feeding robotic arm simulation system was constructed based on the Robot Operating System (ROS) and MoveIt frameworks. This system emulated the kinematics, dynamics, and obstacle distribution of an industrialized eel aquaculture environment. During simulated continuous feeding tasks, the improved BI-RRT* algorithm consistently performed well. Its average running time was only 2.1 s, a 41.6% reduction compared with the traditional BI-RRT*. The average planned path length was 1 680 mm, with an average of 180 nodes, indicating a significant reduction in path redundancy. 
Furthermore, the algorithm achieved an obstacle avoidance success rate of 96% in complex confined spaces. These findings validate the algorithm's effectiveness and indicate its potential for practical engineering applications. [Conclusions] The experimental results demonstrated that the improved BI-RRT* algorithm significantly enhanced the path planning efficiency and trajectory quality of robotic arms operating within confined spaces. It also exhibited high reliability in obstacle avoidance, thereby effectively addressing the automated feeding requirements of industrialized eel aquaculture. The algorithmic framework is general enough to offer theoretical insights and technical precedents for analogous robotic arm path planning problems in other agricultural automation contexts.
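The three components described in this abstract can be sketched in Python as follows, assuming spherical obstacles and a Euclidean configuration space. The adaptive rule that lowers the goal bias after repeated collisions, the gain values, and the use of a normalized attractive term are all illustrative assumptions; the paper's composite attractive field with guide points and its B-spline smoothing stage are not reproduced. The repulsive term follows the classic Khatib form k_rep(1/d - 1/d0)/d².

```python
import math
import random

Point = tuple[float, ...]
Sphere = tuple[Point, float]  # (center, radius): a simplified obstacle model


def adaptive_goal_bias(consecutive_collisions: int, p_max: float = 0.3,
                       p_min: float = 0.05, decay: float = 0.5) -> float:
    """Hypothetical rule: shrink the goal bias while extensions keep colliding."""
    return max(p_min, p_max * decay ** consecutive_collisions)


def sample_point(goal: Point, bounds, goal_bias: float, rng: random.Random) -> Point:
    """Return the goal with probability goal_bias, else a uniform sample inside bounds."""
    if rng.random() < goal_bias:
        return goal
    return tuple(rng.uniform(lo, hi) for lo, hi in bounds)


def _unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v) if n > 0 else tuple(0.0 for _ in v)


def apf_steer(node: Point, sample: Point, goal: Point, obstacles: list[Sphere],
              step: float, k_att: float = 1.0, k_rep: float = 0.5, d0: float = 0.3) -> Point:
    """Move `step` along the sum of the random-tree direction, a normalized attractive
    term toward the goal, and a repulsive term from obstacles closer than d0."""
    direction = [a + k_att * b for a, b in
                 zip(_unit([s - n for s, n in zip(sample, node)]),
                     _unit([g - n for g, n in zip(goal, node)]))]
    for center, radius in obstacles:
        away = [n - c for n, c in zip(node, center)]
        d = math.sqrt(sum(x * x for x in away)) - radius  # distance to the sphere surface
        if 0 < d < d0:
            magnitude = k_rep * (1.0 / d - 1.0 / d0) / (d * d)
            direction = [x + magnitude * u for x, u in zip(direction, _unit(away))]
    return tuple(n + step * u for n, u in zip(node, _unit(direction)))


def segment_hits_sphere(p: Point, q: Point, center: Point, radius: float) -> bool:
    """True if the straight segment p-q passes within `radius` of `center`."""
    d = [b - a for a, b in zip(p, q)]
    f = [a - c for a, c in zip(p, center)]
    dd = sum(x * x for x in d)
    t = 0.0 if dd == 0 else max(0.0, min(1.0, -sum(x * y for x, y in zip(f, d)) / dd))
    closest = [a + t * x for a, x in zip(p, d)]  # closest point on the segment
    return math.dist(closest, center) <= radius


def greedy_prune(path: list[Point], obstacles: list[Sphere]) -> list[Point]:
    """From each kept node, jump to the farthest later node reachable in a straight,
    collision-free line. Assumes consecutive nodes of `path` are already collision-free."""
    if len(path) <= 2:
        return list(path)
    pruned, i = [path[0]], 0
    while i < len(path) - 1:
        j = len(path) - 1
        while j > i + 1 and any(segment_hits_sphere(path[i], path[j], c, r) for c, r in obstacles):
            j -= 1
        pruned.append(path[j])
        i = j
    return pruned
```

In a full planner, `sample_point` would feed `apf_steer` in each tree's extension step, and `greedy_prune` would run once on the connected path before any smoothing.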

Key Factor Extraction Method of Agricultural User Demand Based on Large Language Models

LI Runteng, WANG Yiqun, LI Hongda, LI Jingchen, CHEN Wenbai

2026, 8(2): 265-278. doi:10.12133/j.smartag.SA202509011

[Objective] In the agricultural domain, user demand texts are essential primary sources for agricultural extension, production management, and policy services. However, these texts typically contain highly specialized terminology, use non-standard, colloquial, and diverse expressions, have fragmented semantics, and depend heavily on contextual reasoning. These characteristics make them difficult to parse accurately with traditional rule-based approaches or shallow machine learning models, which often leads to biased demand classification and incomplete extraction of key factors and thereby limits the quality of data available for intelligent agricultural decision-making. To address these challenges, this research aims to develop a robust, domain-adapted, and highly interpretable method for the structured analysis of agricultural user demands. [Methods] Agri-NeedAgent, an agricultural user demand analysis framework based on a "three-stage training + multi-agent collaboration" paradigm, was proposed. First, in the domain knowledge pretraining stage, 80 000 agriculture-related texts, including crop cultivation manuals, pest and disease control guides, agricultural policy documents, and farmer consultation records, were used to build domain-specific semantic understanding, enhancing the model's ability to interpret agricultural terminology, dialectal expressions, contextual logic, and implicit semantics. Second, in the instruction fine-tuning stage, 6 320 annotated samples in an "instruction-input-output" format were used to establish an explicit mapping from raw demand texts to structured outputs. Third, in the agricultural knowledge low-rank adaptation stage, Low-Rank Adaptation (LoRA) was applied for lightweight parameter tuning of task-specific agents, enabling targeted adaptation to the demand classification and key-factor extraction tasks.
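The third training stage relies on LoRA. The following minimal Python sketch shows the standard LoRA formulation for a single linear layer: the pretrained weight W0 stays frozen, and only a low-rank update (alpha / r)·B·A is trained, with B initialized to zero so that tuning starts from the pretrained behavior. It is written without third-party libraries for self-containment; the layer sizes, rank `r`, scaling `alpha`, and initialization scale are illustrative assumptions, not the configuration used for Agri-NeedAgent.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product (each row of m dotted with x)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


class LoRALinear:
    """Linear layer h = W0 x + (alpha / r) * B (A x), with W0 frozen and A, B trainable."""

    def __init__(self, w0: Matrix, r: int, alpha: float, seed: int = 0) -> None:
        d_out, d_in = len(w0), len(w0[0])
        rng = random.Random(seed)
        self.w0 = w0  # pretrained weight (d_out x d_in), never updated
        # A (r x d_in): small random init; B (d_out x r): zeros, so the update starts at 0.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.b = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.w0, x)                  # frozen path
        delta = matvec(self.b, matvec(self.a, x))  # low-rank trainable path
        return [y + self.scale * d for y, d in zip(base, delta)]

    def trainable_params(self) -> int:
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])


def lora_param_ratio(d_in: int, d_out: int, r: int) -> float:
    """Fraction of a full d_out x d_in weight matrix that LoRA trains at rank r."""
    return r * (d_in + d_out) / (d_in * d_out)


w0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # toy 3 x 3 "pretrained" weight
layer = LoRALinear(w0, r=1, alpha=2.0)
print(layer.forward([1.0, 2.0, 3.0]))  # equals W0 x before any training, since B = 0
print(layer.trainable_params())        # 1*3 + 3*1 = 6
print(lora_param_ratio(4096, 4096, 8))  # 0.00390625 (1/256)
```

For a hypothetical 4096 × 4096 projection at rank 8, the two factors hold 65 536 parameters versus 16 777 216 for the full matrix, which is why LoRA counts as lightweight parameter tuning.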
Building on this training process, a multi-agent collaborative framework was constructed in which a manager agent handled task scheduling and quality control, while task agents performed demand classification, key-factor extraction, and explanation generation, respectively. This division of labor and collaboration enabled efficient, structured analysis of agricultural user demands. [Results and Discussions] Experimental results showed that Agri-NeedAgent achieved a demand classification accuracy of 84.6%, a key-factor extraction F₁-score of 85.2%, a structured interface compliance rate of 94.2%, and an interpretability score of 90.2. These results represent clear improvements over traditional deep learning models such as Bidirectional Encoder Representations from Transformers (BERT) and over general-purpose large language models (LLMs) without domain adaptation, confirming the critical role of domain knowledge injection, explicit task alignment, and multi-agent specialization in the semantic understanding and structured analysis of agricultural texts. Ablation experiments further validated the effectiveness of each component. Removing domain pretraining or LoRA fine-tuning substantially degraded classification and key-factor extraction performance, showing that domain adaptation and task-specific optimization are necessary for handling non-standard agricultural expressions. Removing the manager agent or the Reasoning and Acting (ReAct) mechanism significantly reduced structured interface compliance and interpretability, highlighting the importance of task coordination, intermediate verification, and multi-step reasoning for logical consistency and output completeness.
Additionally, removing the external knowledge base reduced the interpretability score from 90.2 to 77.6, underscoring its role in providing theoretical grounding, reasoning support, and professional explanations. Although multi-agent collaboration added an inference overhead of approximately 140 ms, the total per-sample inference time remained within 225 ms, meeting the real-time requirements of agricultural consultation scenarios. [Conclusions] With a "three-stage training + multi-agent collaboration" framework, LLMs can effectively address the non-standard expressions, semantic fragmentation, and multi-factor reasoning found in agricultural user demand texts. The proposed method significantly improved demand classification, key-factor extraction, structured output compliance, and interpretability, providing high-quality, traceable structured data for intelligent agricultural decision-making. After domain adaptation and task-specific tuning, the model gains a stronger capability for deep semantic analysis of agricultural user demands, while multi-agent coordination ensures the completeness and interpretability of its outputs. The current workflow still requires optimization in data preparation, staged training, and knowledge-base updating. Future work will focus on expanding region-specific and emerging-technology-related demand data, developing a dynamically updated agricultural knowledge system, improving multi-agent coordination efficiency, and exploring cross-lingual agricultural demand analysis to further promote the application and deployment of agricultural large models in broader scenarios.
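To make the manager/task-agent division of labor and two of the reported metrics more concrete, the following Python sketch shows a manager that calls three task agents in sequence, validates the merged record against a required output schema (a simple stand-in for a structured interface compliance check), and computes a set-based F₁ for extracted key factors. The agents are hypothetical keyword rules standing in for LLM calls; the schema fields, vocabulary, and example text are invented for illustration, and the ReAct loop and external knowledge base are not modeled.

```python
from typing import Dict, List, Set

# Hypothetical structured-output schema: field name -> required Python type.
REQUIRED_FIELDS = {"category": str, "key_factors": list, "explanation": str}


def classify_agent(text: str) -> Dict:
    # Stand-in for an LLM-based demand classifier (keyword rule, illustration only).
    lowered = text.lower()
    is_pest = "pest" in lowered or "insect" in lowered
    return {"category": "pest_control" if is_pest else "cultivation"}


def extract_agent(text: str) -> Dict:
    # Stand-in for an LLM-based key-factor extractor over a tiny hypothetical vocabulary.
    vocab = {"wheat", "corn", "aphid", "fertilizer", "spring"}
    return {"key_factors": sorted(w for w in vocab if w in text.lower())}


def explain_agent(record: Dict) -> Dict:
    # Stand-in for explanation generation, built from the other agents' outputs.
    factors = ", ".join(record["key_factors"]) or "none"
    return {"explanation": f"Classified as {record['category']} from factors: {factors}."}


def is_compliant(record: Dict) -> bool:
    # Quality control: every required field is present with the expected type.
    return all(isinstance(record.get(k), t) for k, t in REQUIRED_FIELDS.items())


def manager(text: str) -> Dict:
    # Schedule the task agents, merge their outputs, and flag the record's compliance.
    record: Dict = {}
    record.update(classify_agent(text))
    record.update(extract_agent(text))
    record.update(explain_agent(record))
    record["compliant"] = is_compliant(record)
    return record


def factor_f1(pred: Set[str], gold: Set[str]) -> float:
    # Set-based F1 between predicted and annotated key factors.
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def compliance_rate(records: List[Dict]) -> float:
    # Share of records that satisfy the output schema.
    return sum(is_compliant(r) for r in records) / len(records)


text = "Aphid insects are spreading on my wheat this spring. What should I spray?"
result = manager(text)
print(result)
print(factor_f1(set(result["key_factors"]), {"wheat", "aphid", "pesticide"}))
```

Keeping validation in the manager rather than in each task agent mirrors the abstract's point that removing the coordinating agent mainly hurts output compliance: the task agents can each produce plausible fragments while the merged record still misses a required field.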