Smart Agriculture

Table of Contents

    30 September 2025, Volume 7 Issue 5
    Special Issue--Opto-Intelligent Agricultural Innovation Technology and Application
    Frontiers and Future Trends in Data Sensing Technologies for Opto-Intelligent Agriculture: From Optical Sensors to Intelligent Decision Systems |
    CHEN Chengcheng, WU Jiaping, YU Helong
    2025, 7(5):  1-16.  doi:10.12133/j.smartag.SA202507049

    [Significance] Opto-intelligent agriculture represents an emerging paradigm that deeply integrates optical sensing and intelligent decision-making within agricultural systems, aiming to transform production from experience-based management to data-driven precision cultivation. The core of this paradigm lies in exploiting the dual role of light: As an information carrier, it enables non-destructive sensing of crop physiological states through hyperspectral imaging, fluorescence, and other optical sensors; As a regulatory factor, it allows feedback-based manipulation of the light environment to precisely regulate crop growth. This establishes a closed-loop framework of "perception-decision-execution", which substantially enhances water and fertilizer use efficiency, enables early warning of pests and diseases, and supports quality-oriented production. Nevertheless, the transition of this technology from laboratory research to large-scale field application remains challenged by unstable signals under complex environments, weak model generalization, high equipment costs, a shortage of interdisciplinary talent, and insufficient policy support and promotion mechanisms. This paper systematically reviews the technological architecture, practical achievements, and intrinsic limitations of opto-intelligent agriculture, with the objective of providing theoretical guidance and practical directions for future development. [Progress] Opto-intelligent agriculture is evolving from isolated technological advances toward full-chain integration, characterized by significant progress in optical sensing, intelligent decision-making, and precision execution. At the optical sensing level, technological approaches have expanded from traditional spectral imaging to multi-scale, synergistic sensing networks. 
Hyperspectral imaging captures subtle spectral variations during the early stages of crop stress, chlorophyll fluorescence imaging enables ultra-early diagnosis of both biotic and abiotic stresses, LiDAR provides accurate three-dimensional phenotypic data, and emerging quantum-dot sensors have enhanced detection sensitivity down to the molecular scale. In terms of intelligent decision-making, recent advances focus on the deep integration of mechanistic and data-driven models, which compensates for the limited adaptability of purely mechanistic models while improving the interpretability of purely data-based ones. Through multi-source data fusion, the system jointly analyzes optical, environmental, and soil parameters to generate globally optimal strategies that balance yield, quality, and resource efficiency. At the execution stage, systems have developed into real-time feedback control loops. Dynamic light-spectrum LED systems and intelligent variable-spray drones transform decision outputs into precise actions, while continuous monitoring enables adaptive self-optimization. This mature technological chain has delivered measurable outcomes across the agricultural value chain, and integrated solutions demonstrate even greater potential. Collectively, these achievements signify the transition of opto-intelligent agriculture from conceptual exploration to practical implementation. [Conclusions and Prospects] By synergizing optical perception with intelligent decision-making, opto-intelligent agriculture is driving a fundamental transformation in agricultural production. To achieve the transition from merely usable to genuinely effective, a comprehensive advancement framework integrating technology, equipment, talent, and policy must be established. Technologically, efforts should focus on enhancing sensing stability under open-field conditions, developing lightweight and interpretable models, and promoting the domestic development of core components.
From a talent perspective, interdisciplinary education and agricultural technology training must be strengthened. From a policy standpoint, improving subsidy mechanisms, digital infrastructure, and innovation-oriented dissemination systems will be essential. Looking forward, through integration with emerging technologies such as 6G communication and digital twin systems, opto-intelligent agriculture is poised to become a cornerstone for ensuring both food security and ecological sustainability.
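The "perception-decision-execution" loop described above can be caricatured as a simple proportional feedback controller; the vegetation-index proxy, target value, and gain below are hypothetical illustrations, not values from the reviewed systems:

```python
def perceive(canopy):
    """Sense a proxy for crop physiological state, e.g. a vegetation index."""
    return canopy["ndvi"]

def decide(ndvi, target=0.75, gain=100.0):
    """Map the perceived deficit to a supplemental-light setpoint (a.u.)."""
    return gain * max(0.0, target - ndvi)

def execute(canopy, extra_light):
    """Actuate the light environment; here light crudely nudges the index up."""
    canopy["ndvi"] = min(0.9, canopy["ndvi"] + 0.001 * extra_light)
    return canopy

canopy = {"ndvi": 0.60}
for _ in range(5):  # perception -> decision -> execution, repeated
    canopy = execute(canopy, decide(perceive(canopy)))
```

Each pass closes the loop: the decision shrinks as the sensed index approaches the target, which is the self-optimizing behavior the review attributes to mature systems.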

    Progress and Prospects of Research on Key Technologies for Agricultural Multi-Robot Full Coverage Operations |
    LU Zaiwang, ZHANG Yucheng, MA Yike, DAI Feng, DONG Jie, WANG Peng, LU Huixian, LI Tongbin, ZHAO Kaibin
    2025, 7(5):  17-36.  doi:10.12133/j.smartag.SA202507040

    [Significance] With the deepening of intelligent agriculture and precision agriculture, the agricultural production mode is gradually transforming from traditional manual, experience-based operations to a modern model driven by data, intelligent decision-making, and autonomous execution. In this context, improving agricultural operation efficiency and achieving large-scale, continuous, and seamless operation coverage have become key requirements for promoting the modernization of agriculture. Multi-robot full coverage operation technology, with its significant advantages in operation efficiency, system robustness, scalability, and resource utilization efficiency, provides practical and feasible intelligent solutions for key links such as sowing, plant protection, and harvesting in large-scale farmland. Through the collaborative work of multi-robot systems, this technology can not only effectively reduce task repetition and avoid omissions, but also achieve efficient and accurate continuous operations in complex and dynamic agricultural environments, greatly improving the automation and intelligence level of agricultural production. [Progress] Starting from the global perspective of systems engineering, an integrated closed-loop technology framework of "perception-decision-execution" is constructed. The technological development status and research methods of each key link in agricultural multi-robot full coverage operations are systematically reviewed and deeply analyzed. At the perception and recognition level, the focus is on exploring the application of multi-source information fusion and collaborative perception technology. By integrating multi-source sensor data, multi-level fusion at the data, feature, and decision levels is achieved, and a refined global environment model is constructed to provide accurate crop status, obstacle distribution, and terrain information for the robot system.
Especially in the field of multi-robot collaborative perception, research has covered advanced models such as distributed simultaneous localization and mapping (SLAM) and ground-to-ground collaboration. Through information sharing and complementary perspectives, the system's perception ability and modeling accuracy in wide-area, unstructured agricultural environments have been improved. At the decision-making and planning level, three key aspects are analyzed: task allocation, global path planning, and local path adjustment. Task allocation has evolved from traditional deterministic methods to market mechanisms, heuristic algorithms, and intelligent methods that integrate reinforcement learning and graph neural networks to address the challenges of dynamic and complex resource constraints in agricultural scenarios. For global path planning, the characteristics of geometric decomposition, grid-based, global planning, and learning-based methods are analyzed in terms of path redundancy, computational efficiency, and terrain adaptability. Local path planning emphasizes the combination of real-time perception in dynamic environments, using methods such as graph search, sampling optimization, model predictive control, and end-to-end reinforcement learning to achieve real-time obstacle avoidance and trajectory smoothing. At the control execution level, the focus is on model-based trajectory tracking and control technology, aiming to accurately convert planned paths into robot motion. Traditional control methods such as PID, LQR, and sliding mode control are continuously optimized to cope with terrain undulations and system disturbances. In recent years, intelligent methods such as fuzzy control, neural network control, reinforcement learning, and multi-machine collaborative strategies have been gradually applied, further improving the control accuracy and collaborative operation capability of the system in dynamic environments.
[Conclusions and Prospects] A closed-loop technical framework for agricultural multi-robot full coverage operations is systematically constructed, and in-depth analysis of its key modules is conducted, providing insights and suggestions as well as theoretical references and technical paths for related research. However, the technology still faces many challenges, including perceptual uncertainty, dynamic changes in tasks, vast and irregular work areas, unpredictable dynamic obstacles, communication and collaboration barriers, and energy endurance issues. In the future, this field will further strengthen integration with artificial intelligence, the Internet of Things, edge computing, and other technologies, focusing on the following directions: developing intelligent dynamic task allocation mechanisms; optimizing global and local path planning algorithms to enhance their efficiency and adaptability in large-scale complex scenarios; enhancing the real-time perception and response capability of the system to dynamic environments; promoting software-hardware collaboration and intelligent system integration to achieve efficient communication and integrated task management; and developing high-efficiency power systems and intelligent energy consumption strategies to ensure long-term continuous operation capability. Through these efforts, agricultural multi-robot systems will gradually achieve higher levels of precision, automation, and intelligence, providing key technological support for the transformation of modern agriculture.
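The market-mechanism task allocation mentioned above can be sketched as a greedy single-item auction: each field plot goes to the robot with the lowest bid, where a bid is travel distance plus a workload penalty. Robot positions, plot way-points, and the penalty weight below are illustrative assumptions, not a method from the review:

```python
import math

def auction_allocate(robots, plots, load_penalty=2.0):
    """Greedy single-item auction: each plot goes to the cheapest bidder."""
    assignment = {name: [] for name in robots}
    positions = dict(robots)            # each robot bids from its last position
    for plot in plots:
        bids = {name: math.dist(positions[name], plot)
                      + load_penalty * len(assignment[name])
                for name in robots}
        winner = min(bids, key=bids.get)
        assignment[winner].append(plot)
        positions[winner] = plot        # the winner moves to the won plot
    return assignment

# Two robots at opposite field edges and four plot way-points (hypothetical).
robots = {"r1": (0.0, 0.0), "r2": (10.0, 0.0)}
plots = [(1.0, 1.0), (9.0, 1.0), (2.0, 3.0), (8.0, 3.0)]
assignment = auction_allocate(robots, plots)
```

The workload penalty is what balances coverage across robots; without it, one well-placed robot could win every auction and the "full coverage" guarantee would degrade into a single-robot tour.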

    Application Research Progress and Prospects of Multi-Agent Large Language Models in Agriculture |
    ZHAO Yingping, LIANG Jinming, CHEN Beizhang, DENG Xiaoling, ZHANG Yi, XIONG Zheng, PAN Ming, MENG Xiangbao
    2025, 7(5):  37-51.  doi:10.12133/j.smartag.SA202503026

    [Significance] With the rapid advancement of large language models (LLMs) and multi-agent systems, their integration, multi-agent large language models, is emerging as a transformative force in modern agriculture. Agricultural production involves complex, sequential, and highly environment-dependent processes, including tillage, planting, management, and harvesting. Traditional intelligent systems often struggle with the diversity, uncertainty, and coordination demands of these stages. Multi-agent LLMs offer a new paradigm for agricultural intelligence by combining deep semantic understanding with distributed collaboration and adaptive coordination. Through role specialization, real-time perception, and cooperative decision-making, they can decompose complex workflows, adapt to changing conditions, and enable robust, full-process automation, making them well-suited to the challenges of modern agriculture. More importantly, their application marks a critical step toward the digital transformation, precision management, and sustainable development of agriculture. By enabling intelligent decision-making across the entire agricultural lifecycle, they provide both theoretical foundations and practical tools for building next-generation smart and unmanned farming systems. [Progress] The core concepts of multi-agent LLMs are first elucidated, covering the composition and characteristics of multi-agent systems as well as the development and training pipelines of LLMs. Then, the overall architecture of multi-agent systems is presented, encompassing both the environments in which agents operate and their internal structures. The collaborative patterns of multi-agent LLMs are then examined in terms of coordination structures and temporal organization.
Following this, interaction mechanisms are discussed from multiple dimensions, including interactions between agents and the external environment, inter-agent communication, communication protocol frameworks, and communication security. To demonstrate the varying task specializations of different multi-agent frameworks, a comparative benchmark survey table is provided by synthesizing benchmark tasks and results reported in existing studies. The results show that different multi-agent large language model architectures tend to perform better on specific types of tasks, reflecting the influence of agent framework design characteristics such as role assignment strategies, communication protocols, and decision-making mechanisms. Furthermore, several representative architectures of multi-agent LLMs, as proposed in existing studies, are briefly reviewed. Based on their design features, their potential applicability to agricultural scenarios is discussed. Finally, current research progress and practical applications of LLMs, multimodal large models, and multi-agent LLMs in the agricultural domain are surveyed. The application architecture of agricultural LLMs is summarized, using rice cultivation as a representative scenario to illustrate the collaborative process of a multi-agent system powered by LLMs. This process involves data acquisition agents, data processing agents, task allocation and coordination agents, task execution agents, and feedback and optimization agents. The roles and functions of each kind of agent in enabling automated and intelligent operations throughout the entire agricultural lifecycle, including tillage, planting, management, and harvesting, are comprehensively described. In addition, drawing on existing research on multimodal data processing, the pseudocode is provided to illustrate the basic logic of the data processing agents. 
[Conclusions and Prospects] Multi-agent LLM technology holds vast promise in agriculture but still confronts several challenges. First, limited model interpretability, stemming from opaque internal reasoning and high-dimensional parameter mappings, hinders decision transparency, traceability, user trust, and debugging efficiency. Second, model hallucination is significant: probabilistic generation may deviate from facts, leading to erroneous environmental perception and decisions that cause resource waste or crop damage. Third, multi-modal agricultural data acquisition and processing remain complex due to non-uniform equipment standards, heterogeneous data, and insufficient cross-modal reasoning, complicating data fusion and decision-making. Future directions include: (1) enhancing interpretability via chain-of-thought techniques to improve reasoning transparency and traceability; (2) reducing hallucinations by integrating knowledge bases, retrieval-augmented generation, and verification mechanisms to bolster decision reliability; and (3) standardizing data formats to strengthen cross-modal fusion and reasoning. These measures will improve system stability and efficiency, providing solid support for the advancement of smart agriculture.
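The role-specialized pipeline surveyed above (data acquisition, data processing, coordination, execution, and feedback agents) can be caricatured with each agent as a plain function and the LLM planner replaced by a rule-based stub. The field names, thresholds, and irrigation rule are hypothetical placeholders for what a real system would learn or retrieve:

```python
def acquisition_agent():
    # Stand-in for pulling sensor / image / weather data from the field.
    return {"soil_moisture": 0.18, "growth_stage": "tillering"}

def processing_agent(raw):
    # Normalize heterogeneous inputs into a shared schema for other agents.
    return {"moisture_pct": raw["soil_moisture"] * 100,
            "stage": raw["growth_stage"]}

def coordination_agent(state):
    # Stand-in for an LLM planner: choose and parameterize the next task.
    if state["moisture_pct"] < 25:
        return {"task": "irrigate", "amount_mm": 20}
    return {"task": "monitor"}

def execution_agent(plan, state):
    # Actuate the chosen task; here irrigation crudely raises soil moisture.
    if plan["task"] == "irrigate":
        state["moisture_pct"] += plan["amount_mm"]
    return state

def feedback_agent(state, log):
    # Record outcomes so the planner can be corrected on the next cycle.
    log.append(state["moisture_pct"])
    return log

log = []
state = processing_agent(acquisition_agent())
state = execution_agent(coordination_agent(state), state)
log = feedback_agent(state, log)
```

The point of the sketch is the message flow, not the rules: swapping the stub in `coordination_agent` for an LLM call is exactly where the surveyed architectures differ.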

    Research Advances in Hyperspectral Imaging Technology for Fruit Quality Assessment |
    ZHANG Zishen, CHENG Hong, GENG Wenjuan, GUAN Junfeng
    2025, 7(5):  52-66.  doi:10.12133/j.smartag.SA202507020

    [Significance] Hyperspectral imaging (HSI) is an advanced sensing technique that simultaneously acquires high-resolution spatial data and continuous spectral information, enabling non-destructive, real-time evaluation of both external and internal fruit quality attributes. Despite its widespread application in agricultural product assessment, comprehensive reviews specifically addressing fruit quality evaluation using HSI are limited. This paper presents a comprehensive review of recent advancements in the application of HSI technology for fruit quality detection. [Progress] This paper provides a comprehensive review from three key dimensions: scenario adaptability, technological evolution trends, and industrial implementation bottlenecks, with a further analysis of the research outlook in HSI applications for fruit quality assessment. Specifically, by employing non-destructive and rapid spectral imaging techniques, HSI has markedly enhanced the accuracy of assessing various quality parameters, including external appearance, surface defects, internal quality (such as sugar content, acidity, and moisture), and ripeness. Furthermore, significant progress has been achieved in utilizing HSI for disease detection, variety classification, and origin traceability, thereby providing robust technical support for fruit quality control and supply chain management. In addition, bibliometric analysis is utilized to identify key research areas and emerging trends in the application of HSI technology for fruit quality assessment. [Conclusions and Prospects] Future research should focus on optimizing spectral dimensionality reduction techniques to enhance both the efficiency and accuracy of models. Transfer learning and incremental learning approaches should also be explored to improve the models' ability to generalize across various scenarios and fruit types. 
In parallel, developing lightweight system hardware and strengthening edge processing capabilities will be essential for enabling the practical deployment of HSI technology in real-world applications. Integrating lightweight deep learning networks and acceleration modules will support real-time inference, enhancing processing speed and facilitating faster data analysis. It is also crucial to establish standardized systems and protocols to promote the sharing of research findings and ensure broader application across different industries. Additionally, incorporating multimodal technologies, such as thermal imaging, gas sensors, and visual data, will improve the accuracy and robustness of detection platforms. This integration will allow for more precise and comprehensive assessments of fruit quality, further advancing the digitalization and intelligent application of HSI technology.
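As a toy illustration of the spectral dimensionality reduction discussed above, the sketch below shrinks a tiny, hypothetical pixel-by-band matrix by keeping only the highest-variance bands; real HSI pipelines typically use PCA or learned projections rather than this simple band selection:

```python
def band_variances(cube):
    """cube: list of pixels, each a list of per-band reflectances."""
    n = len(cube)
    variances = []
    for b in range(len(cube[0])):
        col = [pixel[b] for pixel in cube]
        mean = sum(col) / n
        variances.append(sum((x - mean) ** 2 for x in col) / n)
    return variances

def select_bands(cube, k):
    """Reduce each pixel to its k highest-variance (most informative) bands."""
    var = band_variances(cube)
    keep = sorted(sorted(range(len(var)), key=lambda b: var[b], reverse=True)[:k])
    return [[pixel[b] for b in keep] for pixel in cube], keep

# Four pixels x five bands; band 2 varies most (e.g. a defect-sensitive band).
cube = [[0.1, 0.2, 0.9, 0.3, 0.30],
        [0.1, 0.2, 0.1, 0.3, 0.31],
        [0.1, 0.2, 0.8, 0.3, 0.29],
        [0.1, 0.2, 0.2, 0.3, 0.30]]
reduced, kept = select_bands(cube, 2)
```

Discarding near-constant bands before modeling is what makes the downstream models both faster and less prone to overfitting, which is the efficiency-accuracy trade-off the review highlights.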

    Detection Method for Log-Cultivated Shiitake Mushrooms Based on Improved RT-DETR |
    WANG Fengyun, WANG Xuanyu, AN Lei, FENG Wenjie
    2025, 7(5):  67-77.  doi:10.12133/j.smartag.SA202506034

    [Objective] Shiitake mushroom is one of the most important edible and medicinal fungi in China, and its factory-based cultivation has become a major production model. Although mixing, bagging, sterilization, and inoculation have been largely automated, harvesting and grading still depend heavily on manual labor, which leads to high labor intensity, low efficiency, and inconsistency caused by subjective judgment, thereby restricting large-scale production. Furthermore, the clustered growth pattern of shiitake mushrooms, the high proportion of small targets, severe occlusion, and complex illumination conditions present additional challenges to automated detection. Traditional object detection models often struggle to balance accuracy, robustness, and lightweight efficiency in such environments. Therefore, there is an urgent need for a high-precision and lightweight detection model capable of supporting intelligent evaluation in mushroom harvesting. [Methods] To address these challenges, this study proposed an improved real-time detection model named FSE-DETR, based on the RT-DETR framework. In the backbone, the FasterNet Block was introduced to replace the original HGNetv2 structure. By combining partial convolution (PConv) for efficient channel reduction and pointwise convolution (PWConv) for rapid feature integration, the FasterNet Block reduced redundant computation and parameter size while maintaining effective multi-scale feature extraction, thereby improving both efficiency and deployment feasibility. In the encoder, a small object feature fusion network (SFFN) was designed to enhance the recognition of immature mushrooms and other small targets. This network first applied space-to-depth convolution (SPDConv), which rearranged spatial information into channel dimensions without discarding fine-grained details such as edges and textures. 
The processed features were then passed through the cross stage partial omni-kernel (CSPOmniKernel) module, which divided feature maps into two parts: one path preserved original information, while the other path underwent multi-scale convolutional operations including 1×1, asymmetric large-kernel, and frequency-domain transformations, before being recombined. This design enabled the model to capture both local structural cues and global semantic context simultaneously, improving its robustness under occlusion and scale variation. For bounding box regression, the Efficient Intersection over Union (EIoU) loss function was adopted to replace generalized IoU (GIoU). Unlike GIoU, EIoU explicitly penalized differences in center distance, aspect ratio, and scale between predicted and ground-truth boxes, resulting in more precise localization and faster convergence during training. The dataset was constructed from images collected in mushroom cultivation facilities using fixed-position RGB cameras under diverse illumination conditions, including direct daylight, low-light, and artificial lighting, to ensure realistic coverage. Four mushroom categories were annotated: immature mushrooms, flower mushrooms, smooth cap mushrooms, and defective mushrooms, following industrial grading standards. To address the limited size of raw data and prevent overfitting, extensive augmentation strategies such as horizontal and vertical flipping, random rotation, Gaussian and salt-and-pepper noise addition, and synthetic occlusion were applied. The augmented dataset consisted of 4 000 images, which were randomly divided into training, validation, and test sets at a ratio of 7:2:1, ensuring balanced distribution across all categories. [Results and Discussions] Experimental evaluation was conducted under consistent hardware and hyperparameter settings. 
The ablation study revealed that FasterNet effectively reduced parameters and computation while slightly improving accuracy, SFFN significantly enhanced the detection of small and occluded mushrooms, and EIoU improved bounding box regression. When integrated, these improvements enabled the final model to achieve an accuracy of 95.8%, a recall of 93.1%, and a mAP50 of 95.3%, with a model size of 19.1 M and a computational cost of 53.6 GFLOPs, thus achieving a favorable balance between precision and efficiency. Compared with mainstream detection models including Faster R-CNN, YOLOv7, YOLOv8m, and YOLOv12m, FSE-DETR consistently outperformed them in terms of accuracy, robustness, and model efficiency. Notably, the mAP for immature and defective mushrooms increased by 2.4 and 2.5 percentage points, respectively, compared with the baseline RT-DETR, demonstrating the effectiveness of the SFFN module for small-object detection. Visualization analysis further confirmed that FSE-DETR maintained stable detection performance under different illumination and occlusion conditions, effectively reducing missed detections, false positives, and repeated recognition, while other models exhibited noticeable deficiencies. These results verified the superior robustness and reliability of the proposed model in practical mushroom factory environments. [Conclusions] The proposed FSE-DETR model integrated the FasterNet Block, Small Object Feature Fusion Network, and EIoU loss into the RT-DETR framework, achieving state-of-the-art accuracy while maintaining lightweight characteristics. The model showed strong adaptability to small targets, occlusion, and complex illumination, making it a reliable solution for intelligent mushroom harvest evaluation. 
With its balance of precision and efficiency, FSE-DETR demonstrates great potential for deployment in real-world factory production and provides a valuable reference for developing high-performance, lightweight detection models for other agricultural applications.
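The EIoU loss adopted in FSE-DETR has a compact closed form: alongside the IoU term, it penalizes center distance, width difference, and height difference, each normalized by the smallest enclosing box. A minimal sketch with illustrative box coordinates (x1, y1, x2, y2):

```python
def eiou_loss(pred, gt, eps=1e-9):
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    # Intersection over union
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / (union + eps)
    # Smallest enclosing box, used to normalize the three penalties
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    center = ((px1 + px2 - gx1 - gx2) ** 2 + (py1 + py2 - gy1 - gy2) ** 2) / 4
    dist = center / (cw ** 2 + ch ** 2 + eps)   # center-distance penalty
    dw = ((px2 - px1) - (gx2 - gx1)) ** 2 / (cw ** 2 + eps)  # width penalty
    dh = ((py2 - py1) - (gy2 - gy1)) ** 2 / (ch ** 2 + eps)  # height penalty
    return 1.0 - iou + dist + dw + dh

perfect = eiou_loss((0, 0, 2, 2), (0, 0, 2, 2))  # exact match -> near-zero loss
shifted = eiou_loss((1, 0, 3, 2), (0, 0, 2, 2))  # half-box shift -> penalized
```

Unlike GIoU, the width and height terms are penalized separately rather than through an aggregate aspect ratio, which is what gives the faster convergence the paper reports.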

    Vegetable Price Prediction Based on Optimized Neural Network Time Series Models |
    HOU Ying, SUN Tan, CUI Yunpeng, WANG Xiaodong, ZHAO Anping, WANG Ting, WANG Zengfei, YANG Weijia, GU Gang, WU Shaodong
    2025, 7(5):  78-87.  doi:10.12133/j.smartag.SA202410037

    [Objective] The price volatility of vegetables has profound implications for both farmers and consumers. Fluctuating prices directly impact farmers' earnings and pose challenges to market stability and consumer purchasing behaviors. These fluctuations are driven by a multitude of complex and interrelated factors, including supply and demand, seasonal cycles, climatic conditions, logistical efficiency, government policies, consumer preferences, and suppliers' trading strategies. As a result, vegetable prices tend to exhibit nonlinear and non-stationary patterns, which significantly complicate efforts to produce accurate price forecasts. Addressing these forecasting challenges holds considerable practical and theoretical value, as improved prediction models can support more stable agricultural markets, secure farmers' incomes, reduce cost-of-living volatility for consumers, and inform more precise and effective government regulatory strategies. [Methods] The study investigated the application of neural network-based time series forecasting models for the prediction of vegetable prices. In particular, a selection of state-of-the-art neural network architectures was evaluated for their effectiveness in modeling the complex dynamics of vegetable pricing. The selected models for the research included PatchTST and iTransformer, both of which were built upon the Transformer architecture, as well as SOFTS and TiDE, which leveraged multi-layer perceptron (MLP) structures. In addition, Time-LLM, a model based on a large language model architecture, was incorporated to assess its adaptability to temporal data characterized by irregularity and noise. To enhance the predictive performance and robustness of these models, an automatic hyperparameter optimization algorithm was employed. This algorithm systematically adjusted key hyperparameters such as learning rate, batch size, early stopping, and random seed. 
It utilized probabilistic modeling techniques to construct performance-informed distributions for guiding the selection of more effective hyperparameter configurations. Through iterative updates informed by prior evaluation data, the optimization algorithm increased the search efficiency in high-dimensional parameter spaces, while simultaneously minimizing computational costs. The training and validation process allocated 80% of the data to the training set and 20% to the validation set, and employed the mean absolute error (MAE) as the primary loss function. In addition to the neural network models, the study incorporated a traditional statistical model, the autoregressive integrated moving average (ARIMA), as a baseline model for performance comparison. The predictive accuracy of all models was assessed using three widely recognized error metrics: MAE, mean absolute percentage error (MAPE), and mean squared error (MSE). The model that achieved the most favorable performance across these metrics was selected for final vegetable price forecasting. [Results and Discussions] The experimental design of the study focused on four high-demand, commonly consumed vegetables: carrots, white radishes, eggplants, and iceberg lettuce. Both daily and weekly price forecasting tasks were conducted for each type of vegetable. The empirical results demonstrated that the neural network-based time series models provided strong fitting capabilities and produced accurate forecasts for vegetable prices. The integration of automatic hyperparameter tuning significantly improved the performance of these models. In particular, after tuning, the MSE for daily price prediction decreased by at least 76.3% for carrots, 94.7% for white radishes, and 74.8% for eggplants. Similarly, for weekly price predictions, the MSE reductions were at least 85.6%, 93.6%, and 64.0%, respectively, for the same three vegetables. 
These findings confirm the substantial contribution of the hyperparameter optimization process to enhancing model effectiveness. Further analysis revealed that neural network models performed better on vegetables with relatively stable price trends, indicating that the underlying consistency in data patterns benefited predictive modeling. On the other hand, Time-LLM exhibited stronger performance in weekly price forecasts involving more erratic and volatile price movements. Its robustness in handling time series data with high degrees of randomness suggests that model architecture selection should be closely aligned with the specific characteristics of the target data. Ultimately, the study identified the best-performing model for each vegetable and each prediction frequency. The results demonstrated the generalizability of the proposed approach, as well as its effectiveness across diverse datasets. By aligning model architecture with data attributes and integrating targeted hyperparameter optimization, the research achieved reliable and accurate forecasts. [Conclusions] The study verified the utility of neural network-based time series models for forecasting vegetable prices. The integration of automatic hyperparameter optimization techniques notably improved predictive accuracy, thereby enhancing the practical utility of these models in real-world agricultural settings. The findings provide technical support for intelligent agricultural price forecasting and serve as a methodological reference for predicting prices of other agricultural commodities. Future research may further improve model performance by integrating multi-source heterogeneous data. In addition, the application potential of more advanced deep learning models can be further explored in the field of price prediction.
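The three error metrics used to compare models above (MAE, MAPE, MSE) can be written out directly; the price series below is illustrative, not data from the study:

```python
def mae(y, yhat):
    """Mean absolute error, in the same units as the prices."""
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def mape(y, yhat):
    """Mean absolute percentage error, scale-free (percent)."""
    return 100 * sum(abs((a - b) / a) for a, b in zip(y, yhat)) / len(y)

def mse(y, yhat):
    """Mean squared error; penalizes large misses disproportionately."""
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)

actual = [2.0, 2.5, 3.0, 2.8]      # e.g. weekly prices in yuan/kg
predicted = [2.1, 2.4, 3.2, 2.7]
```

Because MSE squares each error, the large percentage drops reported after hyperparameter tuning (76.3% to 94.7%) correspond to much smaller, but still substantial, reductions in typical price error.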

    Chinese Tea Pest and Disease Named Entity Recognition Method Based on Improved Boundary Offset Prediction Network |
    XIE Yuxin, WEI Jiangshu, ZHANG Yao, LI Fang
    2025, 7(5):  88-100.  doi:10.12133/j.smartag.SA202505007

    [Objective] Named entity recognition (NER) is vital for many natural language processing (NLP) applications, including information retrieval and knowledge graph construction. While Chinese NER has advanced with datasets like ResumeNER, WeiboNER, and CLUENER (Chinese language understanding evaluation NER), most focus on general domains such as news or social media. However, there is a notable lack of annotated data in specialized fields, particularly agriculture. In the context of tea pest and disease, this shortage hampers progress in intelligent agricultural information extraction. These domain-specific texts pose unique challenges for NER due to frequent nested and long-span entities, which traditional sequence labeling models struggle to handle. Issues such as boundary ambiguity further complicate accurate entity recognition, leading to poor segmentation and labeling performance. Addressing these challenges requires targeted datasets and improved NER techniques tailored to the agricultural domain. [Methods] The proposed model comprises two core modules designed to enhance the performance of the boundary offset prediction network (BOPN), particularly within domains characterized by complex and fine-grained entity structures, such as tea pest and disease recognition. The boundary prediction module was responsible for identifying entity spans within input text sequences. It employed an attention-based mechanism to dynamically estimate the probability that consecutive tokens belong to the same entity, thereby addressing the challenge of boundary ambiguity. This mechanism facilitated more accurate detection of entity boundaries, which was particularly critical in scenarios involving nested or overlapping entities. The label enhancement module further refined entity recognition by employing a biaffine classifier that jointly models entity spans and their corresponding category labels.
This joint modeling approach enabled the capture of intricate interactions between span representations and semantic label information, improving the identification of long or syntactically complex entities. The output of this module was integrated with conditionally normalized hidden representations, enhancing the model's capacity to assign context-aware and semantically precise labels. In order to reduce computational complexity while preserving model effectiveness, the architecture incorporated low-rank linear layers. These were constructed by integrating the adaptive channel weighting mechanism of Squeeze-and-Excitation Networks with low-rank decomposition techniques. The modified layers replaced traditional linear transformations, yielding improvements in both efficiency and representational capacity. In addition to model development, a domain-specific NER corpus was constructed through the systematic collection and annotation of entity information related to tea pest and disease from scientific literature, agricultural technical reports, and online texts. The annotated entities in the corpus were categorized into ten classes, including tea plant diseases, tea pests, disease symptoms, and pest symptoms. Based on this labeled corpus, a Chinese NER dataset focused on tea pest and disease was developed, referred to as the Chinese tea pest and disease dataset. [Results and Discussions] Extensive experiments were conducted on the constructed dataset, comparing the proposed method with several mainstream NER approaches, including traditional sequence labeling models (e.g., BiLSTM-CRF), lexicon-enhanced models (e.g., SoftLexicon), and boundary smoothing strategies (e.g., Boundary Smooth). These comparisons aimed to rigorously assess the effectiveness of the proposed architecture in handling domain-specific and structurally complex entity types. 
Additionally, to evaluate the model's generalization capability beyond the tea disease and pest domain, the study performed comprehensive evaluations on four publicly available Chinese NER benchmark datasets: ResumeNER, WeiboNER, CLUENER, and Taobao. Results showed that the proposed model consistently improved F1-Scores across all datasets: by 0.68% on the self-built dataset, 0.29% on ResumeNER, 0.96% on WeiboNER, 0.7% on CLUENER, and 0.5% on Taobao, with particularly notable gains in the recognition of complex, nested, and long-span entities. These outcomes demonstrate the model's superior capacity for capturing intricate entity boundaries and semantics, and confirm its robustness and adaptability when compared to current state-of-the-art methods. [Conclusions] The study presents a high-performance NER approach tailored to the characteristics of Chinese texts on tea pest and disease. By simultaneously optimizing entity boundary detection and label classification, the proposed method significantly enhanced recognition accuracy in specialized domains. Experimental results demonstrated strong adaptability and robustness of the model across both newly constructed and publicly available datasets, indicating its broad applicability and promising prospects.
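The biaffine span-label scoring described above pairs start- and end-token representations. A rough numpy sketch is shown below; this is not the authors' implementation, and all dimensions, weight shapes, and the single-label simplification are assumptions made for illustration:

```python
import numpy as np

def biaffine_scores(h_start, h_end, U, W, b):
    """Score every (start, end) token pair as a candidate span for one
    entity label: bilinear term + linear term + scalar bias.

    h_start, h_end : (T, d) token representations
    U : (d, d) bilinear weight; W : (2d,) linear weight; b : scalar bias
    Returns a (T, T) matrix whose [i, j] entry scores the span i..j.
    """
    T, d = h_start.shape
    bilinear = h_start @ U @ h_end.T                               # pairwise bilinear term
    linear = (h_start @ W[:d])[:, None] + (h_end @ W[d:])[None, :]  # additive start/end terms
    return bilinear + linear + b

rng = np.random.default_rng(0)
T, d = 5, 8
h = rng.standard_normal((T, d))          # stand-in for encoder outputs
scores = biaffine_scores(h, h, rng.standard_normal((d, d)),
                         rng.standard_normal(2 * d), 0.1)
```

In practice one such scorer would be run per category label and combined with the boundary module's span probabilities.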

    Imbalanced Hyperspectral Viability Detection of Naturally Aged Soybean Germplasm Based on Semi-Supervised Deep Convolutional Generative Adversarial Network |
    LI Fei, WANG Ziqiang, WU Jing, XIN Xia, LI Chunmei, XU Hubo
    2025, 7(5):  101-113.  doi:10.12133/j.smartag.SA202505013
    Abstract ( 134 )   HTML ( 4)   PDF (2679KB) ( 12 )  
    Figures and Tables | References | Related Articles | Metrics

    [Objective] Germplasm resources are regarded as the "chips" of high-quality breeding, and evaluating the viability of soybean germplasm is essential for ensuring the secure preservation of genetic resources and promoting the healthy development of the soybean industry. Traditional viability detection methods are time-consuming, labor-intensive, and seed-consuming, highlighting the urgent need for non-destructive, intelligent, and high-throughput detection technologies. Hyperspectral imaging combined with deep learning offers a promising approach for the rapid, non-destructive assessment of soybean germplasm viability. Compared to artificially aged samples, naturally aged samples more accurately reflect the substance changes associated with the decline in germplasm viability. However, the imbalance in the number of viable and non-viable samples limits the generalization performance of viability prediction models. [Methods] In order to address the aforementioned challenges, a semi-supervised deep convolutional generative adversarial network (SDCGAN) was proposed in this research to generate high-quality hyperspectral data with associated viability labels. The SDCGAN framework consisted of three main components: a generator, a discriminator, and a classifier. The generator progressively transformed low-dimensional latent representations into hyperspectral data. This was achieved through four one-dimensional transposed convolutional layers, ensuring the output matched the dimensionality of real spectra. The discriminator adopted an optimization strategy based on the Wasserstein distance, replacing the Jensen-Shannon divergence used in traditional GANs, thereby mitigating training instability and gradient vanishing. Additionally, a gradient penalty term was introduced to further stabilize model training. 
In the classifier, a unilateral margin loss function was employed to penalize only those samples near the decision boundary, effectively avoiding overfitting on well-separated samples and improving training efficiency. Furthermore, a spectral score fusion network (SSFNet) was developed to enable hyperspectral-based detection of soybean seed viability. SSFNet comprised two core modules: a spectral residual network and a spectral score fusion module. The spectral residual network extracted shallow-level features from the hyperspectral data, capturing local patterns within spectral sequences. The spectral score fusion module adaptively reweighted spectral channels to emphasize viability-related features and suppress redundant noise. Finally, the performance of the SDCGAN-generated spectra was evaluated using root mean square error (RMSE), while the viability detection performance of SSFNet was assessed using test accuracy, precision, area under the curve (AUC), and F1-Score. [Results and Discussions] In the performance analysis of SDCGAN, the model progressively learned and captured the key spectral features that distinguished viable and non-viable soybean seeds during the training process. The generated spectra gradually evolved from initial noisy fluctuations to smoother curves that closely resembled real spectra, demonstrating strong nonlinear modeling capability. Compared to other generative adversarial models, SDCGAN achieved the best performance in enhancing viability detection, and its generated data exhibited low error characteristics in RMSE analysis. By applying SDCGAN for data augmentation, three types of datasets were constructed: original spectra, generated spectra, and a mixed spectral dataset. When using the multiplicative scatter correction-Savitzky-Golay-StandardScaler (MSC-SG-SS) preprocessing strategy, SSFNet achieved the highest viability detection accuracies across all three datasets, reaching 89.50%, 90.83%, and 93.33%, respectively. 
In comparison with other viability detection models, SSFNet consistently outperformed alternative algorithms in all four evaluation metrics across all datasets. Particularly on the mixed dataset, SSFNet demonstrated the best performance, achieving a test accuracy of 93.33%, precision of 95.17%, AUC of 92.58%, and F1-Score of 94.83%. Notably, all models trained on the mixed dataset containing SDCGAN-generated samples achieved better performance than those trained on either original or generated datasets alone. This improvement was likely due to the increased sample diversity and balanced class distribution in the mixed dataset, which provided more comprehensive viability-related features, facilitated model convergence, and reduced overfitting. In transfer experiments, SSFNet also exhibited superior generalization capability compared to four baseline algorithms: support vector machine (SVM), extreme gradient boosting (XGBoost), one-dimensional convolutional neural network (1D-CNN), and Transformer, achieving the highest classification accuracy of 73.67% on the mixed dataset. [Conclusions] This research constructs an integrated SDCGAN-SSFNet framework for robust viability detection of naturally aged soybean germplasm under imbalanced sample conditions. The SDCGAN component accurately learns the underlying distributional characteristics of real hyperspectral data from soybean seeds and generates realistic synthetic samples, effectively augmenting the spectral data of non-viable seeds and improving data diversity. Meanwhile, SSFNet explores inter-band spectral correlations to adaptively enhance features that are highly relevant to viability classification while effectively suppressing redundant and noisy information. 
This integrated approach enables rapid, nondestructive, and high-precision detection of soybean seed viability under challenging sample imbalance scenarios, providing an efficient and reliable method for seed quality assessment and agricultural decision-making.
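A unilateral margin loss of the kind described, which penalizes only samples near the decision boundary, can be sketched as follows. This is a minimal illustration; the margin value, score convention, and function signature are assumed rather than taken from the paper:

```python
import numpy as np

def unilateral_margin_loss(scores, labels, margin=1.0):
    """One-sided margin loss: only samples whose signed score falls
    inside the margin (i.e. near the decision boundary) contribute;
    confidently separated samples incur exactly zero loss, so training
    effort is not wasted over-fitting easy examples.

    scores : (N,) signed classifier scores
    labels : (N,) ground truth in {-1.0, +1.0}
    """
    signed = labels * scores                        # large positive = well separated
    return float(np.mean(np.maximum(0.0, margin - signed)))
```

With margin=1.0, a correctly classified sample scoring 5.0 contributes nothing, while a borderline one scoring 0.2 contributes 0.8.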

    Lightweight Apple Instance Segmentation Algorithm Based on SSW-YOLOv11n for Complex Orchard Environments |
    HAN Wenkai, LI Tao, FENG Qingchun, CHEN Liping
    2025, 7(5):  114-123.  doi:10.12133/j.smartag.SA202505002
    Abstract ( 98 )   HTML ( 11)   PDF (3074KB) ( 10 )  
    Figures and Tables | References | Related Articles | Metrics

    [Objective] In complex orchard environments, accurate fruit detection and segmentation are critical for autonomous apple-picking robots. Environmental factors severely degrade fruit visibility, challenging instance segmentation models across diverse field conditions. Apple-picking robots operate on embedded edge-computing platforms with stringent constraints on processing power, memory, and energy consumption. Limited computational resources preclude high-complexity deep-learning architectures, requiring segmentation models to balance real-time throughput and resource efficiency. This study introduces SSW-YOLOv11n, a lightweight instance segmentation model derived from YOLOv11n and tailored to orchard environments. SSW-YOLOv11n maintains high mask accuracy under adverse conditions—variable lighting, irregular occlusion, and background clutter—while delivering accelerated inference on resource-limited edge devices through three core design enhancements. [Methods] The SSW-YOLOv11n model first introduced GSConv and VoVGSCSP modules into its neck network, thereby constructing a highly compact yet computationally efficient "Slim-Neck" architecture. By integrating GSConv—an operation that employs grouped spatial convolutions and channel-shuffle techniques—and VoVGSCSP—a cross-stage partial module optimized for balanced depth and width—the model substantially reduced its overall floating-point operations while concurrently enhancing the richness of its feature representations. This optimized neck design facilitated more effective multi-scale information fusion, ensuring that semantic features corresponding to target regions were extracted comprehensively, all without compromising the model's lightweight nature. Subsequently, the authors embedded the SimAM self-attention mechanism at multiple output interfaces between the backbone and neck subnets. 
SimAM leveraged a parameter-free energy-based weighting strategy to dynamically amplify critical feature responses and suppress irrelevant background activations, thereby augmenting the model's sensitivity to fruit targets amid complex, cluttered orchard scenes. Finally, the original bounding-box regression loss was replaced with Wise-IoU, which incorporated a dynamic weighting scheme based on both center-point distance and geometric discrepancy factors. This modification further refined the regression process, improving localization precision and stability under variable environmental conditions. Collectively, these three innovations synergistically endowed the model with superior instance-segmentation performance and deployment adaptability, offering a transferable design paradigm for implementing deep-learning-based vision systems on resource-constrained agricultural robots. [Results and Discussions] Experimental results demonstrated that SSW-YOLOv11n achieved Box mAP50 and Mask mAP50 of 76.3% and 76.7%, respectively, representing improvements of 1.7 and 2.4 percentage points over the baseline YOLOv11n model. The proposed model reduced computational complexity from 10.4 to 9.1 GFLOPs (12.5% reduction) and achieved a model weight of 4.55 MB compared to 5.89 MB for the baseline (22.8% reduction), demonstrating significant efficiency gains. These results indicate that the synergistic integration of lightweight architecture design and attention mechanisms effectively addresses the trade-off between model complexity and segmentation accuracy. Comparative experiments showed that SSW-YOLOv11n outperformed Mask R-CNN, SOLO, YOLACT, and YOLOv11n with Mask mAP50 improvements of 23.2, 20.3, 21.4, and 2.4 percentage points, respectively, evidencing substantial advantages in segmentation precision within unstructured orchard environments. 
The superior performance over traditional methods suggests that the proposed approach successfully adapts deep learning architectures to agricultural scenarios with complex environmental conditions. Edge deployment testing on NVIDIA Jetson TX2 platform achieved 29.8 FPS inference rate, representing an 18.7% improvement over YOLOv11n (25.1 FPS), validating the model's real-time performance and suitability for resource-constrained agricultural robotics applications. [Conclusions] SSW-YOLOv11n effectively enhanced fruit-target segmentation accuracy while reducing computational overhead, thus providing a robust technical foundation for the practical application of autonomous apple-picking robots. By addressing the dual imperatives of high-precision perception and efficient inference within constrained hardware contexts, the proposed approach advanced the state of the art in intelligent agricultural robotics and offered a scalable solution for large-scale orchard automation.
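SimAM's parameter-free energy-based weighting, as used above, follows a closed-form rule from the original SimAM paper; the numpy sketch below illustrates it (the channel-first layout and regularizer value are assumptions):

```python
import numpy as np

def simam(x, lam=1e-4):
    """Parameter-free SimAM-style attention over a (C, H, W) feature map.

    Each neuron's gate comes from a closed-form energy: neurons that
    deviate most from their channel mean are treated as more informative
    and amplified, with no learnable parameters introduced.
    """
    _, H, W = x.shape
    n = H * W - 1
    mu = x.mean(axis=(1, 2), keepdims=True)
    d = (x - mu) ** 2
    var = d.sum(axis=(1, 2), keepdims=True) / n
    e_inv = d / (4.0 * (var + lam)) + 0.5        # inverse energy per neuron
    return x * (1.0 / (1.0 + np.exp(-e_inv)))    # sigmoid gating

x = np.random.default_rng(1).standard_normal((4, 8, 8))
y = simam(x)
```

Because the gate is a sigmoid of a non-negative quantity, every activation is scaled by a factor in (0.5, 1), so the output never exceeds the input in magnitude.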

    Small Target Detection Method of Maize Leaf Disease Based on DCC-YOLOv10n |
    DANG Shanshan, QIAO Shicheng, BAI Mingyu, ZHANG Mingyue, ZHAO Chenyu, PAN Chunyu, WANG Guochen
    2025, 7(5):  124-135.  doi:10.12133/j.smartag.SA202504017
    Abstract ( 56 )   HTML ( 6)   PDF (2056KB) ( 2 )  
    Figures and Tables | References | Related Articles | Metrics

    [Objective] Precise detection of maize leaf diseases plays a pivotal role in safeguarding maize yields and promoting sustainable agricultural development. However, existing detection algorithms often fall short in effectively capturing the intricate morphological details and shape characteristics of disease spots, particularly under challenging scenarios involving small disease targets. To overcome these challenges, a novel maize leaf disease detection algorithm, DCC-YOLOv10n, is presented in this research, which is specifically optimized for scenarios involving small-scale disease targets. [Methods] The core of the proposed method lay in three innovative architectural enhancements to the YOLOv10n detection framework. Firstly, a DRPAKConv module was designed, which built upon the arbitrary kernel convolution (AKConv). DRPAKConv replaced the conventional 3×3 convolutions that typically occupied a large proportion of the model's parameters. It featured two parallel branches: A dynamic sampling branch that adjusted the sampling shapes based on the spatial distribution of disease patterns, and a static convolution branch that adapted kernel sizes to retain spatial coverage and consistency. This design significantly enhanced the network's capability to recognize small-scale disease spots by dynamically modulating the receptive field and focusing on localized lesion details. Secondly, an improved feature fusion part was introduced by replacing the traditional C2f feature fusion module with a novel CBVoVGSCSP module. This redesigned module aimed to address the issue of gradient vanishing in deep feature fusion networks while reducing computational redundancy. CBVoVGSCSP preserved rich semantic information and improved the continuity of gradient flow across layers, which was critical for training deeper models. Furthermore, it enhanced multi-scale feature fusion and improved detection sensitivity for lesions of varying sizes and appearances. 
Thirdly, the convolutional attention-based feature map (CAFM) was incorporated into the neck network. This component enabled the model to effectively capture contextual relationships across multiple scales and enhanced the interaction between spatial and channel attention mechanisms. By selectively emphasizing or suppressing features based on their relevance to disease identification, the module allowed the model to more accurately distinguish between diseased and healthy regions. As a result, the model's representational capacity was improved, leading to enhanced detection accuracy in complex field environments. [Results and Discussions] Extensive experiments were conducted on a specialized maize leaf disease dataset, which included annotated samples across multiple disease categories with diverse visual characteristics. Ablation experiments and comparisons with different algorithms showed that the DCC-YOLOv10n algorithm exhibited good detection accuracy on the maize leaf disease dataset. Compared with YOLOv10n, the optimized algorithm demonstrated a reduction in computational complexity by 0.5 GFLOPs, with the model parameters compressed to merely 2.99 M. Significant improvements were observed in precision, recall, and mean average precision, which increased by 1.7, 2.6, and 1.7 percentage points respectively, reaching 96.2%, 90.3%, and 94.1%. Based on the precision-recall curve comparison, the DCC-YOLOv10n algorithm achieved more stable overall performance, with the mean average precision improving from 92.4% to 94.1% (an increase of 1.7 percentage points), fulfilling the detection requirements for small targets of maize leaf diseases. The findings underscored the robustness and adaptability of the DCC-YOLOv10n algorithm under challenging conditions. 
[Conclusions] The DCC-YOLOv10n algorithm presents a significant advancement in the field of agricultural disease diagnostics by addressing the limitations of existing methods with respect to small-target detection. The novel architectural components—DRPAKConv, CBVoVGSCSP, and CAFM integrated with attention fusion—not only significantly enhance the model's detection performance, but also advance the development of intelligent, data-efficient, and highly accurate disease monitoring systems tailored for modern agricultural applications. This research serves as a valuable reference for future developments in lightweight, efficient, and accurate maize disease detection models and offers practical significance for intelligent maize management.
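The interaction of spatial and channel attention that CAFM exploits can be illustrated with a deliberately simplified sketch. This is not the published CAFM module; the pooling-based gates below are an assumed stand-in chosen only to show how the two attention views combine multiplicatively:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_spatial_attention(x):
    """Toy fusion of channel and spatial attention on a (C, H, W) map:
    channel gates from global average pooling, spatial gates from the
    cross-channel mean, applied multiplicatively so both views interact.
    """
    chan = sigmoid(x.mean(axis=(1, 2)))[:, None, None]   # (C, 1, 1) channel gates
    spat = sigmoid(x.mean(axis=0))[None, :, :]           # (1, H, W) spatial gates
    return x * chan * spat

x = np.random.default_rng(2).standard_normal((3, 4, 4))
y = channel_spatial_attention(x)
```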

    Multi-Objective Planting Planning Method Based on Connected Components and Genetic Algorithm: A Case Study of Fujin City |
    XU Menghua, WANG Xiujuan, LENG Pei, ZHANG Mengmeng, WANG Haoyu, HUA Jing, KANG Mengzhen
    2025, 7(5):  136-145.  doi:10.12133/j.smartag.SA202504012
    Abstract ( 126 )   HTML ( 5)   PDF (1608KB) ( 5 )  
    Figures and Tables | References | Related Articles | Metrics

    [Objective] In the advancement of intensive agriculture, the contradiction between soil degradation and the demand for large-scale production has become increasingly pronounced, particularly in the core region of black soil in Northeast China. Long-term single-cropping patterns have caused soil structure damage and nutrient imbalance, severely threatening agricultural sustainability. Intensive rice cultivation has led to significant soil degradation, while Fujin city must also balance national soybean planting mandates with large-scale production efficiency. However, existing planting planning methods predominantly focus on area optimization at the regional scale, lacking fine-grained characterization of plot-level spatial distribution, which easily results in fragmented layouts. Against this backdrop, a plot-scale multi-objective planting planning approach is developed to synergistically optimize contiguous crop distribution, soil restoration, practical production, and economic benefits, while ensuring national soybean planting tasks. This approach bridges macro-policy guidance and micro-production practices, providing scientific decision support for planting structure optimization and high-standard farmland construction in major grain-producing areas of Northeast China. [Methods] The multi-objective optimization model was established within a genetic algorithm framework, integrating connected component analysis to address plot-level spatial layout challenges. The model incorporated five indicators: economic benefit, soybean planting area, contiguous planting, crop rotation benefits, and the number of paddy-dryland conversions. The economic benefit objective was quantified by calculating the total income of crop combinations across all plots. A rigid threshold for soybean planting area was set to fulfill national mandates. The contiguous planting was evaluated using a connected-component-based method. 
The crop rotation benefits were scored according to predefined rotation rules. The paddy-dryland conversions were determined by counting changes in plot attributes. The model employed linear weighted summation to transform multi-objectives into a single objective for solution, generated high-quality initial populations via Latin Hypercube Sampling, and enhanced algorithm performance through connected-component-based crossover strategies and hybrid mutation strategies. Specifically, the crossover strategy was constructed based on connected component analysis: Adjacent plots with the same crop were divided into connected regions, and partial regions were randomly selected for crop gene exchange between parent generations, ensuring that the offspring inherited spatial coherence from parents, avoiding layout fragmentation caused by traditional crossover, and improving the rationality of contiguous planting. The mutation strategies included three types: Soybean threshold guarantee, plot-based crop rotation rule adaptation, and connected components-based crop rotation rule adaptation, which synergistically ensured mutation diversity and policy objective adaptability. Taking Fujin city, Heilongjiang province—a crucial national commercial grain base—as a case study, optimization was implemented using the Distributed Evolutionary Algorithms in Python (DEAP) library and validated through the simulation results of the four-year planting plan from 2020 to 2023. [Results and Discussions] Four years of simulation results demonstrated significant multi-objective balance in the optimized scheme. The contiguity index increased sharply from 0.477 in 2019 to 0.896 in 2020 and stabilized above 0.9 in subsequent years, effectively alleviating plot fragmentation and enhancing the feasibility of large-scale production. The economic benefits remained dynamically stable without significant decline, verifying the model's effectiveness in safeguarding economic efficiency. 
The soybean planting area stably met national thresholds while achieving strategic expansion, strengthening food security. The simulation results of crop rotation benefits reached 0.998 in 2023, indicating effective promotion of scientific rotation patterns and enhanced soil health and sustainable production capacity. The optimization objective of minimizing paddy-dryland conversions took practical production factors into account, achieving a good balance with crop rotation benefits and reflecting effective consideration of real-world production constraints. The evolutionary convergence curve showed the algorithm converged near the optimal solution, validating its convergence stability for this problem. In comparative experiments, this method outperformed traditional plot-based strategies in all optimization indicators except soybean planting area. Compared with the non-dominated sorting genetic algorithm II (NSGA-II) multi-objective algorithm, it showed significant advantages in contiguous planting and crop rotation benefits. Although minor gaps existed in economic benefits and paddy-dryland conversions compared to NSGA-II, the planting layout was more regular and less fragmented. [Conclusions] The multi-objective planting planning method based on connected components and genetic algorithms proposed in this study bridges macro policies and micro layouts, effectively balancing black soil protection and production benefits through intelligent algorithms. By embedding spatial topology constraints into genetic operations, it solves the fragmentation problem in traditional methods while adapting to policy-driven planting scenarios via single-objective weighting strategies. 
Four years of simulations and comparative experiments show that this method significantly improves contiguous planting, ensures soybean production, stabilizes economic benefits, optimizes rotation patterns, and reduces paddy-dryland conversions, providing a scientific and feasible planning scheme for agricultural production. Future research can be expanded in three directions. First, further optimizing genetic algorithm parameters and introducing technologies such as deep reinforcement learning to enhance algorithm performance. Second, integrating multi-source heterogeneous data to build dynamic parameter systems and strengthen model generalization. Third, extending the method to more agricultural regions such as southern hilly areas, adjusting constraints according to local topography and crop characteristics to achieve broader application value. The research findings can provide decision support for planting structure optimization and high-standard farmland construction in major grain-producing areas of Northeast China.
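The connected-component evaluation of contiguous planting can be sketched on a grid of plots. The 4-neighborhood adjacency and the index definition below are assumptions made for illustration; the paper's exact contiguity index may differ:

```python
from collections import deque

def connected_components(grid):
    """Sizes of 4-connected same-crop regions in a 2-D grid of plots.

    grid : list of lists of crop codes (any hashable values).
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            crop, size = grid[r][c], 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:                      # BFS over same-crop neighbors
                i, j = queue.popleft()
                size += 1
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if (0 <= ni < rows and 0 <= nj < cols
                            and not seen[ni][nj] and grid[ni][nj] == crop):
                        seen[ni][nj] = True
                        queue.append((ni, nj))
            sizes.append(size)
    return sizes

def contiguity_index(grid):
    """Size-weighted mean component size, normalized by total plots:
    1.0 for a single contiguous block, approaching 0 as layouts fragment.
    (An assumed definition for illustration only.)"""
    sizes = connected_components(grid)
    total = sum(sizes)
    return sum(s * s for s in sizes) / (total * total)
```

A connected-component-based crossover would exchange whole regions found this way between parents, preserving spatial coherence in the offspring.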

    Embedded Fluorescence Imaging Detection System for Fruit and Vegetable Quality Deterioration Based on Improved YOLOv8 |
    GAO Chenhong, ZHU Qibing, HUANG Min
    2025, 7(5):  146-155.  doi:10.12133/j.smartag.SA202505038
    Abstract ( 50 )   HTML ( 7)   PDF (1615KB) ( 1 )  
    Figures and Tables | References | Related Articles | Metrics

    [Objective] Fresh fruits and vegetables are prone to quality deterioration during storage and transportation due to microbial proliferation and changes in enzyme activity. Although traditional quality detection methods (e.g., physicochemical analysis and microbial culture) offer high accuracy, they are destructive, time-consuming, and require expert operation, making them inadequate for the modern supply chain's demand for real-time, non-destructive detection. While advanced optical detection technologies like hyperspectral imaging provide non-destructive advantages, the equipment is expensive, bulky, and lacks portability. This study aimed to integrate fluorescence imaging technology, embedded systems, and lightweight deep learning models to develop an embedded detection system for fruit and vegetable quality deterioration, addressing the bottlenecks of high cost and insufficient portability in current technologies, and providing a low-cost, efficient solution for non-destructive quality detection of fruits and vegetables. [Methods] An embedded quality detection system based on fluorescence imaging and a ZYNQ platform was developed. The system adopted the Xilinx ZYNQ XC7Z020 heterogeneous SoC as the core controller and used 365 nm, 10 W ultraviolet LED beads as the excitation light source. A CMOS camera served as the image acquisition sensor to capture and process fluorescence images. Algorithmically, an improved, lightweight object detection model based on YOLOv8 was developed. The improved model replaced the original YOLOv8 backbone network with MobileNetV4 to reduce computational load. To further achieve lightweighting, a channel pruning technique based on the batch normalization (BN) layer's scaling factor (γ) was employed. During training, L1 regularization was applied to γ to induce sparsity, after which channels with small γ values were pruned according to a threshold (γ_threshold = 0.01), followed by fine-tuning of the pruned model. 
Finally, in accordance with the hardware characteristics of the ZYNQ platform, a dynamic 16-bit fixed-point quantization method was adopted to convert the model from 32-bit floating point to 16-bit fixed point, and the FPGA's parallel computing capability was utilized for hardware acceleration to improve inference speed. [Results and Discussions] Grapes and spinach were used as experimental samples in a controlled laboratory setting (26 °C; 20%~40% humidity) over an eight-day storage experiment. Fluorescence images were collected daily, and physicochemical indices were measured simultaneously to construct ground-truth labels (spinach: chlorophyll, vitamin C; grapes: titratable acidity, total soluble solids). K-means clustering combined with principal component analysis (PCA) was used to categorize quality into three levels ("fresh", "sub-fresh", and "spoiled") based on changes in physicochemical indices, and images were labeled accordingly. In terms of system performance, the improved YOLOv8-MobileNetV4 model achieved a mean average precision (mAP) of 95.91% for the three-level quality classification. Ablation results showed that using only the MobileNetV4 backbone or applying channel pruning to the original model each reduced average detection time (by 14.0% and 29.0%, respectively) but incurred some loss of accuracy. In contrast, combining both yielded a synergistic effect: precision reached 97.04%, while recall and mAP increased to 95.24% and 95.91%, respectively. Comparative experiments indicated that the proposed model (with 8.98 MB of parameters) outperformed other mainstream models (e.g., Faster R-CNN and YOLOv8-Ghost) in mAP and also exhibited faster detection, demonstrating an excellent balance between accuracy and efficiency. [Conclusions] Targeting practical needs in detecting fruit and vegetable quality deterioration, this study proposed and implemented an efficient detection system based on fluorescence imaging and an embedded platform. 
By integrating the MobileNetV4 backbone with the YOLOv8 detection framework and introducing BN-based channel pruning, the model achieved structured compression and accelerated inference. Experimental results showed that the YOLOv8-MobileNetV4 plus pruning model significantly reduced model size and hardware resource consumption while maintaining detection accuracy, thereby enhancing real-time responsiveness. The system's low hardware cost, compact size, and portability make it a practical solution for rapid, non-destructive, real-time quality monitoring in fruit and vegetable supply chains. Future work will focus on expanding the sample library to include more produce types and mixed deterioration levels and further optimizing the algorithm to improve robustness in complex multi-target scenarios.
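The two compression steps described, BN-γ channel pruning and dynamic 16-bit fixed-point quantization, can be sketched as follows. This is a simplified illustration; the headroom rule for allocating integer versus fractional bits is an assumption, not the paper's exact scheme:

```python
import numpy as np

def prune_channels(weights, gamma, threshold=0.01):
    """Keep only the output channels whose BN scaling factor |gamma|
    meets the threshold; returns the pruned tensor and the keep mask.

    weights : (C_out, ...) convolution kernel; gamma : (C_out,)
    """
    keep = np.abs(gamma) >= threshold
    return weights[keep], keep

def quantize_fixed16(x):
    """Dynamic 16-bit fixed point: pick the fractional bit width from
    the tensor's dynamic range (with one bit of headroom), then round
    to int16. Returns (quantized values, fractional bits)."""
    max_abs = float(np.max(np.abs(x))) or 1.0              # avoid log2(0)
    int_bits = max(0, int(np.ceil(np.log2(max_abs + 1e-12))) + 1)
    frac_bits = max(0, 15 - int_bits)
    q = np.clip(np.round(x * (1 << frac_bits)), -32768, 32767).astype(np.int16)
    return q, frac_bits
```

Dequantizing with `q / 2**frac_bits` recovers the values up to rounding error, which is what lets the FPGA run the pruned model in cheap integer arithmetic.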

    Detection Method of Ectropis Grisescens Larvae in Canopy Environments Based on YOLO and Diffusion Models |
    LUO Xuelun, GOUDA Mostafa, SONG Xinbei, HU Yan, ZHANG Wenkai, HE Yong, ZHANG Jin, LI Xiaoli
    2025, 7(5):  156-168.  doi:10.12133/j.smartag.SA202505023
    Abstract ( 109 )   HTML ( 6)   PDF (2828KB) ( 6 )  
    Figures and Tables | References | Related Articles | Metrics

    [Objective] Tea has become one of the most important economic crops globally, driven by the growing popularity of tea-based beverages. However, tea production is increasingly threatened by biotic stressors, among which Ectropis grisescens stands out as a major defoliating pest. The larvae of this moth species cause substantial damage to tea plants by feeding on their leaves, thereby reducing yield and affecting the overall quality of the crop. Manual methods are not only time-consuming and labor-intensive but also suffer from low efficiency, high costs, and considerable subjectivity. In this context, the development of intelligent, accurate, and automated early detection techniques for Ectropis grisescens larvae is of vital significance. Such advancements hold the potential to enhance pest management strategies, reduce economic losses, and promote sustainable tea cultivation practices. [Methods] A recognition framework was proposed to achieve real-time and fine-grained identification of E. grisescens larvae at four distinct instar stages within complex tea canopy environments. To capture the varying morphological characteristics across developmental stages, a hierarchical three-level detection system was designed, consisting of: (1) full-instar detection covering all instars from the 1st to the 4th, (2) grouped-stage detection that classified larvae into early (1st-2nd) and late (3rd-4th) instar stages, and (3) fine-grained detection targeting each individual instar stage separately. Given the challenges posed by limited, imbalanced, and noisy training data—common issues in field-based entomological image datasets— a semi-automated dataset optimization strategy was introduced to enhance data quality and improve class representation. 
Building upon this refined dataset, a controllable diffusion model was employed to generate a large number of high-resolution, labeled synthetic images that emulated real-world appearances of Ectropis grisescens larvae under diverse environmental conditions. To ensure the reliability and utility of the generated data, a novel high-quality image filtering strategy was developed that automatically evaluated and selected images containing accurate, detailed, and visually realistic larval instances. The filtered synthetic images were then strategically integrated into the real training dataset, effectively augmenting the data and enhancing the diversity and balance of training samples. This comprehensive data augmentation pipeline led to substantial improvements in the detection performance of multiple YOLO-series models (YOLOv8, YOLOv9, YOLOv10, and YOLOv11). [Results and Discussions] Experimental results clearly demonstrated that the YOLO series models exhibited strong and consistent performance across a range of detection tasks involving Ectropis grisescens larvae. In the full-instar detection task, which targeted the identification of all larval stages from 1st to 4th instars, the best-performing YOLO model achieved an impressive average mAP@50 of 0.904, indicating a high level of detection precision. In the grouped instar-stage detection task, where larvae were classified into early (1st–2nd) and late (3rd–4th) instar groups, the highest mAP@50 recorded was 0.862, reflecting the model's ability to distinguish developmental clusters with reasonable accuracy. For the more challenging fine-grained individual instar detection task—requiring the model to discriminate among each instar stage independently—the best mAP@50 reached 0.697, demonstrating the feasibility of detailed stage-level classification despite subtle morphological differences. 
The proposed semi-automated data optimization strategy contributed significantly to performance improvements, particularly for the YOLOv8 model. Specifically, YOLOv8 showed consistent gains in mAP@50 across all three detection tasks, with absolute improvements of 0.024, 0.027, and 0.022 for full-instar, grouped-stage, and fine-grained detection tasks, respectively. These enhancements underscored the effectiveness of the dataset refinement process in addressing issues related to data imbalance and noise. Furthermore, the incorporation of the controllable diffusion model led to a universal performance boost across all YOLO variants. Notably, YOLOv10 exhibited the most substantial gains among the evaluated models, with its average mAP@50 increasing from 0.811 to 0.821 across the three detection tasks. This improvement was statistically significant, as confirmed by a paired t-test (p < 0.05), suggesting that the synthetic images generated by the diffusion model effectively enriched the training data and improved model generalization. Among all evaluated models, YOLOv9 achieved the best overall performance in detecting Ectropis grisescens larvae. It attained top mAP@50 scores of 0.909, 0.869, and 0.702 in the full-instar, grouped-stage, and fine-grained detection tasks, respectively. When averaged across all tasks, YOLOv9 reached a mean mAP@50 of 0.826, accompanied by a macro F1-Score of 0.767, highlighting its superior balance between precision and recall. [Conclusions] This study demonstrated that the integration of a controllable diffusion model with deep learning enabled accurate field-level instar detection of Ectropis grisescens, providing a reliable theoretical and technical foundation for intelligent pest monitoring in tea plantations.
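The paired t-test reported above (comparing per-task mAP@50 before and after diffusion-based augmentation) can be computed with the standard library alone; the score pairs below are illustrative placeholders, not the paper's exact per-task figures:

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(before, after):
    """t statistic for a paired t-test: mean of the per-task score
    differences divided by the standard error of those differences."""
    diffs = [a - b for a, b in zip(after, before)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Hypothetical mAP@50 per task (full-instar, grouped, fine-grained),
# before and after adding diffusion-generated training images:
t = paired_t_statistic([0.900, 0.850, 0.683], [0.912, 0.858, 0.693])
```

With n − 1 = 2 degrees of freedom, the resulting t value would then be compared against the two-tailed critical value (4.303 at p = 0.05) to judge significance.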

    Light-Trapping Rice Planthopper Detection Method by Combining Spatial Depth Transform Convolution and Multi-scale Attention Mechanism |
    LI Wenzheng, YANG Xinting, SUN Chuanheng, CUI Tengpeng, WANG Hui, LI Shanshan, LI Wenyong
    2025, 7(5):  169-181.  doi:10.12133/j.smartag.SA202507024

    [Objective] Planthoppers suck sap from the phloem of rice plants, causing malnutrition and slow growth and resulting in large-scale yield reduction. Timely and effective monitoring of planthopper pests and analysis of their occurrence degree are therefore of vital importance for rice pest prevention and control. Traditional detection of planthopper pests relies mainly on manual diagnosis and identification. However, because planthoppers are tiny, on-site manual investigation is not only time-consuming and labor-intensive but also strongly influenced by human subjectivity, making misjudgment likely. Intelligent light traps can assist with this work, but when they are used to detect dense, occluded, low-resolution, and small-sized planthoppers, low accuracy, false detections, and missed detections are common. To address these issues, a light-trapping rice planthopper detection method combining spatial depth transform convolution and a multi-scale attention mechanism, based on YOLOv11x, was proposed in this research. [Methods] The image data were collected by multiple light-induced pest monitoring devices installed in experimental rice fields. The images covered two planthopper species, the brown planthopper and the white-backed planthopper, were all 5 472 pixels × 3 648 pixels in size, and totaled 998 images. The original dataset was divided into a training set and a validation set in a 4:1 ratio. To enhance learning efficiency during training, two data augmentation operations, horizontal flipping and vertical flipping, were applied to the training images, yielding 2 388 training images for model training; the 200 validation images were used for model inference validation.
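The two flips triple the training set (each original contributes itself plus a horizontally and a vertically flipped copy, consistent with the 2 388-image count above). A minimal NumPy sketch, ignoring the matching bounding-box coordinate transform that a real pipeline would also apply:

```python
import numpy as np

def augment_with_flips(images):
    """Triple a training set: each image contributes itself plus a
    horizontally and a vertically flipped copy (box coordinates would
    need the matching transform, omitted here)."""
    out = []
    for img in images:
        out.extend([img, np.fliplr(img), np.flipud(img)])
    return out
```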
To improve model performance, first, the C3k2 module in the original YOLOv11x network was improved with an efficient multi-scale attention (EMA) mechanism to enhance the model's perception and its ability to fuse features of small pests under dense and occluded conditions. Second, space-to-depth convolution (SPD-Conv) replaced the standard Conv module in the original model, further improving extraction accuracy for low-resolution, small-sized pest features while reducing the number of parameters. In addition, a P2 detection layer was added to the original network and the P5 detection layer was removed, specifically strengthening the model's detection of small targets. Finally, introducing the dynamic non-monotonic focusing loss function wise intersection over union (WIoU) v3 enhanced the model's localization ability, thereby reducing the false detection and missed detection rates. [Results and Discussions] Test results showed that the precision (P), recall (R), mean average precision at an IoU threshold of 0.50 (mAP50), and mean average precision averaged over IoU thresholds from 0.50 to 0.95 in steps of 0.05 (mAP50-95) of the improved model on the self-built rice planthopper dataset (dataset_Planthopper) reached 77.5%, 73.5%, 80.8%, and 44.9%, respectively, gains of 4.8, 3.5, 5.5, and 4.7 percentage points over the baseline YOLOv11x. The number of parameters was reduced from 56 M to 40 M, a reduction of 29%. Compared with the current mainstream object detection models YOLOv5x, YOLOv8x, YOLOv10x, YOLOv11x, YOLOv12x, Salience DETR-R50, Relation DETR-R50, and RT-DETR-x, the mAP50 of the improved model was 6.8, 7.8, 8.6, 5.5, 5.6, 8.7, 6.9, and 6.9 percentage points higher, respectively, giving it the best overall performance.
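The lossless downsampling at the core of SPD-Conv can be sketched in a few lines: a space-to-depth rearrangement folds each s × s spatial block into channels, so no pixels of a small, low-resolution pest are discarded before the subsequent non-strided convolution. A NumPy sketch for a single (H, W, C) feature map:

```python
import numpy as np

def space_to_depth(x, scale=2):
    """Rearrange (H, W, C) into (H/s, W/s, C*s*s): every s x s spatial
    block becomes channels, preserving all information while halving
    spatial resolution (for s = 2)."""
    h, w, c = x.shape
    assert h % scale == 0 and w % scale == 0
    x = x.reshape(h // scale, scale, w // scale, scale, c)
    x = x.transpose(0, 2, 1, 3, 4)  # group each block's pixels together
    return x.reshape(h // scale, w // scale, c * scale * scale)
```

For example, a 4 × 4 × 1 map becomes 2 × 2 × 4, with the output's top-left cell holding all four pixels of the input's top-left 2 × 2 block.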
[Conclusions] The improved YOLOv11x model effectively enhances detection of low-resolution, small-sized planthoppers under dense and occluded conditions and reduces the probability of missed and false detections. In practical applications, it could support precise monitoring of farmland pests and scientific prevention and control decisions, thereby reducing chemical pesticide use and promoting the intelligent development of agriculture. Although the method achieves significant improvements on multiple indicators, it still has limitations. First, planthopper species are numerous and morphologically diverse; the current model mainly targets typical species, and its generalization ability needs further verification. Second, owing to limitations of the data collection environment, there is still room to improve performance under extreme lighting changes and heavily occluded scenes. Finally, although the parameter count decreased, the real-time detection speed still needs optimization to meet the requirements of low-power edge devices. Future research can focus on improving generalization and robustness to more planthopper species and more complex field conditions, and on further lightweighting the model.

    Beef Cattle Object Detection Method Under Occlusion Environment Based on Improved YOLOv12 |
    LIU Yiheng, LIU Libo
    2025, 7(5):  182-192.  doi:10.12133/j.smartag.SA202503018

    [Objective] With the rapid development of intelligent agriculture, computer vision-based livestock detection technology has become increasingly important in modern farming management. Among livestock, beef cattle play a crucial role in the animal husbandry industry worldwide. Accurate detection and counting of beef cattle are essential for improving breeding efficiency, monitoring animal health, and supporting government subsidy distribution. However, in real-world farming environments, cattle often gather and move closely together, leading to frequent occlusions. These occlusions significantly degrade the performance of traditional object detection algorithms, resulting in missed detections, false positives, and poor robustness. Manual counting methods are labor-intensive, error-prone, and inefficient, while existing deep learning-based detection models still struggle with occlusion scenarios due to limited feature extraction capabilities and insufficient use of global contextual information. To address these challenges, an improved object detection algorithm named YOLOv12s-ASR, based on the YOLOv12s framework, was proposed in this research. The goal is to enhance detection accuracy and real-time performance in complex occlusion conditions, providing a reliable technical solution for intelligent beef cattle monitoring. [Methods] The proposed YOLOv12s-ASR algorithm introduced three key improvements to the baseline YOLOv12s model. First, part of the standard convolution layers was replaced with a modifiable kernel convolution module (AKConv). Unlike traditional convolutions with fixed kernel shapes, AKConv could dynamically adjust the shape and size of the convolution kernel according to the input image content. This flexibility allowed the model to better capture local features of occluded cattle, especially in cases where only partial body parts were visible.
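The idea behind an adjustable sampling grid can be illustrated without any learning machinery: instead of reading a fixed 3 × 3 neighborhood, the kernel reads the input at a list of offsets that, in AKConv, would be adapted to the content. A deliberately simplified nearest-neighbor NumPy sketch (the function name and shapes are ours, not the paper's):

```python
import numpy as np

def offset_sample(feat, center, offsets):
    """Read a 2-D feature map at irregular offsets around a center,
    the way an adjustable kernel samples input at non-grid positions
    (simplified: integer offsets, edge clipping, no learned weights)."""
    h, w = feat.shape
    vals = []
    for dy, dx in offsets:
        y = min(max(center[0] + dy, 0), h - 1)  # clip to map bounds
        x = min(max(center[1] + dx, 0), w - 1)
        vals.append(feat[y, x])
    return np.array(vals)
```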
Second, a self-ensembling attention mechanism (SEAM) was integrated into the Neck structure. SEAM combined spatial and channel attention through depthwise separable convolutions and consistency regularization, enabling the model to learn more robust and discriminative features. It enhanced the model's ability to perceive global contextual information, which was crucial for inferring the presence and location of occluded targets. Third, a repulsion loss function was introduced to supplement the original loss. This loss function included two components: RepGT, which pushed the predicted box away from nearby ground truth boxes, and RepBox, which encouraged separation between different predicted boxes. By reducing the overlap between adjacent predictions, the repulsion loss helped mitigate the negative effects of non-maximum suppression (NMS) in crowded scenes, thereby improving localization accuracy and reducing missed detections. The overall architecture maintained the lightweight design of YOLOv12s, ensuring that the model remained suitable for deployment on edge devices with limited computational resources. Extensive experiments were conducted on a self-constructed beef cattle dataset containing 2 458 images collected from 13 individual farms in Ningxia, China. The images were captured using surveillance cameras during daytime hours and included various occlusion scenarios. The dataset was divided into training, validation, and test sets in a 7:2:1 ratio, with annotations carefully reviewed by multiple experts to ensure accuracy. [Results and Discussions] The proposed YOLOv12s-ASR algorithm achieved a mean average precision (mAP) of 89.3% on the test set, outperforming the baseline YOLOv12s by 1.3 percentage points. The model size was only 8.5 MB, and the detection speed reached 136.7 frames per second, demonstrating a good balance between accuracy and efficiency.
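The RepBox idea can be sketched compactly: penalize pairwise IoU between predicted boxes assigned to different targets, so that minimizing the loss pushes overlapping predictions apart. A minimal plain-Python sketch (the published loss additionally applies a smooth-ln penalty and includes the IoG-based RepGT term, both omitted here):

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def repbox_loss(preds):
    """Simplified RepBox term: mean pairwise IoU between predictions
    matched to different targets; minimizing it pushes boxes apart."""
    pairs = [(i, j) for i in range(len(preds)) for j in range(i + 1, len(preds))]
    if not pairs:
        return 0.0
    return sum(iou(preds[i], preds[j]) for i, j in pairs) / len(pairs)
```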
Ablation studies confirmed the effectiveness of each component: AKConv improved mAP by 0.6 percentage point, SEAM by 1.0 percentage point, and repulsion loss by 0.6 percentage point. When all three modules were combined, the mAP increased by 1.3 percentage points, validating their complementary roles. Furthermore, the algorithm was evaluated under three occlusion levels: slight, moderate, and severe. Compared to YOLOv12s, YOLOv12s-ASR improved mAP by 4.4, 2.9, and 4.4 percentage points, respectively, showing strong robustness across varying occlusion conditions. Comparative experiments with nine mainstream detection algorithms, including Faster R-CNN, SSD, Mask R-CNN, and various YOLO versions, further demonstrated the superiority of YOLOv12s-ASR. It achieved the highest mAP while maintaining a compact model size and fast inference speed, making it particularly suitable for real-time applications in resource-constrained environments. Visualization results also showed that YOLOv12s-ASR could more accurately detect and localize cattle targets in crowded and occluded scenes, with fewer false positives and missed detections. [Conclusions] Experimental results show that YOLOv12s-ASR achieves state-of-the-art performance on a self-built beef cattle dataset, with high detection accuracy, fast processing speed, and a lightweight model size. These advantages make it well-suited for practical applications such as automated cattle counting, behavior monitoring, and intelligent farm management. Future work will focus on further enhancing the model's generalization ability in more complex environments and extending its application to multi-object tracking and behavior analysis tasks.