Smart Agriculture
30 July 2025, Volume 7 Issue 4
Topic--Intelligent Sensing and Grading of Agricultural Product Quality
Spectral Technology in Vegetable Production Detection: Research Progress, Challenges and Suggestions
BAI Juekun, CHEN Huaimeng, DONG Daming, LIU Yachao, YUE Xiaolong, DU Xiuke
2025, 7(4):  1-17.  doi:10.12133/j.smartag.SA202504027

[Significance] Vegetables are indispensable to global food security and human nutrition, yet approximately 33% of the annual 1.2 billion-ton harvest is lost or wasted, largely because of undetected biotic and abiotic stresses, poor post-harvest management, and chemical safety hazards. Conventional analytical workflows, based on wet chemistry and chromatography, are destructive, labour-intensive, and difficult to scale, creating an urgent need for rapid, non-invasive sensing tools that can operate across the full production-to-consumption continuum. Optical spectroscopy, spanning near-infrared (NIR), Raman, fluorescence, laser-induced breakdown spectroscopy (LIBS), and UV-Vis modalities, offers label-free, multiplexed, and second-scale measurements directly on living plants or minimally processed products. Existing reviews have concentrated on isolated techniques or single application niches, leaving critical knowledge gaps regarding hardware robustness under open-field conditions, algorithmic generalisability across cultivars and climates, data interoperability, and cost-driven adoption barriers for smallholders. [Progress] This paper presents a holistic, chain-wide appraisal of spectroscopic sensing in vegetable production. It shows that hardware evolution has been dominated by miniaturisation and functional integration. Hand-held NIR units (e.g., Neospectra MEMS, NirVana AG410) now weigh <300 g and achieve R² > 0.95 for soluble solids and moisture in tomato, zucchini, and pepper. Palm-top Raman systems (9 × 7 × 4 cm) equipped with 1 064 nm lasers and InGaAs detectors suppress fluorescence sufficiently to quantify lycopene (RMSE = 1.14 mg/100 g) and classify ripeness stages with 100% accuracy. Battery-powered fluorescence sensors coupled with smartphones wirelessly stream data to cloud-based convolutional neural networks (CNNs), delivering 93% to 100% correct cultivar identification for spinach, onion, and tomato seeds within 5 s per sample.
Methodological advances combine advanced chemometrics and deep learning. Transfer learning enables a model trained on greenhouse tomatoes to predict field-grown cherry tomatoes with only 10% recalibration samples, cutting data acquisition costs by 70%. SERS substrates, fabricated as flexible "place-and-play" nano-mesh films, boost Raman signals by a factor of 10⁶ to 10⁸, pushing limits of detection for carbaryl, imidacloprid, and thiamethoxam below 1 mg/kg on pak-choi and lettuce. Multi-modal fusion (LIBS-NIR) simultaneously quantifies macro-elements (Ca, K, Mg) and micro-elements (Fe, Mn) with relative errors <5%. Chain-wide demonstrations span five critical stages: (i) breeding: NIR screens seed viability via starch and moisture signatures; (ii) cultivation: portable Raman "leaf-clip" sensors detect nitrate deficiency (1 045 cm⁻¹ peak) and early pathogen attack (LsoA vs. LsoB, 80% accuracy) in lettuce and tomato before visible symptoms emerge; (iii) harvest: non-invasive lycopene monitoring in tomato and carotenoid profiling in chilli guide optimal picking time and reduce post-harvest losses by 15%; (iv) storage: chlorophyll fluorescence tracks water loss and senescence in black radish and carrot over six-month cold storage, enabling dynamic shelf-life prediction; (v) market entry: LIBS inspects incoming crates for Pb and Cd in seconds, while fluorescence-SVM pipelines simultaneously verify pesticide residues, ensuring compliance with EU and Chinese MRLs. Data governance initiatives are emerging but remain fragmented. Several consortia have released open spectral libraries (e.g., VegSpec-1.0 with 50 000 annotated spectra from 30 vegetable species), yet differences in acquisition parameters, preprocessing pipelines, and metadata schemas hinder cross-study reuse. [Conclusions and Prospects] Spectroscopic sensing has matured from laboratory proof-of-concept to robust field prototypes capable of guiding real-time decisions across the entire vegetable value chain.
Nevertheless, four priority areas must be addressed to unlock global adoption: Model generalisation—curate large-scale, multi-environment, multi-cultivar spectral repositories and embed meta-learning algorithms that continuously adapt to new genotypes and climates with minimal retraining. Hardware resilience—develop self-calibrating sensors with adaptive optics and real-time environmental compensation (temperature, humidity, ambient light) to maintain laboratory-grade SNR in dusty, humid, or high-irradiance field settings. Standardisation and interoperability—establish ISO-grade protocols for hardware interfaces, data formats, calibration transfer, and privacy-preserving data sharing, enabling seamless integration of devices, clouds, and decision-support platforms. Cost-effective commercialisation—pursue modular, open-hardware designs leveraging printed optics and economies of scale to reduce unit costs below USD 500, and introduce service-based models (leasing, pay-per-scan) tailored to smallholder economics. If these challenges are met, spectroscopy-based digital twins of vegetable production systems could become a reality, delivering safer food, reduced waste, and climate-smart agriculture within the next decade.

Application of Photoacoustic Spectroscopy in Quality Assessment of Agricultural and Forestry Products
XIE Weijun, CHEN Keying, QIAO Mengmeng, WU Bin, GUO Qing, ZHAO Maocheng
2025, 7(4):  18-30.  doi:10.12133/j.smartag.SA202505026

[Significance] The quality assessment of agricultural and forestry products is a core process in ensuring food safety and enhancing product competitiveness. Traditional detection methods suffer from drawbacks such as sample destruction, expensive equipment, and poor adaptability. As an analytical technique that combines optical and acoustic detection principles, photoacoustic spectroscopy (PAS) overcomes the limitations of conventional techniques that rely on transmitted or reflected optical signals through its light-thermal-acoustic energy conversion mechanism. Because it is non-contact, highly sensitive, and adaptable to samples in multiple forms, PAS has been increasingly applied to the quality assessment of agricultural and forestry products in recent years, providing a new solution for the simultaneous detection of internal and external quality in these products. [Progress] In agricultural and forestry product testing, PAS has demonstrated practical value in several areas. In seed testing, researchers have established quantitative models relating photoacoustic signals to seed viability and have also achieved dynamic assessment of seed health by monitoring respiratory metabolic gases (e.g., CO2 and ethylene). In fruit and vegetable quality analysis, PAS can capture changes in characteristic substances during ripening. In the quality control of grain and oil products, Fourier-transform infrared PAS has been successfully applied to the rapid detection of protein content in wheat flour and aflatoxin in corn. In food safety monitoring, PAS has made notable progress in heavy metal residue detection, pesticide residue analysis, and food authenticity identification. [Conclusions and Prospects] Despite its evident advantages, PAS still faces multiple challenges in practical implementation.
Technically, the complex matrix of agricultural and forestry products causes non-uniform generation and propagation of photoacoustic signals, complicating data analysis. Environmental noise interference (e.g., mechanical vibrations, temperature fluctuations) compromises detection stability, while spectral peak overlap in multi-component systems limits the accuracy of quantitative analysis. In terms of equipment, current PAS systems remain bulky and costly, primarily because they rely on imported core components such as high-power lasers and precision lock-in amplifiers, which severely hinders widespread adoption. Moreover, the absence of standardized photoacoustic databases and universal analytical models restricts the technology's adaptability across diverse agricultural products. Looking forward, PAS development may focus on three key directions. The first is multi-technology integration: combining PAS with Raman spectroscopy, near-infrared spectroscopy, and other sensing methods to construct multidimensional data spaces for enhanced detection specificity. The second is miniaturization: developing chip-based detectors via micro-electromechanical technology, replacing conventional solid-state lasers with vertical-cavity surface-emitting lasers (VCSELs), and adopting 3D printing for integrated photoacoustic cell fabrication to significantly reduce system size and cost. The third is intelligent algorithm innovation: incorporating advanced deep learning techniques such as attention mechanisms and transfer learning to improve the interpretation of complex photoacoustic spectra. As these technical bottlenecks are progressively overcome, PAS is poised to support a quality monitoring network spanning the entire "field-to-market" chain (from harvesting to processing/storage to distribution), thereby transforming agricultural quality control from traditional sampling-based methods to intelligent, standardized, full-process monitoring.
This will provide technical support for food safety assurance and agricultural industry advancement.

Advances in the Application of Multi-source Data Fusion Technology in Non-Destructive Detection of Apple
GUO Qi, FAN Yixuan, YAN Xinhuan, LIU Xuemei, CAO Ning, WANG Zhen, PAN Shaoxiang, TAN Mengnan, ZHENG Xiaodong, SONG Ye
2025, 7(4):  31-46.  doi:10.12133/j.smartag.SA202505036

[Significance] The apple industry is a prominent agricultural sector of considerable global importance. Ensuring high standards of quality and safety is paramount for consumer satisfaction. Non-destructive testing technologies have emerged as a powerful tool, enabling rapid and objective evaluation of fruit attributes. However, individual non-destructive testing methods frequently have inherent limitations and are insufficient for comprehensive assessment. Multi-source data fusion integrates information from multiple sensors to overcome the shortcomings of single-modality systems. The integration of disparate data streams constitutes the foundational technological framework that enables the advancement of apple quality control. This framework facilitates enhanced detection of defects and diseases, thereby contributing to the intelligent transformation of the entire apple industry value chain. [Progress] This paper presents a systematic and comprehensive examination of recent advancements in multi-source data fusion for apple non-destructive testing.
The principles, advantages, and typical application scenarios of five mainstream non-destructive testing technologies are first introduced: near-infrared (NIR) spectroscopy, which analyzes molecular vibrations and is particularly adept at quantifying internal quality attributes such as soluble solids content (SSC) and firmness; hyperspectral imaging (HSI), which combines spectroscopy and imaging to provide both spatial and spectral information, making it ideal for visualizing the distribution of chemical components and identifying defects like bruises; electronic nose (E-nose) technology, which detects characteristic patterns of volatile organic compounds (VOCs) to profile aroma and detect mold; machine vision, which analyzes external features such as color, size, shape, and texture for grading and surface defect identification; and nuclear magnetic resonance (NMR), which provides detailed insights into internal structures and water content and is useful for detecting internal defects such as core rot. The fundamental methodologies of data fusion are then critically evaluated and categorized into three levels. Data-level fusion entails the direct concatenation of raw data from homogeneous sensors or preprocessed heterogeneous sensors. This approach is straightforward, but it can result in high dimensionality and is susceptible to data co-registration issues. Feature-level fusion, the most prevalent strategy, involves extracting salient features from each data source (e.g., spectral wavelengths, textural features, gas sensor responses) and subsequently combining these feature vectors prior to model training. This intermediate approach effectively reduces redundancy and noise, and enhances model robustness.
Decision-level fusion operates at the highest level of abstraction, where independent models are trained for each data modality, and their outputs or predictions are integrated using algorithms such as weighted averaging, voting schemes, or fuzzy logic. This strategy offers maximum flexibility for integrating highly disparate data types. The paper also thoroughly elaborates on the practical implementation of these strategies, and presents case studies on the fusion of different spectral data (e.g., NIR and HSI), the integration of spectral and E-nose data for combined internal quality and aroma assessment, and the powerful combination of machine vision with spectral data for simultaneous evaluation of external appearance and internal composition. [Conclusions and Prospects] The integration of multi-source data fusion technology has driven significant advancements in the field of apple non-destructive testing. This progress has substantially improved the accuracy, reliability, and comprehensiveness of quality evaluation and control systems. By synergistically combining the strengths of different sensors, it enables a holistic assessment that is unattainable with any single technology. However, the field faces persistent challenges, including the effective management of data heterogeneity (i.e., varying scales, dimensions, and physical meanings), the high computational complexity of sophisticated fusion models, and the poor portability of current multi-sensor laboratory equipment—all of which hinder online industrial applications. Future research should prioritize several key areas. First, developing automated, user-friendly fusion platforms is imperative to simplify data processing and model deployment. Second, optimizing and developing lightweight algorithms (e.g., through model compression and knowledge distillation) is critical to enhancing real-time performance for high-throughput sorting lines. 
Third, creating compact, cost-effective, integrated hardware that combines multiple detection technologies into a single portable device will improve stability and accessibility. Additionally, new application frontiers should be explored, such as in-field monitoring of fruit maturation and predicting post-harvest shelf life. The innovative integration of advanced algorithms and hardware holds the potential to provide substantial support for the intelligent and sustainable development of the global apple industry.
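The three fusion levels described in this abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the index-based `select_features` helper, and all numeric readings are hypothetical and serve only to show where in the pipeline each level combines information.

```python
from collections import Counter


def data_level_fusion(spectrum_a, spectrum_b):
    """Data-level fusion: concatenate two preprocessed measurement vectors."""
    return list(spectrum_a) + list(spectrum_b)


def select_features(vector, indices):
    """Toy feature extraction (assumption): keep values at chosen indices."""
    return [vector[i] for i in indices]


def feature_level_fusion(feature_vectors):
    """Feature-level fusion: join per-source feature vectors into one."""
    fused = []
    for features in feature_vectors:
        fused.extend(features)
    return fused


def weighted_average(predictions, weights):
    """Decision-level fusion (regression): weighted mean of model outputs."""
    return sum(p * w for p, w in zip(predictions, weights)) / sum(weights)


def majority_vote(labels):
    """Decision-level fusion (classification): most common predicted label."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical readings for one apple (values made up for illustration).
nir = [0.42, 0.40, 0.37, 0.35]   # NIR absorbance at four wavelengths
enose = [1.8, 0.6, 2.4]          # responses of three gas sensors

raw_fused = data_level_fusion(nir, enose)
feat_fused = feature_level_fusion(
    [select_features(nir, [0, 3]), select_features(enose, [2])]
)
ssc = weighted_average([12.0, 13.0], weights=[3, 1])
grade = majority_vote(["A", "B", "A"])
```

In practice the fused vectors would feed a trained model, and the decision-level weights would be tuned on validation data; both steps are omitted here.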

Monte Carlo Simulation of Light Propagation in Orah Mandarin Tissues and Optimization of Spectral Detection in Diffuse Reflection Mode
OUYANG Aiguo, WANG Yang, LIU Yande, HOU Youfei, WANG Guantian
2025, 7(4):  47-57.  doi:10.12133/j.smartag.SA202505029

[Objective] Visible light/near-infrared (Vis/NIR) spectroscopy serves as an effective method for quality assessment of orah mandarin. However, as a multi-layered thick-skinned fruit, the optical properties (OPs) of different tissue layers in orah mandarin affect quality evaluation, resulting in weak signals and difficulties in extracting pulp information when applying Vis/NIR spectroscopy in practical applications. This research utilizes Monte Carlo methods to reveal the light propagation mechanism within the multi-layered tissues of orah mandarin, clarify the optical properties of each tissue layer and their contributions to detection signals, and provide theoretical basis and technical support for optimizing spectral detection systems under diffuse reflectance mode. [Methods] Orah mandarin was selected as the research material. The optical parameters of its oil sac layer, albedo layer, and pulp tissue were measured in the 500~1 050 nm band using a single integrating sphere system combined with the Inverse Adding-Doubling method (Integrating Sphere-Inverse Adding-Doubling method, IS-IAD). Based on the optical parameters of different tissue layers, a three-layer concentric sphere model (oil sac layer, albedo layer, and pulp tissue) was established. The voxel-based Monte Carlo eXtreme (MCX) method was employed to study the transmission patterns of simulated photons in orah mandarin under diffuse reflectance mode, in order to optimize the configuration of detection devices. [Results and Discussions] The experimental results demonstrated that throughout the entire wavelength range, the oil sac layer and albedo layer exhibited identical variation trends in average absorption coefficient and average reduced scattering coefficient. 
The oil sac layer, rich in liposoluble pigments such as carotenoids, resulted in a peak absorption coefficient at 500 nm, while the porous structure of the albedo layer led to a higher reduced scattering coefficient, and the pulp tissue exhibited the lowest reduced scattering coefficient due to its translucent structure. Light penetration depth analysis revealed that in the 500~620 nm band, the light penetration depth of the oil sac layer was higher than that of the albedo layer, while at 980 nm, due to water molecule absorption, the light penetration depth of the pulp tissue showed a significant valley. Monte Carlo simulation results indicated that light was primarily absorbed within orah mandarin tissue, with transmitted photons accounting for less than 4.2%. As the source-detector distance increased, the average optical path and light attenuation in orah mandarin tissue showed an upward trend, while the contribution rates of the oil sac layer, albedo layer, and pulp tissue to the detected signal showed decreasing, decreasing, and increasing trends, respectively. Additionally, the optical diffuse reflectance decreased significantly with increasing source-detector distance. Based on the simulation results, it was recommended that the source-detector distance for orah mandarin quality detection devices should be set in the range of 13~15 mm. This configuration could maintain a high signal contribution rate from pulp tissue while obtaining sufficient diffuse reflectance signal strength, thereby improving detection accuracy and reliability. [Conclusions] The combination of Vis/NIR spectroscopy and Monte Carlo simulation methods systematically reveals the light propagation patterns and energy distribution within orah mandarin tissue, providing important theoretical basis and methodological support for non-destructive detection of orah mandarin. 
By employing a single integrating sphere system with the Inverse Adding-Doubling method to obtain optical parameters of each tissue layer and utilizing voxel-based Monte Carlo simulation to thoroughly investigate photon propagation patterns within the fruit, this research accurately quantifies the contribution rates of different tissue layers to diffuse reflectance signals and effectively optimizes key parameters of the detection system. These findings provide important references for developing more precise non-destructive detection methods and equipment for orah mandarin.
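As a rough illustration of how a light penetration depth can be derived from the measured optical properties, the sketch below uses the standard diffusion-approximation expression δ = 1 / sqrt(3·μa·(μa + μs′)). The abstract does not state which definition the authors used, so this formula is an assumption, and the coefficient values are hypothetical rather than taken from the paper.

```python
import math


def penetration_depth(mu_a, mu_s_prime):
    """Effective penetration depth under the diffusion approximation.

    mu_a       -- absorption coefficient (e.g. in mm^-1)
    mu_s_prime -- reduced scattering coefficient (same unit)
    Returns the depth in the reciprocal unit (e.g. mm).
    """
    if mu_a <= 0 or mu_s_prime < 0:
        raise ValueError("mu_a must be > 0 and mu_s_prime must be >= 0")
    return 1.0 / math.sqrt(3.0 * mu_a * (mu_a + mu_s_prime))


# Hypothetical coefficients (mm^-1), chosen only to show the trend.
weak_absorber = penetration_depth(mu_a=0.1, mu_s_prime=1.0)
strong_absorber = penetration_depth(mu_a=0.4, mu_s_prime=1.0)
```

With scattering held fixed, a larger absorption coefficient gives a smaller depth, which matches the valley the abstract reports for pulp tissue at the 980 nm water absorption band.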

Rapid Tea Identification and Polyphenol Detection Method in Fresh Tea Leaves Using Visible/Shortwave and Longwave Near-Infrared Spectroscopy
XU Jinchai, LI Xiaoli, WENG Haiyong, HE Yong, ZHU Xuesong, LIU Hongfei, HUANG Zhenxiong, YE Dapeng
2025, 7(4):  58-70.  doi:10.12133/j.smartag.SA202505034

[Objective] Tea polyphenols, as a key indicator for evaluating tea quality, possess significant health benefits. Traditional detection methods are limited by poor timeliness, high cost, and destructive sampling, making them difficult to meet the demands of tea cultivar breeding and real-time monitoring of tea quality. Meanwhile, rapid identification of tea cultivars and leaf positions is critical for guiding tea production. Therefore, this study aims to develop a non-destructive detection device for quality components of fresh tea leaves based on the combined technology of visible/short-wave near-infrared and long-wave near-infrared spectroscopy, to realize rapid non-destructive detection of tea polyphenol content and rapid identification of tea cultivars and leaf positions. [Methods] A rapid non-destructive detection device for quality components of fresh tea leaves was developed by combining visible/short-wave near-infrared spectroscopy (400~1 050 nm) and long-wave near-infrared spectroscopy (1 051~1 650 nm). The Savitzky-Golay (SG) convolution smoothing method was used for preprocessing the spectral data. The Folin-Ciocalteu method was employed to determine the tea polyphenol content, and abnormal samples were eliminated using the interquartile range (IQR) method. Data-level and feature-level fusion methods were adopted, with the competitive adaptive reweighted sampling (CARS) algorithm used to extract characteristic wavelengths. Prior to modeling, the Kennard-Stone algorithm was applied to partition the dataset into a training set and a prediction set at a ratio of 4∶1. Models such as principal component analysis (PCA), partial least squares-discriminant analysis (PLS-DA), least squares support vector machine (LS-SVM), extreme learning machine (ELM), and 1D convolutional neural network (1D-CNN) were constructed for the identification of 3 cultivars (Huangdan, Tieguanyin, and Benshan) and 4 leaf positions. 
Models including partial least squares regression (PLSR), least squares support vector regression (LS-SVR), ELM, and 1D-CNN were established for predicting the tea polyphenol content in fresh tea leaves. [Results and Discussions] The results showed that there were significant differences in tea polyphenol contents among different cultivars and leaf positions (P<0.05). Specifically, the tea polyphenol content of Huangdan was 17.54%±1.82%, which was 1.16 times and 1.04 times that of Tieguanyin (15.04%±1.22%) and Benshan (16.81%±1.24%), respectively. For each cultivar, the tea polyphenol content generally showed a decreasing trend from the 1st to 4th leaf positions, with the highest content in the 1st leaf position. Principal component analysis (PCA) revealed that for cultivar identification, the score scatter of Huangdan, Tieguanyin, and Benshan on PC1 and PC2 showed a clear trend of clustering into three groups, indicating a good classification effect, although there was still some overlap among individual samples. For leaf position identification, the score scatters of the 1st, 2nd, 3rd, and 4th leaf positions overlapped with each other, with no obvious clustering among leaf positions. Compared with single-source data, models based on data fusion effectively improved prediction performance. Among them, the PLS-DA model established by combining SG preprocessing with feature-level fusion achieved prediction accuracies of 100% and 87.93% for the identification of 3 tea cultivars and 4 leaf positions, respectively.
Furthermore, the 1D-CNN model based on data-level fusion exhibited superior performance in predicting tea polyphenol content, with a coefficient of determination (R2P), root mean square error of prediction (RMSEP), and residual predictive deviation (RPD) of 0.802 0, 0.636 8%, and 2.268 4, respectively, which outperformed models using only visible/short-wave near-infrared spectroscopy or long-wave near-infrared spectroscopy. [Conclusions] The developed detection device combining visible/short-wave near-infrared and long-wave near-infrared spectroscopy, mainly composed of spectrometers, Y-type optical fibers, plant probes, polymer lithium batteries, DC uninterruptible power supplies, voltage conversion modules, and aluminum alloy casings, could synchronously collect multi-source spectral data of visible/short-wave near-infrared and long-wave near-infrared from fresh tea leaves. Combined with data fusion methods and machine learning algorithms, it enabled rapid detection of tea polyphenol content and efficient identification of cultivars and leaf positions in fresh tea leaves, providing new insights for the application of multi-source data fusion technology in elite tea cultivar breeding and non-destructive detection of fresh tea leaf quality.
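The three regression metrics reported in this abstract (coefficient of determination, RMSEP, and RPD) can be computed as in the sketch below. It uses common chemometric definitions and is not the authors' code; conventions differ between papers (for example, dividing by n or n − 1, or applying bias correction), so the exact formulas are an assumption, and the sample values are hypothetical.

```python
import math
from statistics import mean, stdev


def prediction_metrics(y_true, y_pred):
    """Return (R2, RMSEP, RPD) for a prediction set.

    R2    = 1 - SS_res / SS_tot
    RMSEP = sqrt(SS_res / n)
    RPD   = sample standard deviation of references / RMSEP
    """
    n = len(y_true)
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    rmsep = math.sqrt(ss_res / n)
    rpd = stdev(y_true) / rmsep
    return r2, rmsep, rpd


# Hypothetical reference and predicted values (not data from the study).
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
r2, rmsep, rpd = prediction_metrics(reference, predicted)
```

Under these definitions RPD is roughly 1 / sqrt(1 − R²) for large prediction sets, which is consistent with the reported pair of about 0.80 and 2.27.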

Non-Destructive Inspection and Intelligent Grading Method of Fu Brick Tea at Fungal Fermentation Stage Based on Hyperspectral Imaging Technology
HU Yan, WANG Yujie, ZHANG Xuechen, ZHANG Yiqiang, YU Huahao, SONG Xinbei, YE Sitan, ZHOU Jihong, CHEN Zhenlin, ZONG Weiwei, HE Yong, LI Xiaoli
2025, 7(4):  71-83.  doi:10.12133/j.smartag.SA202505012

[Objective] Fu brick tea is a popular fermented dark tea, and its "Jin hua" fermentation process determines the quality, flavor and function of the tea. Therefore, establishing a rapid and non-destructive detection method for the fungal fermentation stage is of great significance for improving quality control and processing efficiency. [Methods] Visible-near-infrared (VIS-NIR) and near-infrared (NIR) hyperspectral images of Fu brick tea were acquired during the fermentation stage, and the variation trends were analyzed in combination with key quality indexes such as moisture, free amino acids, tea polyphenols, and tea pigments (including theaflavins, thearubigins, and theabrownins). Support vector machine (SVM) and convolutional neural network (CNN) models were used for the quantitative detection of key quality indicators and the qualitative identification of the fungal fermentation stage. To enhance model performance, the squeeze-and-excitation (SE) attention mechanism was incorporated, which strengthens the adaptive weight adjustment of feature channels, resulting in the development of the Spectra-SE-CNN model. Additionally, t-distributed stochastic neighbor embedding (t-SNE) was used for feature dimensionality reduction, aiding in the visualization of feature distributions during the fermentation process. To improve the interpretability of the model, the Grad-CAM technique was employed for CNN and Spectra-SE-CNN visualization, helping to identify the key regions the model focuses on. [Results and Discussions] In the quantitative detection of Fu brick tea quality, the best models were all Spectra-SE-CNN, with R²p of 0.859 5, 0.852 5 and 0.838 3 for moisture, tea pigments and tea polyphenols, respectively, indicating a high correlation and modeling stability. These values suggest that the models were capable of accurately predicting these key quality indicators based on hyperspectral data.
However, the R2p for free amino acids was lower (0.670 2), which could be attributed to their relatively minor changes during the fermentation process or a weak spectral response, making it more challenging to detect this component reliably with the current hyperspectral imaging approach. The Spectra-SE-CNN model significantly outperformed traditional CNN models, demonstrating the effectiveness of incorporating the SE attention mechanism. The SE attention mechanism enhanced the model's ability to extract and discriminate important spectral features, thereby improving both classification accuracy and generalization. This indicated that the Spectra-SE-CNN model excels not only in feature extraction but also in enhancing the model's robustness to variations in the fermentation stage. Furthermore, t-SNE revealed a clear separation of the different fungal fermentation stages in the low-dimensional space, with distinct boundaries. This visualization highlighted the model's ability to distinguish between subtle spectral differences during the fermentation process. The heatmap generated by Grad-CAM emphasized key regions, such as the fermentation location and edges, providing valuable insights into the specific features the model deemed important for accurate predictions. This improved the model's transparency and helped validate the spectral features that were most influential in identifying the fermentation stages. [Conclusions] A Spectra-SE-CNN model was proposed in this research, which incorporates the SE attention mechanism into a convolutional neural network to enhance spectral feature learning. This architecture adaptively recalibrates channel-wise feature responses, allowing the model to focus on informative spectral bands and suppress irrelevant signals. 
As a result, the Spectra-SE-CNN achieved improved classification accuracy and training efficiency compared with CNN models, demonstrating the strong potential of deep learning in hyperspectral feature extraction. The findings show that hyperspectral imaging (HSI) enables rapid, non-destructive, and high-resolution assessment of Fu brick tea during its critical fungal fermentation stage, and they confirm the feasibility of integrating HSI with intelligent algorithms for real-time monitoring of the Fu brick tea fermentation process. Furthermore, this approach offers a pathway for broader applications of hyperspectral imaging and deep learning in intelligent agricultural product monitoring, quality control, and automation of traditional fermentation processes.
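The channel recalibration performed by a squeeze-and-excitation block can be sketched in plain Python as follows. This shows the generic SE operation on 1D feature maps, not the Spectra-SE-CNN architecture itself; the weight matrices `w1` and `w2`, the tiny two-channel input, and the omission of bias terms are all simplifying assumptions.

```python
import math


def se_recalibrate(feature_maps, w1, w2):
    """Apply a squeeze-and-excitation step to C one-dimensional channels.

    feature_maps -- list of C channels, each a list of L activations
    w1           -- (C/r x C) weights of the reduction layer (hypothetical)
    w2           -- (C x C/r) weights of the expansion layer (hypothetical)
    Returns (rescaled feature maps, per-channel gates in (0, 1)).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(ch) / len(ch) for ch in feature_maps]
    # Excitation, part 1: fully connected reduction followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed)))
              for row in w1]
    # Excitation, part 2: fully connected expansion followed by sigmoid.
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every activation in a channel by that channel's gate.
    scaled = [[g * v for v in ch] for g, ch in zip(gates, feature_maps)]
    return scaled, gates


# Two toy channels; weights chosen by hand for illustration only.
maps = [[1.0, 3.0], [0.0, 0.0]]
scaled, gates = se_recalibrate(maps, w1=[[1.0, 1.0]], w2=[[1.0], [-1.0]])
```

In a trained network the two weight matrices are learned, so channels carrying informative spectral responses receive gates near 1 while uninformative channels are suppressed.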

Grading Asparagus officinalis L. Using Improved YOLOv11
YANG Qilang, YU Lu, LIANG Jiaping
2025, 7(4):  84-94.  doi:10.12133/j.smartag.SA202501024

[Objective] Asparagus officinalis L. is a perennial plant with a long harvesting cycle and a fast growth rate. The harvesting period of its tender stems is relatively concentrated, and their shelf life is very short. Harvested asparagus therefore needs to be graded by specification within a short time before being packaged and sold. At present, however, grading relies almost entirely on manual work, and asparagus of different specifications is difficult to distinguish by sensory inspection, so grading consumes a lot of money and labor. To save labor costs, an algorithm for grading asparagus by stem diameter was developed using deep learning and computer vision technology. YOLOv11 was selected as the baseline model, and several improvements were made to propose a lightweight model for accurate grading of post-harvest asparagus. [Methods] The dataset was obtained by photographing post-harvest asparagus with a cell phone from fixed camera positions. To improve the generalization ability of the model, the training set was augmented by increasing contrast, mirroring, and adjusting brightness. The augmented training set contained 2 160 images for training the model, and the test set and validation set contained 90 and 540 images, respectively, for inference and validation. To enhance performance, the following four improvements were made to the baseline model. First, the efficient channel attention (ECA) module was added to the twelfth layer of the YOLOv11 backbone network. ECA enhanced the extraction of asparagus stem diameter features by dynamically adjusting channel weights in the convolutional neural network, which improved recognition accuracy. Second, the bi-directional feature pyramid network (BiFPN) module was integrated into the neck network.
This module modified the original feature fusion method to automatically emphasize key asparagus features and improved the grading accuracy through multi-scale feature fusion. In addition, BiFPN dynamically adjusted the importance of each layer to reduce redundant computations. Third, the slim-neck module was applied to optimize the neck network. The slim-neck module consisted of GSConv and VoVGSCSP: the GSConv module replaced the traditional convolution, and the VoVGSCSP module replaced the C3k2 module. This optimization reduced computational cost and model size while improving recognition accuracy. Finally, the original YOLOv11 detection head was replaced with an EfficientDet head, which is lightweight and accurate. This head was trained jointly with BiFPN to strengthen multi-scale fusion and improve model performance. [Results and Discussions] To verify the validity of the individual modules introduced in the improved YOLOv11 model and the superiority of its performance, ablation experiments and comparison experiments were conducted. The comparison of different attention mechanisms added to the baseline model showed that the ECA module performed better than the other attention mechanisms in the post-harvest asparagus grading task. YOLOv11-ECA had higher recognition accuracy and a smaller model size, which supported the selection of the ECA module. Ablation experiments demonstrated that the improved YOLOv11 achieved 96.8% precision (P), 96.9% recall (R), and 92.5% mean average precision (mAP), with 4.6 GFLOPs, 1.67 × 10⁶ parameters, and a 3.6 MB model size. The asparagus grading test indicated that the localization boxes of the improved model were more accurate and had higher confidence. 
Compared with the original YOLOv11 model, the improved YOLOv11 model increased precision, recall, and mAP by 2.6, 1.4, and 2.2 percentage points, respectively, while floating-point operations, parameter count, and model size were reduced by 1.7 G, 9.1 × 10⁵, and 1.6 MB, respectively. The individual improvements thus increased accuracy while keeping the model lightweight. In addition, the comparative tests showed that the improved YOLOv11 model outperformed SSD, YOLOv5s, YOLOv8n, YOLOv11, and YOLOv12. Overall, the improved YOLOv11 had the best overall performance but still had some shortcomings. In terms of real-time performance, its inference speed was not optimal and was inferior to that of YOLOv5s and YOLOv8n. The inference speeds of the improved YOLOv11 and the original YOLOv11 were evaluated using the aggregate test. The results of the Wilcoxon signed-rank test showed that the improved YOLOv11 had a significant improvement in inference speed compared to the original YOLOv11 model. [Conclusions] The improved YOLOv11 model demonstrated better recognition, fewer parameters and floating-point operations, and a smaller model size in the asparagus grading task. It could provide a theoretical foundation for intelligent post-harvest asparagus grading, and deploying it on asparagus grading equipment could enable fast and accurate grading of post-harvest asparagus.
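
The channel reweighting performed by the ECA module described above can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the function names `eca_kernel_size` and `eca` are hypothetical, the 1D convolution weights are supplied by hand instead of being learned, and the adaptive kernel-size rule (gamma = 2, b = 1) follows the defaults of the ECA-Net paper, which the abstract does not specify.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1D kernel size from the ECA-Net paper: nearest odd value of (log2(C) + b) / gamma."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca(feature_map, weights):
    """Apply efficient channel attention to a feature map.

    feature_map: list of C channels, each a list of H rows with W floats.
    weights: odd-length 1D convolution kernel shared across channels
             (hand-set here; learned during training in a real network).
    """
    n_channels = len(feature_map)
    k = len(weights)
    pad = k // 2
    # 1) Global average pooling: one descriptor per channel.
    pooled = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]
    # 2) 1D convolution across neighbouring channels (zero padding), then sigmoid.
    padded = [0.0] * pad + pooled + [0.0] * pad
    gates = []
    for c in range(n_channels):
        s = sum(weights[j] * padded[c + j] for j in range(k))
        gates.append(1.0 / (1.0 + math.exp(-s)))
    # 3) Rescale every channel by its gate.
    return [
        [[v * gates[c] for v in row] for row in ch]
        for c, ch in enumerate(feature_map)
    ]


if __name__ == "__main__":
    fmap = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]]
    print(eca_kernel_size(256))          # kernel size for a 256-channel layer
    print(eca(fmap, [0.2, 0.6, 0.2]))    # channel-reweighted feature map
```

Because the gate for each channel depends only on a few neighbouring channel descriptors, the module adds very few parameters, which is in line with the lightweight goal stated in the abstract.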

Rapid and Non-Destructive Analysis Method of Hawthorn Moisture Content Based on Hyperspectral Imaging Technology |
BAI Ruibin, WANG Hui, WANG Hongpeng, HONG Jiashun, ZHOU Junhui, YANG Jian
2025, 7(4):  95-107.  doi:10.12133/j.smartag.SA202505033

[Objective] This study aimed to develop a rapid and non-destructive method for determining the moisture content of hawthorn fruits using hyperspectral imaging (HSI) integrated with machine learning algorithms. By evaluating the effects of different fruit orientations and spectral ranges, the research provides theoretical insights and technical support for real-time moisture monitoring and intelligent fruit sorting. [Methods] A total of 458 fresh hawthorn samples, representing various regions and cultivars, were collected to ensure diversity and robustness. Hyperspectral images were acquired in two spectral ranges: visible-near-infrared (VNIR, 400~1 000 nm) and short-wave infrared (SWIR, 940~2 500 nm). A threshold segmentation algorithm was used to extract the region of interest (ROI) from each image, and the average reflectance spectrum of the ROI served as the raw input data. To enhance spectral quality and reduce noise, five preprocessing techniques were applied: Savitzky-Golay (SG) smoothing, multiplicative scatter correction (MSC), standard normal variate (SNV), first derivative (FD), and second derivative (SD). Four regression algorithms were then employed to build predictive models: partial least squares regression (PLSR), support vector regression (SVR), random forest (RF), and multilayer perceptron (MLP). The models were evaluated under varying fruit orientations (stem-side facing downward, upward, sideways, and a combined set of all three) and spectral ranges (VNIR, SWIR, and VNIR+SWIR). To further reduce the dimensionality of the hyperspectral data and minimize redundancy, four feature selection methods were applied: successive projections algorithm (SPA), competitive adaptive reweighted sampling (CARS), variable iterative space shrinkage approach (VISSA), and discrete wavelet transform combined with stepwise regression (DWT-SR). The DWT-SR method utilized the Daubechies 6 (db6) wavelet basis function at a decomposition level of 1. 
[Results and Discussions] Both fruit orientation and spectral range had a significant impact on model performance. The optimal prediction results were achieved when the stem-side of the fruit was facing downward, using the SWIR range (940~2 500 nm) and FD preprocessing. Under these conditions, the SVR model exhibited the highest predictive accuracy, with a coefficient of determination (R2ₚ) of 0.860 5, mean absolute error (MAEₚ) of 0.711 1, root mean square error (RMSEₚ) of 0.914 2, and residual prediction deviation (RPD) of 2.677 6. Further feature reduction using the DWT-SR method resulted in the selection of 17 key wavelengths. Despite the reduced input size, the SVR model based on these features maintained strong predictive capability, achieving R2ₚ = 0.857 1, MAEₚ = 0.669 2, RMSEₚ = 0.925 2, and RPD = 2.645 7. These findings confirm that the DWT-SR method effectively balances dimensionality reduction with model performance. The results demonstrate that the SWIR range contains more moisture-relevant spectral information than the VNIR range, and that first derivative preprocessing significantly improves the correlation between spectral features and moisture content. The SVR model proved particularly well-suited for handling nonlinear relationships in small datasets. Additionally, the DWT-SR method efficiently reduced data dimensionality while preserving key information, making it highly applicable for real-time industrial use. [Conclusions] In conclusion, hyperspectral imaging combined with appropriate preprocessing, feature selection, and machine learning techniques offers a promising and accurate approach for non-destructive moisture determination in hawthorn fruits. This method provides a valuable reference for quality control, moisture monitoring, and automated fruit sorting in the agricultural and food processing industries.
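
Two of the preprocessing steps named above (SNV and first derivative) and the reported evaluation metrics can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the first derivative is a plain finite difference instead of a Savitzky-Golay derivative, and RPD is computed with the population standard deviation of the reference values, which is one common convention and may differ from the one the authors used.

```python
import math


def snv(spectrum):
    """Standard normal variate: centre and scale a single spectrum."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in spectrum) / (n - 1))
    return [(x - mean) / sd for x in spectrum]


def first_derivative(spectrum):
    """First derivative approximated by differences between adjacent bands."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


def regression_metrics(y_true, y_pred):
    """Return R2, MAE, RMSE and RPD for a prediction set."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    sd = math.sqrt(ss_tot / n)  # population SD (assumed convention)
    return {"r2": 1.0 - ss_res / ss_tot, "mae": mae, "rmse": rmse, "rpd": sd / rmse}


if __name__ == "__main__":
    raw = [0.42, 0.45, 0.51, 0.49, 0.40]          # made-up reflectance values
    print(first_derivative(snv(raw)))
    print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
```

Under this convention RPD equals 1/sqrt(1 - R2), and the reported pairs (0.860 5 with 2.677 6, and 0.857 1 with 2.645 7) agree with that identity to about three significant figures.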

Non-destructive Detection of Apple Water Core Disease Based on Hyperspectral and X-ray CT Imaging |
YU Xinyuan, WANG Zhenjie, YOU Sicong, TU Kang, LAN Weijie, PENG Jing, ZHU Lixia, CHEN Tao, PAN Leiqing
2025, 7(4):  108-118.  doi:10.12133/j.smartag.SA202507022

[Objective] Apple "sugar-glazed core" (also known as watercore) is a common physiological disorder in apple fruits. Apples with watercore possess a distinctive flavor and are highly favored by consumers. However, severely affected apples are prone to mold growth during storage, posing potential food safety risks. Currently, the primary method for detecting sugar-glazed core in apple relies on manual destructive inspection, which is inefficient for large-scale applications and fails to meet the demands of modern automated and intelligent industrial production. To achieve rapid and non-destructive detection of apples with varying watercore severity levels, effective grading and soluble solids content (SSC) prediction models were developed in this study. [Methods] The Xinjiang Aksu Red Fuji apples were used as the research subject. A total of 230 apple samples were selected, comprising 113 normal, 61 mild, 47 moderate, and 9 severe watercore apples. The watercore severity was quantified through image processing of the apples' cross-sectional images. X-ray computed tomography (X-ray CT) data were acquired, and SSC values were measured. A hyperspectral imaging system was used to collect reflectance spectra within the 400~1 000 nm range. After performing black-and-white correction and selecting regions of interest (ROI), the Sample Set Partitioning based on Joint X-Y Distances (SPXY) algorithm was applied to divide the dataset into modeling (training) and prediction sets at a 3:1 ratio. Using the iToolbox in MATLAB, discriminant models were constructed based on partial least squares discriminant analysis (PLS-DA), support vector machine (SVM), and convolutional neural network (CNN) algorithms with reflectance spectral data as the input. Regression models for predicting SSC across different watercore severity levels were also established. 
Feature wavelength selection was carried out using competitive adaptive reweighted sampling (CARS), successive projections algorithm (SPA), and uninformative variable elimination (UVE) methods. [Results and Discussions] The results indicated that as watercore severity increased, the SSC of Red Fuji apples exhibited an upward trend. The average SSC values were 13.4% for normal apples, 14.9% for mild watercore apples, 15.0% for moderate watercore apples, and 16.0% for severe watercore apples. X-ray CT imaging revealed that the average tissue density of watercore-affected regions was higher than that of healthy tissues. Three-dimensional reconstruction algorithms allowed visualization of the internal spatial distribution of watercore tissues at different severity levels. The spatial volume proportions of watercore tissues were 3.92% in mild, 6.11% in moderate, and 10.23% in severe watercore apples. Apples with severe watercore demonstrated higher spectral reflectance. The PLS-DA-based grading model achieved accuracies of 98.7% in the training set and 95.9% in the test set. The model based on feature wavelengths selected by the UVE algorithm used fewer variables and reached accuracies of 95.67% in the training set and 86.06% in the test set. For SSC regression modeling, the partial least squares regression (PLSR) model performed best, with a coefficient of determination for calibration (RC2) of 0.962, root mean square error of calibration (RMSEC) of 0.264, coefficient of determination for prediction (RP2) of 0.879, and root mean square error of prediction (RMSEP) of 0.435. The model based on feature wavelengths selected by the SPA algorithm used fewer variables at some cost in accuracy, yielding an RC2 of 0.846, RMSEC of 0.532, RP2 of 0.792, RMSEP of 0.576, coefficient of determination for cross-validation (RCV2) of 0.781, and root mean square error of cross-validation (RMSECV) of 0.637. 
[Conclusions] This study leveraged hyperspectral imaging and X-ray CT technologies to analyze differences in optical reflectance and microstructural characteristics of apple tissues across different watercore severity levels. The developed grading model effectively predicted watercore severity in apples, providing critical technical support for the development of intelligent post-harvest sorting equipment. The SSC regression model accurately predicted SSC values in apples with varying watercore severity, offering an efficient method for non-destructive detection and quality assessment of watercore-affected apples.
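
The SPXY partitioning step mentioned in the methods can be sketched as below. This is a minimal pure-Python version of the published SPXY idea (Kennard-Stone selection on the sum of normalised X and y distances), written under assumptions: the function name `spxy_split` and the toy data are hypothetical, and the authors' MATLAB implementation may differ in detail such as tie handling.

```python
import math


def spxy_split(X, y, n_train):
    """Select n_train calibration samples (n_train >= 2) with the SPXY criterion.

    X: list of feature vectors (e.g. spectra); y: list of reference values.
    Returns (train_indices, test_indices).
    """
    n = len(X)
    dx = [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]
    dy = [[abs(y[i] - y[j]) for j in range(n)] for i in range(n)]
    max_dx = max(max(row) for row in dx) or 1.0
    max_dy = max(max(row) for row in dy) or 1.0
    # Joint distance: X and y distances, each normalised by its maximum.
    d = [[dx[i][j] / max_dx + dy[i][j] / max_dy for j in range(n)] for i in range(n)]

    # Start with the two samples that are farthest apart.
    best_pair, best_dist = (0, 1), -1.0
    for i in range(n):
        for j in range(i + 1, n):
            if d[i][j] > best_dist:
                best_pair, best_dist = (i, j), d[i][j]
    selected = list(best_pair)
    remaining = set(range(n)) - set(selected)

    # Repeatedly add the sample whose nearest selected neighbour is farthest away.
    while len(selected) < n_train:
        nxt = max(sorted(remaining), key=lambda r: min(d[r][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected, sorted(remaining)


if __name__ == "__main__":
    X = [[0.0], [1.0], [2.0], [3.0], [10.0], [11.0], [12.0], [13.0]]
    y = [12.1, 12.8, 13.0, 13.5, 15.2, 15.6, 16.0, 16.4]  # made-up SSC values
    train, test = spxy_split(X, y, n_train=6)              # 3:1 ratio
    print(train, test)
```

Because the selection spreads the calibration set across both the spectral space and the SSC range, the extreme samples end up in the training set and the prediction set is interpolated.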

Acoustic-Vibration Detection Method for The Apple Moldy Core Disease Based on D-S Evidence Theory |
LIU Jie, ZHAO Kang, ZHAO Qinjun, SONG Ye
2025, 7(4):  119-131.  doi:10.12133/j.smartag.SA202505032

[Objective] Moldy core disease is a common internal disease of the apple and is highly infectious. In the early storage stage, the mold symptoms are confined to the interior of the core. The apple tissue is in a sub-healthy state and still has commercial value, so early detection of moldy-core apples is critical. [Methods] In this study, a non-destructive acoustic vibration detection system was used to acquire acoustic vibration response signals. Symmetrized dot pattern (SDP), Gramian angular field (GAF), and Stockwell transform (ST) were applied to obtain multi-domain acoustic vibration spectra, including SDP images, Gramian angular summation field (GASF) images, Gramian angular difference field (GADF) images, and ST images. These images were uniformly converted to grayscale, transforming the time-domain signals into multi-domain visual spectra to facilitate subsequent feature analysis and recognition. Uniform local binary pattern (ULBP) and gray-level-gradient co-occurrence matrix (GLGCM) methods were used to extract handcrafted features from the multi-domain visualized images. Subsequently, the maximum relevance and minimum redundancy (mRMR) criterion was applied to select the dominant features from each analysis domain that were sensitive to early disease information. Principal component analysis (PCA) was employed to reduce the dimensionality of the multi-domain spectral ULBP texture features. From the statistical features extracted from one-dimensional acoustic-vibration signals in the time and frequency domains, and the GLGCM texture features extracted from two-dimensional images in each domain, 5 to 8 features sensitive to early moldy core detection were selected. From the high-dimensional sensitive ULBP texture features extracted from each domain, 3 to 7 principal components were obtained through PCA. 
This selection aimed to maximize the relevance between features and class labels while minimizing redundancy among features, thereby identifying the most informative features for early detection of moldy-core apples in each domain. Meanwhile, a ResNet50 feature extractor improved with a convolutional attention mechanism module and the Adam optimizer (Adam-IResNet50) was designed to automatically extract deep features from visualized images in each domain. The optimal shallow features were used to train a multiple support vector machine (MSVM) classifier, while the optimal deep features were used to train an extreme learning machine (ELM) classifier. The Adam-IResNet50 network was employed as a feature extractor. The deep features extracted from the time-domain and frequency-domain GADF images, as well as time-frequency images, resulted in higher sample matching scores (SC) and cluster compactness (CHS) values, along with lower inter-class overlap (DBI) values for the three apple categories. These results clearly indicate that the deep features extracted by the Adam-IResNet50 model from multi-domain images exhibit strong capability in identifying sub-healthy and moldy-core apples. The preliminary outputs of the two classifiers were converted into basic probability assignments for independent evidence bodies. Dempster's combination rule and the associated decision criterion of Dempster-Shafer (D-S) theory were then applied to yield the final decision on early-stage moldy apples. Consequently, a decision-level fusion model was established for both shallow and deep features of the acoustic-vibration multi-domain spectra. [Results and Discussions] The constructed Adam-IResNet50-IPSO-ELM-DS model based on D-S evidence theory achieved a Kappa coefficient and Matthews Correlation Coefficient (MCC) slightly below 90% for multi-class classification of apples from known origins. The F1-Score and Overall Accuracy (OA) reached 93.01% and 93.22%, respectively. 
The classification accuracy for sub-healthy apples was 87.37%, while the misclassification rate for diseased apples was 8.33%. These results indicate that the model maintains a balanced precision and recall while achieving high detection accuracy for three classes of apples from unknown origins. After decision fusion, the IPSO-MSVM-DS and Adam-IResNet50-IPSO-ELM-DS models demonstrated significant performance improvements. Among them, the Adam-IResNet50-IPSO-ELM-DS model achieved an accuracy of 93.22%, which was significantly higher than that of other methods. This demonstrates that decision-level fusion could effectively enhance the model's discriminative ability and further improve classification performance. [Conclusions] The proposed acoustic vibration detection method for moldy-core apples, based on Dempster-Shafer evidence theory, provides technical support for future online batch detection of early moldy-core apples. Early screening of sub-healthy apples is of great significance for quality control during postharvest storage. In future work, the model will be further optimized to develop a rapid acoustic vibration-based prediction method for early detection of moldy core, providing technical support for quality control during apple distribution.
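
The decision-level fusion step, in which two classifier outputs are treated as basic probability assignments and merged with Dempster's combination rule, can be illustrated as follows. This is a sketch under assumptions: the function name `dempster_combine`, the class labels, and the mass values are hypothetical and are not taken from the paper; only the combination rule itself is standard.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two basic probability assignments with Dempster's rule.

    m1, m2: dicts mapping frozenset hypotheses to masses that sum to 1.
    Returns (combined_masses, conflict), where conflict is the total mass
    assigned to pairs of hypotheses with an empty intersection.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("Evidence bodies are in total conflict; rule undefined.")
    # Renormalise by the non-conflicting mass.
    return {h: w / (1.0 - conflict) for h, w in combined.items()}, conflict


if __name__ == "__main__":
    HEALTHY = frozenset({"healthy"})
    SUB = frozenset({"sub_healthy"})
    MOLDY = frozenset({"moldy"})
    # Hypothetical outputs of a shallow-feature and a deep-feature classifier.
    m_shallow = {HEALTHY: 0.6, SUB: 0.3, MOLDY: 0.1}
    m_deep = {HEALTHY: 0.5, SUB: 0.4, MOLDY: 0.1}
    fused, k = dempster_combine(m_shallow, m_deep)
    decision = max(fused, key=fused.get)
    print(fused, k, decision)
```

When both evidence bodies lean toward the same class, the fused mass for that class is larger than either input, which is the mechanism by which fusion can sharpen the final decision.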

Detection of Amylose in Fresh Corn Ears Based on Near-Infrared Spectroscopy |
XUE Zhicheng, ZHANG Yongli, ZHANG Jianxing, CHEN Fei, HUAN Kewei, ZHAO Baishun
2025, 7(4):  132-140.  doi:10.12133/j.smartag.SA202505030

[Objective] Fresh corn is an increasingly important part of consumers' daily diet because of its rich nutrition and sweet taste. With rising living standards, consumers' quality requirements for fresh corn continue to increase, and amylose content is a key indicator affecting its taste and flavor. At present, the industry mainly uses chemical methods to determine amylose content; these methods are time-consuming, laborious, and destructive, and they cannot meet the need for rapid detection in modern agricultural production and food processing. Therefore, the development of an efficient, accurate, and non-destructive rapid detection technology for amylose has become a key issue in agricultural product quality control. [Methods] In this study, a non-destructive detection model for amylose content in fresh corn ears was established based on near-infrared spectroscopy. Taking Jinguan 597 fresh corn as the research object, a near-infrared spectroscopic detection system built in-house was used to collect diffuse reflectance spectra from the middle area of intact corn ears, so that the detection process did not damage the samples. At the same time, reference values of amylose content were determined by physicochemical analysis, and a standard database was established. In the data preprocessing stage, the Mahalanobis distance method was used to screen the original spectral data for outliers, and abnormal samples caused by operating errors or sample defects were eliminated, leaving 90 representative fresh corn samples for modeling. 
To optimize model performance, the effects of five mainstream spectral pretreatment methods were compared: standard normal variate (SNV) transformation to eliminate the influence of optical path differences, multiplicative scatter correction (MSC) to reduce particle scattering interference, Savitzky-Golay smoothing (SGS) to remove random noise, first-order derivative (FD) to enhance spectral characteristic peaks, and detrending (DT) to eliminate baseline drift. Based on the partial least squares regression (PLSR) algorithm, a full-band amylose prediction model was constructed, and its robustness was evaluated by cross-validation. To further improve computational efficiency, the characteristic wavelengths most strongly correlated with amylose content were selected from the whole spectrum using two variable selection methods, competitive adaptive reweighted sampling (CARS) and successive projections algorithm (SPA), and a simplified characteristic-band prediction model was established. [Results and Discussions] The results demonstrated that among the various combined models incorporating different preprocessing and feature wavelength selection methods, the "SNV-CARS-PLSR" model, which integrated SNV preprocessing with CARS feature extraction, exhibited superior performance. This model significantly outperformed alternative modeling approaches in predictive capability. The model achieved the following performance metrics: a calibration coefficient of determination (R2C) of 0.826, root mean square error of calibration (RMSEC) of 1.399, prediction coefficient of determination (R2P) of 0.820, root mean square error of prediction (RMSEP) of 1.081, and residual predictive deviation (RPD) of 2.426. Comparative analysis revealed that the "SNV-CARS-PLSR" model showed a 14.0% improvement in R2P compared to the full-band PLSR model with SNV preprocessing alone. 
This enhancement was primarily attributed to the CARS algorithm's effective identification of key feature wavelengths. Through its adaptive weighting and iterative optimization process, CARS extracted 22 characteristic wavelengths that were strongly correlated with amylose content from the original 157 wavelength points in the full spectrum. This selective extraction effectively eliminated redundant spectral information and noise interference, thereby significantly improving the model's predictive accuracy. [Conclusions] By combining SNV preprocessing with CARS feature selection, the study established a rapid, non-destructive prediction model for amylose content in fresh maize ears using near-infrared spectroscopy. The developed methodology demonstrated significant advantages, including rapid analysis and complete non-destructiveness. The research could provide technical support for rapid, non-destructive detection of amylose in fresh maize ears.
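
Of the pretreatments compared above, multiplicative scatter correction is perhaps the least obvious, so a small sketch may help. It assumes the usual formulation, in which each spectrum is regressed on the mean spectrum and then corrected by the fitted offset and slope; the function name `msc` and the example values are hypothetical and do not come from the paper.

```python
def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    spectra: list of spectra (lists of floats of equal length).
    Each spectrum s is modelled as s = intercept + slope * reference,
    and the corrected spectrum is (s - intercept) / slope.
    Returns (corrected_spectra, reference).
    """
    n_samples = len(spectra)
    n_bands = len(spectra[0])
    reference = [sum(s[j] for s in spectra) / n_samples for j in range(n_bands)]
    ref_mean = sum(reference) / n_bands
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)

    corrected = []
    for s in spectra:
        s_mean = sum(s) / n_bands
        # Ordinary least-squares fit of the spectrum on the reference.
        slope = sum((r - ref_mean) * (x - s_mean) for r, x in zip(reference, s)) / ref_ss
        intercept = s_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in s])
    return corrected, reference


if __name__ == "__main__":
    base = [0.30, 0.35, 0.50, 0.42]                 # made-up absorbance values
    scattered = [1.2 * v + 0.05 for v in base]      # same sample, different scatter
    corrected, reference = msc([base, scattered])
    print(corrected)
```

In the example, the two input spectra differ only by an additive offset and a multiplicative factor, so both are mapped onto the same corrected curve; real spectra also carry chemical differences, which MSC is meant to leave in place.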

Overview Article
Embodied Intelligent Agricultural Robots: Key Technologies, Application Analysis, Challenges and Prospects |
WEI Peigang, CAO Shanshan, LIU Jifang, LIU Zhenhu, SUN Wei, KONG Fantao
2025, 7(4):  141-158.  doi:10.12133/j.smartag.SA202505008

[Significance] Most current agricultural robots lack the ability to adapt to complex agricultural environments and still have limitations when facing variable, uncertain and unstructured agricultural scenarios. With the acceleration of agricultural intelligent transformation, embodied intelligence, as an intelligent system integrating environment perception, information cognition, autonomous decision-making and action, is giving agricultural robots stronger autonomous perception and complex environment adaptation ability, and becoming an important direction to promote the development of agricultural intelligent robots. In this paper, the technical system and application practice of embodied intelligence are sorted out systematically in the field of agricultural robots, its important value is revealed in improving environmental adaptability, decision-making autonomy and operational flexibility, and theoretical and practical references are provided to promote the development of agricultural robots to a higher level. [Progress] Firstly, the key supporting technologies of embodied intelligent agricultural robots are systematically sorted out, focusing on four aspects, namely, multimodal fusion perception, intelligent autonomous decision-making, autonomous action control and feedback autonomous learning. In terms of multimodal fusion perception, the modular artificial intelligence (AI) algorithm architecture and multimodal large model architecture are summarised. In terms of intelligent autonomous decision-making, two types of approaches based on artificial programming and dedicated task algorithms, and on large-scale pre-trained models are outlined. In terms of autonomous action control, three types of approaches based on the fusion of reinforcement learning and mainstream transformer, large model-assisted reinforcement learning, end-to-end mapping of semantics to action and action end-to-end mapping are summarised. 
In the area of feedback autonomous learning, the focus is on technological advances in the evolution of large model-driven feedback modules. Secondly, the typical application scenarios of embodied intelligence in agriculture are analysed, a technical framework with "embodied perception - embodied cognition - embodied execution - embodied evolution" as the core is constructed, and the implementation paths of each module are classified and discussed according to the agricultural scenarios. Finally, the key technical bottlenecks and application challenges are analysed in depth, mainly including the high complexity of system integration, the significant gap between real and virtual data, and the limited ability of cross-scene generalisation. [Conclusions and Prospects] The future development trends of embodied intelligent agricultural robots are summarised and discussed in terms of the construction of high-quality datasets and simulation platforms, the application of domain large model fusion, and the design of layered collaborative architectures. Three aspects are highlighted. Firstly, the construction of high-quality embodied intelligence datasets for agricultural scenarios is a key prerequisite for deploying embodied intelligence in agriculture. The development of embodied intelligent agricultural robots needs to rely on rich and accurate agricultural scene task datasets and highly realistic simulators to support physical interaction and behavioural learning. Secondly, the fusion of general foundation models and agricultural domain models is an accelerator of intelligent perception and decision-making for agricultural robots. The in-depth fusion of general foundation models in agricultural scenarios will bring stronger perception, understanding and reasoning capabilities to embodied intelligent agricultural robots. 
Thirdly, the "large model for high-level planning + small model for low-level control" architecture is an effective solution to balance intelligence and efficiency. Although large models have advantages in semantic understanding and global strategy planning, their inference latency and computing power demand can hardly meet the real-time and low-power requirements of agricultural robots. Using large models for high-level task decomposition, scene semantic parsing and decision making, coupled with lightweight small models or traditional control algorithms for low-level sensory response and motion control, can combine the complementary advantages of the two.

Information Processing and Decision Making
Estimation of Maize Aboveground Biomass Based on CNN-LSTM-SA |
WANG Yi, XUE Rong, HAN Wenting, SHAO Guomin, HOU Yanqiao, CUI Xitong
2025, 7(4):  159-173.  doi:10.12133/j.smartag.SA202412004

[Objective] Maize is one of the most widely cultivated staple crops worldwide, and its aboveground biomass (AGB) serves as a crucial indicator for evaluating crop growth status. Accurate estimation of maize AGB is vital for ensuring food security and enhancing agricultural productivity. However, maize AGB is influenced by a multitude of dynamic factors, exhibiting complex spatial and temporal variations that pose significant challenges to precise estimation. At present, most studies on maize AGB estimation rely primarily on single-source remote sensing data and conventional machine learning algorithms, which limits the accuracy and generalizability of the models. To overcome these limitations, a model architecture that integrates convolutional neural networks (CNN), long short-term memory networks (LSTM), and a self-attention (SA) mechanism was developed in this research to estimate maize AGB at the field scale. [Methods] The research utilized vegetation indices, crop parameters, and meteorological data that were collected under varying gradient water treatments in the experimental area. First, an optimized CNN-LSTM-SA model was constructed. The model employed two-dimensional convolutional layers to extract both spatial and temporal features, while utilizing max-pooling and dropout techniques to mitigate overfitting. The LSTM module was used to capture temporal dependencies in the data. The SA mechanism was introduced to compute global attention weights, enhancing the representation of critical time steps. Nonlinear activation functions were applied to mitigate multicollinearity among features. A fully connected layer was used to output the estimated AGB values. Second, the Pearson correlation coefficients between influencing factors and maize AGB were analyzed, and the importance of multi-source data was validated. Recursive feature elimination (RFE) was used to select the optimal input features. 
The local interpretable model-agnostic explanations (LIME) method was employed to interpret individual samples. Finally, ablation experiments were conducted to assess the effects of incorporating CNN and SA into the model, with performance comparisons made against random forest (RF) and support vector machine (SVM) models. [Results and Discussions] The correlation analysis revealed that crop parameters exhibited strong correlations with AGB. Among the vegetation indices, the improved normalized difference red edge index (NDREI) demonstrated the highest correlation (r = 0.63). To address multicollinearity issues, the visible atmospherically resistant index (VARI), soil adjusted vegetation index (SAVI), and normalized difference red edge index (NDRE) were excluded from the analysis. The CNN-LSTM-SA model integrated crop parameters, vegetation indices, and meteorological data and initially achieved a coefficient of determination (R2) of 0.89, a root mean square error (RMSE) of 129.38 g/m2, and a mean absolute error (MAE) of 65.99 g/m2. When only vegetation indices and meteorological data were included, the model yielded an R2 of 0.83, an RMSE of 161.36 g/m2, and an MAE of 89.37 g/m2. Using a single vegetation index further reduced model accuracy. Based on multi-source data integration, RFE removed redundant features. After excluding the 2-meter average wind speed, the model reached its best performance with R2 of 0.92, RMSE of 107.53 g/m2, and MAE of 55.19 g/m2. Using the LIME method to interpret feature contributions for individual maize samples, the analysis revealed that during the rapid growth stage, the model was primarily influenced by the current growth status and vegetation indices. For samples in the mid-growth stage, multi-day crop physiological characteristics had a substantial impact on model predictions. In the late growth stage, higher vegetation index values showed a clear suppressive effect on the model outputs. 
During the mid-growth stage of maize under varying moisture conditions, the model consistently demonstrated heightened sensitivity to low temperatures, moderate humidity levels, and optimal vegetation indices. The CNN-LSTM-SA model demonstrated more consistent fitting performance and accuracy across different growth stages and water conditions than the LSTM, LSTM-SA, and CNN-LSTM models. It also outperformed the RF and SVM models on all evaluation metrics. [Conclusions] This study leveraged the feature extraction capabilities of CNN, the temporal modeling strength of LSTM, and the dynamic attention mechanism of SA to enhance the accuracy of maize AGB estimation from a spatiotemporal perspective. The approach not only reduced estimation errors but also improved model interpretability. This research could provide valuable insights and references for the dynamic modeling of crop AGB.
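
The three accuracy measures reported in this abstract (R2, RMSE, and MAE) can be computed from paired observed and estimated AGB values. The following is a minimal sketch, assuming the usual definition of R2 as one minus the ratio of residual to total sum of squares; the abstract does not state which definition was used, and the function name and sample values are hypothetical.

```python
import math


def regression_metrics(observed, predicted):
    """Return (R2, RMSE, MAE) for paired observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)                   # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)      # total sum of squares
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    return r2, rmse, mae


# Hypothetical AGB values in g/m2, for illustration only (not data from the study).
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
r2, rmse, mae = regression_metrics(observed, predicted)
print(f"R2={r2:.3f}  RMSE={rmse:.2f} g/m2  MAE={mae:.2f} g/m2")
```

RMSE penalizes large errors more heavily than MAE, which is consistent with the RMSE values in the abstract being roughly twice the MAE values.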

A Transfer Learning-Based Multimodal Model for Grape Detection and Counting |
XU Wenwen, YU Kejian, DAI Zexu, WU Yunzhi
2025, 7(4):  174-186.  doi:10.12133/j.smartag.SA202504005
Abstract ( 120 )   HTML ( 9)   PDF (2533KB) ( 17 )  

[Objective] Grape is one of the world's largest cash crops by total production value, and accurate yield estimation is crucial for agricultural and economic development. However, grape yield prediction is currently difficult and costly, green varieties whose berries resemble the leaves in color are hard to detect, and bunches with small berries are detected poorly. To address these problems, a multimodal detection framework based on transfer learning is proposed to detect and count grapes of different varieties, providing reliable technical support for grape yield prediction and intelligent orchard management. [Methods] A multimodal grape detection framework based on transfer learning was proposed. Transfer learning exploited the feature representation capabilities of pretrained models, so that only a small number of grape images was needed for fine-tuning to the task. This approach not only reduced labeling costs but also enhanced the ability to capture grape features effectively. The multimodal framework adopted a dual-encoder-single-decoder structure consisting of three core modules: the image and text feature extraction and enhancement module, the language-guided query selection module, and the cross-modality decoder module. In the feature extraction stage, the framework employed models pretrained on public datasets, which significantly reduced the training time and cost on the target task while improving the capability to capture grape features. A feature enhancement module fused grape image and text features across modalities. Additionally, an attention mechanism was applied to enhance both image and text features, facilitating cross-modality feature learning between images and text. 
During the cross-modality query selection phase, the framework utilized a language-guided query selection strategy that enabled the filtering of queries from grape images. This strategy made more effective use of the input text to guide object detection, selecting the features most relevant to the input text as queries for the decoder. The cross-modality decoder combined the features from the grape image and text modalities to achieve more accurate modality alignment, thereby facilitating a more effective fusion of grape image and text information and ultimately producing the corresponding grape prediction results. Finally, to comprehensively evaluate the model's performance, the mean average precision (mAP) and average recall (AR) were adopted as evaluation metrics for the detection task, while the counting task was assessed using the mean absolute error (MAE) and root mean square error (RMSE). [Results and Discussions] This method exhibited the best performance in both detection and counting when compared to nine baseline models. Specifically, a comprehensive evaluation was conducted on the WGISD public dataset, where the method achieved an mAP50 of 80.3% in the detection task, a 2.7 percentage point improvement over the second-best model. Additionally, it reached 53.2% mAP and 58.2% mAP75, surpassing the second-best models by 13.4 and 22 percentage points, respectively, and achieved an mAR of 76.5%, a 9.8 percentage point increase over the next best model. In the counting task, the method achieved an MAE of 1.65 and an RMSE of 2.48, outperforming all other baseline models in counting effectiveness. Furthermore, experiments were conducted using a total of nine grape varieties from both the WGISD dataset and field-collected data, resulting in an mAP50 of 82.5%, an mAP of 58.5%, an mAP75 of 64.4%, an mAR of 77.1%, an MAE of 1.44, and an RMSE of 2.19. 
These results demonstrated the model's strong adaptability and effectiveness across diverse grape varieties. Notably, the method not only performed well in identifying large grape clusters but also showed superior performance on smaller grape clusters, achieving an mAP_s of 74.2% in the detection task, a 9.5 percentage point improvement over the second-best model. Additionally, to provide a more intuitive assessment of model performance, this study selected grape images from the test set for visual comparison. The results revealed that the model's detection and counting outcomes for grape clusters closely matched the original annotations in the labeled dataset. Overall, this method demonstrated strong generalization capabilities and higher accuracy under various environmental conditions for different grape varieties. This technology has the potential to be applied in estimating total orchard yield and reducing pre-harvest measurement errors, thereby improving precision management of vineyards. [Conclusions] The proposed method achieved higher accuracy and better adaptability in detecting five grape varieties compared to other baseline models. Furthermore, the model demonstrated substantial practicality and robustness across nine different grape varieties. These findings suggest that the method developed in this study has significant application potential in grape detection and counting tasks. It could provide strong technical support for the intelligent development of precision agriculture and the grape cultivation industry.
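
The mAP50 figure reported in this abstract treats a detection as correct when its box overlaps an annotated bunch with an intersection over union (IoU) of at least 0.5. The following is a minimal sketch of that matching step and of a per-image counting error. It assumes axis-aligned boxes given as (x1, y1, x2, y2), predictions already sorted by descending confidence, and a simple greedy one-to-one matching; the function names and coordinates are hypothetical, and this is not the authors' evaluation code.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_detections(predictions, ground_truth, threshold=0.5):
    """Greedily match each prediction to its best unmatched annotation.

    Returns the number of true positives at the given IoU threshold.
    """
    unmatched = list(range(len(ground_truth)))
    true_positives = 0
    for pred in predictions:
        best_idx, best_iou = None, threshold
        for idx in unmatched:
            score = iou(pred, ground_truth[idx])
            if score >= best_iou:
                best_idx, best_iou = idx, score
        if best_idx is not None:
            unmatched.remove(best_idx)  # each annotation is matched at most once
            true_positives += 1
    return true_positives


# Hypothetical boxes for one image: two annotated bunches, three predictions.
ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
predictions = [(1, 1, 10, 10), (20, 20, 26, 30), (50, 50, 60, 60)]

tp = match_detections(predictions, ground_truth)
precision = tp / len(predictions)
recall = tp / len(ground_truth)
count_error = abs(len(predictions) - len(ground_truth))  # per-image counting error
print(tp, round(precision, 3), recall, count_error)
```

Averaging the per-image counting error over a test set gives the MAE used for the counting task; a full mAP computation additionally integrates precision over recall across confidence thresholds, which this sketch omits.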

Forecasting Method for China's Soybean Demand Based on Improved Temporal Fusion Transformers |
LIU Jiajia, QIN Xiaojing, LI Qianchuan, XU Shiwei, ZHAO Jichun, WANG Yigang, XIONG Lu, LIANG Xiaohe
2025, 7(4):  187-199.  doi:10.12133/j.smartag.SA202505017
Abstract ( 124 )   HTML ( 9)   PDF (2217KB) ( 23 )  

[Objective] Accurate prediction of soybean demand is of practical significance for safeguarding national food security, optimizing industrial decision-making, and responding to fluctuations in international trade. Traditional soybean demand forecasting methods have a limited capacity to mine high-dimensional data and multivariate interaction features, struggle to capture nonlinear relationships among coupled multi-dimensional dynamic factors, and face challenges in model interpretability and domain adaptability. These limitations prevent them from effectively supporting accurate prediction and interpretable analysis of China's soybean demand. When the temporal fusion transformers (TFT) model is applied to forecast China's soybean demand, it exhibits certain constraints in its feature interaction layers and attention weight allocation. Consequently, there is a need for a forecasting method based on an improved TFT model that enhances the accuracy and interpretability of soybean demand prediction. [Methods] Drawing on relevant studies, this research applied the deep learning-based TFT model to China's soybean demand forecasting and proposed the MA-TFT (improved TFT model based on MDFI and AAWO) model, which was enhanced through multi-layer dynamic feature interaction (MDFI) and adaptive attention weight optimization (AAWO). Firstly, a dataset for analyzing China's soybean demand was collated, covering eight dimensions: consumption, production, trade, inventory, market, economy, policy, and international factors. This dataset, encompassing 4 652 relevant indicators spanning from 1980 to 2024, was subjected to data cleaning, transformation, augmentation, and feature engineering. The training, validation, and test sets for the model were constructed using the rolling window method. 
Secondly, based on the architecture of the TFT model for China's soybean demand forecasting, a multi-layer dynamic feature interaction module and an adaptive attention weight optimization strategy were designed. Additionally, the model's loss function, training strategy, and Bayesian hyperparameter tuning method were formulated, and the model performance evaluation metrics were determined. Subsequently, experiments were designed to compare the prediction performance of the MA-TFT model with that of the autoregressive integrated moving average model (ARIMA), the long short-term memory (LSTM) model, and the original TFT model. Ablation experiments on the MDFI and AAWO modules were conducted separately. The SHapley Additive exPlanations (SHAP) tool was employed for interpretability analysis to identify key feature variables influencing China's soybean demand and their interaction relationships. Error analysis was performed between the predicted and actual values of China's historical soybean demand, and a comparative analysis of the predicted soybean demand in China from 2025 to 2034 was carried out. [Results and Discussions] The mean squared error (MSE) and mean absolute percentage error (MAPE) of the MA-TFT model were 0.036 and 5.89%, respectively, with a coefficient of determination R2 of 0.91, all of which outperformed those of the comparative models, namely ARIMA (1,1,1), LSTM, and TFT. Compared with the benchmark TFT model, the root mean square error (RMSE) and MAPE of the MA-TFT model decreased cumulatively by 21.84% and 3.44%, respectively. These results indicated that the MA-TFT model, as an improved version of TFT, could capture complex relationships between features and enhance prediction performance and accuracy. Interpretability analysis using the SHAP tool revealed that the MA-TFT model exhibited high stability in explaining key feature variables affecting China's soybean demand. 
It was projected that China's soybean demand would reach 117.99 million tons, 110.33 million tons, and 113.78 million tons in 2025, 2030, and 2034, respectively. [Conclusions] The MA-TFT model, developed by improving the TFT model, provides an innovative solution to address the practical issues of insufficient accuracy and poor interpretability in existing soybean demand forecasting methods. It also offers valuable references for method optimization and application in time series forecasting of other bulk agricultural products.
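
The rolling window construction of training, validation, and test sets and the MAPE metric mentioned in this abstract can be illustrated as follows. This is a minimal sketch under stated assumptions: the window length, forecast horizon, and split fractions are hypothetical (the abstract does not report them), the series is a placeholder, and all function names are introduced here for illustration only.

```python
def rolling_windows(series, input_len, horizon):
    """Slide a fixed-length window over the series.

    Each sample pairs `input_len` past values with the next `horizon` values.
    """
    samples = []
    last_start = len(series) - input_len - horizon
    for start in range(last_start + 1):
        history = series[start:start + input_len]
        target = series[start + input_len:start + input_len + horizon]
        samples.append((history, target))
    return samples


def chronological_split(samples, train_frac=0.7, val_frac=0.15):
    """Split samples in time order so later periods are never used for training."""
    n = len(samples)
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))
    return samples[:train_end], samples[train_end:val_end], samples[val_end:]


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Placeholder annual series of 10 points, for illustration only.
series = list(range(1, 11))
samples = rolling_windows(series, input_len=3, horizon=1)
train, val, test = chronological_split(samples)
print(len(samples), len(train), len(val), len(test))
print(samples[0], samples[-1])
print(mape([100.0, 200.0], [110.0, 190.0]))
```

Keeping the split chronological matters for demand forecasting because a random shuffle would let the model train on years that follow the ones it is evaluated on.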

Authority in Charge: Ministry of Agriculture and Rural Affairs of the People’s Republic of China
Sponsor: Agricultural Information Institute, Chinese Academy of Agricultural Sciences
Editor-in-Chief: Chunjiang Zhao, Academician of Chinese Academy of Engineering.
ISSN 2097-485X(Online)
ISSN 2096-8094(Print)
CN 10-1681/S
CODEN ZNZHD7
