Smart Agriculture

Table of Contents

    30 March 2024, Volume 6 Issue 2
    Special Issue--Agricultural Information Perception and Models
    Big Models in Agriculture: Key Technologies, Application and Future Directions | Open Access
    GUO Wang, YANG Yusen, WU Huarui, ZHU Huaji, MIAO Yisheng, GU Jingqiu
    2024, 6(2):  1-13.  doi:10.12133/j.smartag.SA202403015

    [Significance] Big Models, or Foundation Models, have offered a new paradigm in smart agriculture. These models, built on the Transformer architecture, incorporate numerous parameters and have undergone extensive training, often showing excellent performance and adaptability, making them effective in addressing agricultural issues where data is limited. Integrating big models in agriculture promises to pave the way for a more comprehensive form of agricultural intelligence, capable of processing diverse inputs, making informed decisions, and potentially overseeing entire farming systems autonomously. [Progress] The fundamental concepts and core technologies of big models are first elaborated from five aspects: the generation and core principles of the Transformer architecture, the scaling laws governing the extension of big models, large-scale self-supervised learning, the general capabilities and adaptations of big models, and the emergent capabilities of big models. Subsequently, the possible application scenarios of big models in the agricultural field are analyzed in detail, and the development status of big models is described for three types of models: large language models (LLMs), large vision models (LVMs), and large multi-modal models (LMMs). The progress of applying big models in agriculture is discussed, and the achievements are presented. [Conclusions and Prospects] The challenges and key tasks of applying big model technology in agriculture are analyzed. Firstly, the current datasets used for agricultural big models are somewhat limited, and the process of constructing these datasets can be both expensive and potentially problematic in terms of copyright issues. There is a call for creating more extensive, more openly accessible datasets to facilitate future advancements. Secondly, the complexity of big models, due to their extensive parameter counts, poses significant challenges in terms of training and deployment. However, there is optimism that future methodological improvements will streamline these processes by optimizing memory and computational efficiency, thereby enhancing the performance of big models in agriculture. Thirdly, these advanced models demonstrate strong proficiency in analyzing image and text data, suggesting potential future applications in integrating real-time data from IoT devices and the Internet to make informed decisions, manage multi-modal data, and potentially operate machinery within autonomous agricultural systems. Finally, the dissemination and implementation of these big models in the public agricultural sphere are deemed crucial. The public availability of these models is expected to refine their capabilities through user feedback and alleviate the workload on humans by providing sophisticated and accurate agricultural advice, which could revolutionize agricultural practices.
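
    To make the Transformer principle referenced above concrete, the following minimal NumPy sketch illustrates scaled dot-product attention, the core operation behind the big models discussed in this review. It is a generic textbook illustration, not code from any of the surveyed systems.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal illustration of the attention mechanism at the heart of
    Transformer-based big models: each output is a weighted sum of the
    values, weighted by the similarity between its query and every key."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over keys
    return weights @ V

# Toy example: self-attention over 4 tokens with 8-dimensional embeddings
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(X, X, X).shape)       # (4, 8)
```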

    Intelligent Identification of Crop Agronomic Traits and Morphological Structure Phenotypes: A Review | Open Access
    ZHANG Jianhua, YAO Qiong, ZHOU Guomin, WU Wendi, XIU Xiaojie, WANG Jian
    2024, 6(2):  14-27.  doi:10.12133/j.smartag.SA202401015

    [Significance] The crop phenotype is the visible result of the complex interplay between crop genes and the environment. It reflects the physiological, ecological, and dynamic aspects of crop growth and development, serving as a critical component in the realm of advanced breeding techniques. By systematically analyzing crop phenotypes, researchers can gain valuable insights into gene function and identify genetic factors that influence important crop traits. This information can then be leveraged to effectively harness germplasm resources and develop breakthrough varieties. Utilizing data-driven, intelligent, dynamic, and non-invasive methods for measuring crop phenotypes allows researchers to accurately capture key growth traits and parameters, providing essential data for breeding and selecting superior crop varieties throughout the entire growth cycle. This article provides an overview of intelligent identification technologies for crop agronomic traits and morphological structural phenotypes. [Progress] Crop phenotype acquisition equipment serves as the essential foundation for acquiring, analyzing, measuring, and identifying crop phenotypes, and enables detailed monitoring of crop growth status. The article presents an overview of the functions, performance, and applications of the leading high-throughput crop phenotyping platforms, as well as an analysis of the characteristics of the various sensing and imaging devices used to obtain crop phenotypic information. The rapid advancement of high-throughput crop phenotyping platforms and sensing and imaging equipment has facilitated the integration of cutting-edge imaging technology, spectroscopy technology, and deep learning algorithms. These technologies enable the automatic and high-throughput acquisition of yield, resistance, quality, and other relevant traits of large-scale crops, leading to the generation of extensive multi-dimensional, multi-scale, and multi-modal crop phenotypic data and supporting the rapid progression of crop phenomics. The article also discusses the research progress of intelligent recognition technologies for agronomic traits, such as crop plant height acquisition and crop organ detection and counting, as well as crop ideotype recognition, crop morphological information measurement, and crop three-dimensional reconstruction for the intelligent recognition of morphological structure. Furthermore, this article outlines the main challenges faced in this field, including: difficulties in data collection in complex environments; high requirements for data scale, diversity, and preprocessing; the need to improve the lightweight design and generalization ability of models; as well as the high cost of data collection equipment and the need to enhance practicality. [Conclusions and Prospects] Finally, this article puts forward the development directions of crop phenotype intelligent recognition technology, including: developing new, low-cost intelligent field equipment for acquiring and analyzing crop phenotypes; enhancing the standardization and consistency of field crop phenotype acquisition; strengthening the generality of intelligent crop phenotype recognition models; researching crop phenotype recognition methods that involve multi-perspective, multimodal, multi-point continuous analysis and spatiotemporal feature fusion; as well as improving model interpretability.
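
    As a concrete illustration of one agronomic trait discussed above, plant height is commonly estimated from a calibrated top-view height map by taking a high percentile of the above-soil heights. The sketch below is a generic illustration rather than a method from the reviewed literature; the soil threshold and percentile are illustrative assumptions.

```python
import numpy as np

def plant_height_from_heightmap(height_map, soil_threshold=0.02, percentile=99):
    """Estimate plant height (m) from a top-view height map in metres.
    Pixels at or below `soil_threshold` are treated as soil/background; a high
    percentile of the remaining heights suppresses noisy outliers."""
    canopy = height_map[height_map > soil_threshold]
    return float(np.percentile(canopy, percentile)) if canopy.size else 0.0

# Toy height map: a 100x100 plot with one plant reaching roughly 0.45 m
hm = np.zeros((100, 100))
hm[40:60, 40:60] = np.random.uniform(0.2, 0.45, size=(20, 20))
print(round(plant_height_from_heightmap(hm), 2))
```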

    Identification and Severity Classification of Typical Maize Foliar Diseases Based on Hyperspectral Data | Open Access
    SHEN Yanyan, ZHAO Yutao, CHEN Gengshen, LYU Zhengang, ZHAO Feng, YANG Wanneng, MENG Ran
    2024, 6(2):  28-39.  doi:10.12133/j.smartag.SA202310016

    [Objective] In recent years, there has been a significant increase in the severity of leaf diseases in maize, with a noticeable trend of mixed occurrence. This poses a serious threat to the yield and quality of maize. However, existing studies rarely combine the identification of different leaf disease types with severity classification, and thus cannot meet the needs of disease prevention and control under the mixed occurrence of different diseases and severities in actual maize fields. [Methods] A method was proposed for identifying the types of typical leaf diseases in maize and classifying their severity using hyperspectral technology. Hyperspectral data of three maize leaf diseases, northern corn leaf blight (NCLB), southern corn leaf blight (SCLB) and southern corn rust (SCR), were obtained through greenhouse pathogen inoculation and natural inoculation. The spectral data were preprocessed by spectral standardization, SG filtering, sensitive band extraction and vegetation index calculation to explore the spectral characteristics of the three maize leaf diseases. Then, the inverse frequency weighting method was utilized to balance the number of samples and reduce the overfitting caused by sample imbalance. Relief-F and variable selection using random forests (VSURF) methods were employed to optimize the sensitive spectral features, including band features and vegetation index features, to construct models for disease type identification based on the full stage of disease development (including all disease severities) and for individual disease severities using several representative machine learning approaches, demonstrating the effectiveness of the research method. Furthermore, severity classification models were also constructed for each individual maize leaf disease, namely the NCLB, SCLB and SCR severity classification models, aiming to achieve full-process recognition and disease severity classification for different leaf diseases. Overall accuracy (OA) and Macro F1 were used to evaluate model accuracy in this study. [Results and Discussions] The results showed that the significant spectral changes of the three maize leaf diseases were primarily concentrated in the visible (550-680 nm), red edge (740-760 nm), near-infrared (760-1 000 nm) and shortwave infrared (1 300-1 800 nm) bands. Disease-specific spectral features, optimized based on disease spectral response rules, effectively identified disease species and classified their severity. Moreover, vegetation index features were more effective in identifying disease-specific information than sensitive band features. This was primarily due to the noise and information redundancy present in the selected hyperspectral sensitive bands, whereas vegetation indices could reduce the influence of background and atmospheric noise to a certain extent by integrating relevant spectral signals through band calculation, so as to achieve higher precision in the model. Among several machine learning algorithms, the support vector machine (SVM) method exhibited better robustness than random forest (RF) and decision tree (DT). In the full stage of disease development, the optimal overall accuracy (OA) of the disease classification model constructed by SVM based on vegetation indices reached 77.51%, with a Macro F1 of 0.77, representing a 28.75% increase in OA and a 0.30 higher Macro F1 compared to the model based on sensitive bands.
Additionally, the accuracy of the single-severity disease classification models increased with disease severity. The accuracy of disease classification during the early stage of disease development (OA=70.31%) closely approached that of the full disease development stage (OA=77.51%). Subsequently, in the moderate disease severity stage, the optimal accuracy of disease classification (OA=80.00%) surpassed the optimal accuracy of the full disease development stage. Furthermore, the optimal accuracy of disease classification under severe severity reached 95.06%, with a Macro F1 of 0.94. This heightened accuracy at the severe stage can be attributed to significant changes in pigment content, water content and cell structure of the diseased leaves, intensifying the spectral response of each disease and enhancing the differentiation between different diseases. For the disease severity classification models, the optimal accuracies of the three models for maize leaf disease severity all exceeded 70%. Among the three disease severity classification results, the NCLB severity classification model exhibited the best performance. The NCLB severity classification model, utilizing SVM based on the optimal vegetation index features, achieved an OA of 86.25%, with a Macro F1 of 0.85. In comparison, the accuracies of the SCLB severity classification model (OA=70.35%, Macro F1=0.70) and the SCR severity classification model (OA=71.39%, Macro F1=0.69) were lower than that of NCLB. [Conclusions] The aforementioned results demonstrate the potential to effectively identify and classify the types and severity of common leaf diseases in maize using hyperspectral data. This lays the groundwork for further research and provides a theoretical basis for large-scale crop disease monitoring, contributing to precision prevention and control as well as promoting green agriculture.
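
    The preprocessing and classification pipeline described above can be outlined roughly as follows. This is an illustrative sketch, not the authors' code: NDVI stands in for the vegetation indices actually used, the Savitzky-Golay settings and SVM kernel are assumptions, and the Relief-F/VSURF feature selection step is omitted.

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.svm import SVC

def preprocess_spectra(X):
    """Spectral standardization followed by SG smoothing along the band axis.
    X: (n_samples, n_bands) raw leaf reflectance."""
    X = (X - X.mean(axis=1, keepdims=True)) / (X.std(axis=1, keepdims=True) + 1e-8)
    return savgol_filter(X, window_length=11, polyorder=2, axis=1)

def ndvi(X_raw, wavelengths, red=670.0, nir=800.0):
    """One example vegetation index computed from raw reflectance."""
    r = X_raw[:, np.argmin(np.abs(wavelengths - red))]
    n = X_raw[:, np.argmin(np.abs(wavelengths - nir))]
    return (n - r) / (n + r + 1e-8)

def inverse_frequency_weights(y):
    """Class weights inversely proportional to class frequency, to counter
    the sample imbalance mentioned above."""
    classes, counts = np.unique(y, return_counts=True)
    return {c: len(y) / (len(classes) * k) for c, k in zip(classes, counts)}

# clf = SVC(kernel="rbf", class_weight=inverse_frequency_weights(y_train))
# clf.fit(selected_features_train, y_train)   # e.g. NCLB / SCLB / SCR labels
```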

    Oilseed Rape Sclerotinia in Hyperspectral Images Segmentation Method Based on Bi-GRU and Spatial-Spectral Information Fusion | Open Access
    ZHANG Jing, ZHAO Zexuan, ZHAO Yanru, BU Hongchao, WU Xingyu
    2024, 6(2):  40-48.  doi:10.12133/j.smartag.SA202310010

    [Objective] The widespread prevalence of sclerotinia disease poses a significant challenge to the cultivation and supply of oilseed rape, not only resulting in substantial yield losses and decreased oil content in infected plant seeds but also severely impacting crop productivity and quality, leading to significant economic losses. To solve the problems of complex operation, environmental pollution, sample destruction and low detection efficiency associated with traditional chemical detection methods, a Bi-directional Gated Recurrent Unit (Bi-GRU) model based on spatial-spectral feature fusion was constructed to achieve segmentation of sclerotinia-infected areas of oilseed rape in hyperspectral images (HSIs). [Methods] The spectral characteristics of sclerotinia disease were first explored. Significant variation in spectral reflectance was observed around 550 nm and within the wavelength range of 750-1 000 nm at different locations on rapeseed leaves. As the severity of sclerotinia infection increased, the differences in reflectance at these wavelengths became more pronounced. Subsequently, a rapeseed leaf sclerotinia disease dataset comprising 400 HSIs was curated using an intelligent data annotation tool. This dataset was divided into three subsets: a training set with 280 HSIs, a validation set with 40 HSIs, and a test set with 80 HSIs. Expanding on this, a 7×7 pixel neighborhood was extracted as the spatial feature of the target pixel, incorporating both spatial and spectral features effectively. Leveraging the Bi-GRU model enabled simultaneous feature extraction at any point within the sequence data, eliminating the impact of the order of spatial-spectral data fusion on the model's performance. The model comprises four key components: an input layer, hidden layers, fully connected layers, and an output layer. The Bi-GRU model in this study consisted of two hidden layers, each housing 512 GRU neurons. The forward hidden layer computed sequence information at the current time step, while the backward hidden layer retrieved the sequence in reverse, incorporating reversed-order information. These two hidden layers were linked to a fully connected layer, providing both forward and reversed-order information to all neurons during training. The Bi-GRU model included two fully connected layers, each with 1 000 neurons, and an output layer with two neurons representing the healthy and diseased classes, respectively. [Results and Discussions] To thoroughly validate the comprehensive performance of the proposed Bi-GRU model and assess the effectiveness of the spatial-spectral information fusion mechanism, relevant comparative analysis experiments were conducted. These experiments primarily focused on five key parameters, namely ClassAP(1), ClassAP(2), mean average precision (mAP), mean intersection over union (mIoU), and the Kappa coefficient, to provide a comprehensive evaluation of the Bi-GRU model's performance. The comprehensive performance analysis revealed that the Bi-GRU model, when compared to mainstream convolutional neural network (CNN) and long short-term memory (LSTM) models, demonstrated superior overall performance in detecting rapeseed sclerotinia disease. Notably, the proposed Bi-GRU model achieved an mAP of 93.7%, showcasing a 7.1% precision improvement over the CNN model. The bidirectional architecture, coupled with spatial-spectral fusion data, effectively enhanced detection accuracy.
Furthermore, the study visually presented the segmentation results of sclerotinia disease-infected areas using CNN, Bi-LSTM, and Bi-GRU models. A comparison with the Ground-Truth data revealed that the Bi-GRU model outperformed the CNN and Bi-LSTM models in detecting sclerotinia disease at various infection stages. Additionally, the Dice coefficient was employed to comprehensively assess the actual detection performance of different models at early, middle, and late infection stages. The dice coefficients for the Bi-GRU model at these stages were 83.8%, 89.4% and 89.2%, respectively. While early infection detection accuracy was relatively lower, the spatial-spectral data fusion mechanism significantly enhanced the effectiveness of detecting early sclerotinia infections in oilseed rape. [Conclusions] This study introduces a Bi-GRU model that integrates spatial and spectral information to accurately and efficiently identify the infected areas of oilseed rape sclerotinia disease. This approach not only addresses the challenge of detecting early stages of sclerotinia infection but also establishes a basis for high-throughput non-destructive detection of the disease.
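
    A rough PyTorch sketch of the Bi-GRU classifier described above is given below. The layer sizes follow the abstract (two bidirectional GRU layers of 512 units, two fully connected layers of 1 000 neurons, and a two-class output); treating each spectral band as one time step that carries the 7×7 spatial neighbourhood, and classifying from the final time step, are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class BiGRUPixelClassifier(nn.Module):
    """Classifies one hyperspectral pixel as healthy or diseased from a
    sequence of per-band features that include the 7x7 spatial neighbourhood."""
    def __init__(self, neighbourhood=49, hidden=512, n_classes=2):
        super().__init__()
        self.gru = nn.GRU(input_size=neighbourhood, hidden_size=hidden,
                          num_layers=2, batch_first=True, bidirectional=True)
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, 1000), nn.ReLU(),
            nn.Linear(1000, 1000), nn.ReLU(),
            nn.Linear(1000, n_classes),
        )

    def forward(self, x):              # x: (batch, n_bands, 49)
        out, _ = self.gru(x)           # (batch, n_bands, 2 * hidden)
        return self.head(out[:, -1])   # classify from the last time step

logits = BiGRUPixelClassifier()(torch.randn(8, 200, 49))  # 8 pixels, 200 bands
print(logits.shape)                                        # torch.Size([8, 2])
```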

    Crop Pest Target Detection Algorithm in Complex Scenes: YOLOv8-Extend | Open Access
    ZHANG Ronghua, BAI Xue, FAN Jiangchuan
    2024, 6(2):  49-61.  doi:10.12133/j.smartag.SA202311007

    [Objective] It is of great significance to improve the efficiency and accuracy of crop pest detection in complex natural environments, and to change the current reliance on manual identification by experts in the agricultural production process. Targeting the problems of small target size, mimicry with crops, low detection accuracy, and slow algorithm inference speed in crop pest detection, a crop pest target detection algorithm for complex scenes, named YOLOv8-Extend, was proposed in this research. [Methods] Firstly, GSConv was introduced to enhance the model's receptive field, allowing for global feature aggregation. This mechanism enables feature aggregation at both the node and global levels simultaneously, obtaining local features from neighboring nodes through neighbor sampling and aggregation operations, enhancing the model's receptive field and semantic understanding ability. Additionally, some Convs were replaced with lightweight Ghost Convolutions, and HorBlock was utilized to capture longer-term feature dependencies. The recursive gated convolution employed gating mechanisms to remember and transmit previous information, capturing long-term correlations. Furthermore, Concat was replaced with BiFPN for richer feature fusion. The bidirectional fusion of depth features from top to bottom and from bottom to top enhances the transmission of feature information across different network layers. Utilizing the VoVGSCSP module, feature maps of different scales were connected to create longer feature map vectors, increasing model diversity and enhancing small object detection. The convolutional block attention module (CBAM) attention mechanism was introduced to strengthen features of field pests and reduce background weights caused by scene complexity. Next, the Wise-IoU dynamic non-monotonic focusing mechanism was implemented to evaluate the quality of anchor boxes using "outlier degree" instead of IoU. This mechanism also included a gradient gain allocation strategy, which reduced the competitiveness of high-quality anchor boxes and minimized harmful gradients from low-quality examples. This approach allowed WIoU to concentrate on anchor boxes of average quality, improving the network model's generalization ability and overall performance. Subsequently, the improved YOLOv8-Extend model was compared with the original YOLOv8 model, YOLOv5, YOLOv8-GSCONV, YOLOv8-BiFPN, and YOLOv8-CBAM to validate the accuracy and precision of model detection. Finally, the model was deployed on edge devices for inference verification to confirm its effectiveness in practical application scenarios. [Results and Discussions] The results indicated that the improved YOLOv8-Extend model achieved notable improvements in the accuracy, recall, mAP@0.5, and mAP@0.5:0.95 evaluation indices. Specifically, there were increases of 2.6%, 3.6%, 2.4% and 7.2%, respectively, showcasing superior detection performance. When YOLOv8-Extend and YOLOv8 were run on the edge computing device Jetson Orin NX 16 GB and accelerated by TensorRT, mAP@0.5 improved by 4.6% and the frame rate reached 57.6 FPS, meeting real-time detection requirements. The YOLOv8-Extend model demonstrated better adaptability in complex agricultural scenarios and exhibited clear advantages in detecting small pests and pests sharing similar growth environments in practical data collection. The accuracy in detecting challenging data saw a notable increase of 11.9%.
Through algorithm refinement, the model showcased improved capability in extracting and focusing on features in crop pest target detection, addressing issues such as small targets, similar background textures, and challenging feature extraction. [Conclusions] The YOLOv8-Extend model introduced in this study significantly boosts detection accuracy and recognition rates while upholding high operational efficiency. It is suitable for deployment on edge terminal computing devices to facilitate real-time detection of crop pests, offering technological advancements and methodologies for the advancement of cost-effective terminal-based automatic pest recognition systems. This research can serve as a valuable resource and aid in the intelligent detection of other small targets, as well as in optimizing model structures.
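
    Of the modules listed above, CBAM is the most self-contained; a minimal PyTorch sketch of its channel-then-spatial attention is shown below. This is a generic CBAM implementation for illustration, not the authors' code; the reduction ratio and spatial kernel size are conventional defaults.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Channel attention followed by spatial attention, used above to
    strengthen pest features against complex field backgrounds."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                                    # x: (B, C, H, W)
        b, c, _, _ = x.shape
        gate = torch.sigmoid(self.mlp(x.mean(dim=(2, 3))) +  # avg-pooled branch
                             self.mlp(x.amax(dim=(2, 3))))   # max-pooled branch
        x = x * gate.view(b, c, 1, 1)                        # channel attention
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))            # spatial attention

print(CBAM(64)(torch.randn(1, 64, 40, 40)).shape)            # torch.Size([1, 64, 40, 40])
```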

    Shrimp Diseases Detection Method Based on Improved YOLOv8 and Multiple Features | Open Access
    XU Ruifeng, WANG Yaohua, DING Wenyong, YU Junqi, YAN Maocang, CHEN Chen
    2024, 6(2):  62-71.  doi:10.12133/j.smartag.SA201311014

    [Objective] In recent years, there has been a steady increase in the occurrence and fatality rates of shrimp diseases, causing substantial impacts in shrimp aquaculture. These diseases are marked by their swift onset, high infectivity, complex control requirements, and elevated mortality rates. With the continuous growth of shrimp factory farming, traditional manual detection approaches are no longer able to keep pace with the current requirements. Hence, there is an urgent necessity for an automated solution to identify shrimp diseases. The main goal of this research is to create a cost-effective inspection method using computer vision that achieves a harmonious balance between cost efficiency and detection accuracy. The improved YOLOv8 (You Only Look Once) network and multiple features were employed to detect shrimp diseases. [Methods] To address the issue of surface foam interference, the improved YOLOv8 network was applied to detect and extract surface shrimps as the primary focus of the image. This target detection approach accurately recognizes objects of interest in the image, determining their category and location, with extraction results surpassing those of threshold segmentation. Taking into account the cost limitations of platform computing power in practical production settings, the network was optimized by reducing parameters and computations, thereby improving detection speed and deployment efficiency. Additionally, the Farneback optical flow method and gray level co-occurrence matrix (GLCM) were employed to capture the movement and image texture features of shrimp video clips. A dataset was created using these extracted multiple feature parameters, and a Support Vector Machine (SVM) classifier was trained to categorize the multiple feature parameters in video clips, facilitating the detection of shrimp health. [Results and Discussions] The improved YOLOv8 in this study effectively enhanced detection accuracy without increasing the number of parameters and FLOPs. According to the results of the ablation experiment, replacing the backbone network with the lightweight FasterNet backbone significantly reduces the number of parameters and computation, albeit at the cost of decreased accuracy. However, after integrating the efficient multi-scale attention (EMA) into the neck, the mAP0.5 increased by 0.3% compared to YOLOv8s, while mAP0.95 only decreased by 2.1%. Furthermore, the parameter count decreased by 45%, and FLOPs decreased by 42%. The improved YOLOv8 exhibits remarkable performance, ranking second only to YOLOv7 in terms of mAP0.5 and mAP0.95, with respective reductions of 0.4% and 0.6%. Additionally, it possesses a significantly reduced parameter count and FLOPs compared to YOLOv7, matching those of YOLOv5. Despite the YOLOv7-Tiny and YOLOv8-VanillaNet models boasting lower parameters and FLOPs, their accuracy lags behind that of the improved YOLOv8. The mAP0.5 and mAP0.95 of YOLOv7-Tiny and YOLOv8-VanillaNet are 22.4%, 36.2%, 2.3%, and 4.7% lower than those of the improved YOLOv8, respectively. Using the SVM classifier trained on a comprehensive dataset incorporating multiple features, an accuracy rate of 97.625% was achieved. A total of 150 normal fragments and 150 diseased fragments were randomly selected as test samples. The classifier exhibited a detection accuracy of 89% on this dataset of 300 samples.
This result indicates that the combination of features extracted using the Farneback optical flow method and GLCM can effectively capture the distinguishing dynamics of movement speed and direction between infected and healthy shrimp. In this research, the majority of errors stem from the incorrect recognition of diseased segments as normal segments, accounting for 88.2% of the total error. These errors can be categorized into three main types: 1) The first type occurs when floating foam obstructs the water surface, resulting in a small number of shrimp being extracted from the image. 2) The second type is attributed to changes in water movement. In this study, nanotubes were used for oxygenation, leading to the generation of sprays on the water surface, which affected the movement of shrimp. 3) The third type of error is linked to video quality. When the video's pixel count is low, the difference in optical flow between diseased shrimp and normal shrimp becomes relatively small. Therefore, it is advisable to adjust the collection area based on the actual production environment and enhance video quality. [Conclusions] The multiple features introduced in this study effectively capture the movement of shrimp and can be employed for disease detection. The improved YOLOv8 is particularly well-suited for platforms with limited computational resources and is feasible for deployment in actual production settings. However, the experiment was conducted in a factory farming environment, limiting the applicability of the method to other farming environments. Overall, this method requires only consumer-grade cameras as image acquisition equipment, places low demands on the detection platform, and can provide a theoretical basis and methodological support for the future application of aquatic disease detection methods.
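
    A rough sketch of the motion and texture feature extraction described above is shown below, using OpenCV's Farneback optical flow and scikit-image's GLCM utilities. It is an illustrative outline, not the authors' code: the Farneback parameters, the chosen GLCM properties, and the summary statistics are assumptions.

```python
import cv2
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def clip_features(frames):
    """Motion + texture features for one clip given as a list of uint8
    grayscale frames, in the spirit of the Farneback / GLCM description."""
    speeds = []
    for prev, nxt in zip(frames[:-1], frames[1:]):
        flow = cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        speeds.append(mag.mean())                     # mean movement per frame pair
    glcm = graycomatrix(frames[0], distances=[1], angles=[0],
                        levels=256, symmetric=True, normed=True)
    texture = [graycoprops(glcm, p)[0, 0]             # texture of the first frame
               for p in ("contrast", "homogeneity", "energy", "correlation")]
    return np.array([np.mean(speeds), np.std(speeds)] + texture)

# features = np.stack([clip_features(clip) for clip in clips])
# svm = sklearn.svm.SVC().fit(features, labels)       # healthy vs. diseased clips
```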

    Zero-Shot Pest Identification Based on Generative Adversarial Networks and Visual-Semantic Alignment | Open Access
    LI Tianjun, YANG Xinting, CHEN Xiao, HU Huan, ZHOU Zijie, LI Wenyong
    2024, 6(2):  72-84.  doi:10.12133/j.smartag.SA202312014

    [Objective] Accurate identification of insect pests is crucial for the effective prevention and control of crop infestations. However, existing pest identification methods primarily rely on traditional machine learning or deep learning techniques that are trained on seen classes. These methods falter when they encounter unseen pest species not included in the training set, due to the absence of image samples. An innovative method was proposed to address the zero-shot recognition challenge for pests. [Methods] The novel zero-shot learning (ZSL) method proposed in this study was capable of identifying unseen pest species. First, a comprehensive pest image dataset was assembled, sourced from field photography conducted around Beijing over several years, and from web crawling. The final dataset consisted of 2 000 images across 20 classes of adult Lepidoptera insects, with 100 images per class. During data preprocessing, a semantic dataset was manually curated by defining attributes related to color, pattern, size, and shape for six parts: antennae, back, tail, legs, wings, and overall appearance. Each image was annotated to form a 65-dimensional attribute vector for each class, resulting in a 20×65 semantic attribute matrix with rows representing each class and columns representing attribute values. Subsequently, 16 classes were designated as seen classes, and 4 as unseen classes. Next, a novel zero-shot pest recognition method was proposed, focusing on synthesizing high-quality pseudo-visual features aligned with semantic information using a generator. The Wasserstein generative adversarial network (WGAN) architecture was strategically employed as the fundamental network backbone. Conventional generative adversarial networks (GANs) have been known to suffer from training instabilities, mode collapse, and convergence issues, which can severely hinder their performance and applicability. The WGAN architecture addresses these inherent limitations through a principled reformulation of the objective function. In the proposed method, the contrastive module was designed to capture highly discriminative visual features that could effectively distinguish between different insect classes. It operated by creating positive and negative pairs of instances within a batch. Positive pairs consisted of different views of the same class, while negative pairs were formed from instances belonging to different classes. The contrastive loss function encouraged the learned representations of positive pairs to be similar while pushing the representations of negative pairs apart. Tightly integrated with the WGAN structure, this module substantially improved the generation quality of the generator. Furthermore, the visual-semantic alignment module enforced consistency constraints from both visual and semantic perspectives. This module constructed a cross-modal embedding space, mapping visual and semantic features via two projection layers: one for mapping visual features into the cross-modal space, and another for mapping semantic features. The visual projection layer took the synthesized pseudo-visual features from the generator as input, while the semantic projection layer ingested the class-level semantic vectors. Within this cross-modal embedding space, the module enforced two key constraints: maximizing the similarity between same-class visual-semantic pairs and minimizing the similarity between different-class pairs.
This was achieved through a carefully designed loss function that encouraged the projected visual and semantic representations to be closely aligned for instances belonging to the same class, while pushing apart the representations of different classes. The visual-semantic alignment module acted as a regularizer, preventing the generator from producing features that deviated from the desired semantic information. This regularization effect complemented the discriminative power gained from the contrastive module, resulting in a generator that produced high-quality, diverse, and semantically aligned pseudo-visual features. [Results and Discussions] The proposed method was evaluated on several popular ZSL benchmarks, including CUB, AWA, FLO, and SUN. The results demonstrated that the proposed method achieved state-of-the-art performance across these datasets, with a maximum improvement of 2.8% over the previous best method, CE-GZSL. This outcome fully demonstrated the method's broad effectiveness across different benchmarks and its outstanding generalization ability. On the self-constructed 20-class insect dataset, the method also exhibited exceptional recognition accuracy. Under the standard ZSL setting, it achieved a recognition rate of 77.4%, outperforming CE-GZSL by 2.1%. Under the generalized ZSL setting, it achieved a harmonic mean accuracy of 78.3%, marking a notable 1.2% improvement. This metric provided a balanced assessment of the model's performance across seen and unseen classes, ensuring that high accuracy on unseen classes does not come at the cost of forgetting seen classes. These results on the pest dataset, coupled with the performance on public benchmarks, firmly validated the effectiveness of the proposed method. [Conclusions] The proposed zero-shot pest recognition method represents a step forward in addressing the challenges of pest management. It effectively generalized pest visual features to unseen classes, enabling zero-shot pest recognition. It can facilitate pest identification tasks that lack training samples, thereby assisting in the discovery and prevention of novel crop pests. Future research will focus on expanding the range of pest species to further enhance the model's practical applicability.
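
    The visual-semantic alignment idea described above can be sketched as two projection layers trained with a similarity objective, as below. This is a simplified illustration rather than the paper's implementation: the 65-dimensional semantic vectors and 20 classes follow the abstract, while the visual feature dimension, embedding size, and the temperature-scaled cross-entropy over class similarities are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualSemanticAlignment(nn.Module):
    """Projects (pseudo-)visual features and class semantic vectors into a
    shared space and pulls same-class pairs together, pushing others apart."""
    def __init__(self, vis_dim=2048, sem_dim=65, embed_dim=512):
        super().__init__()
        self.vis_proj = nn.Linear(vis_dim, embed_dim)
        self.sem_proj = nn.Linear(sem_dim, embed_dim)

    def forward(self, vis_feats, class_semantics, labels, temperature=0.1):
        v = F.normalize(self.vis_proj(vis_feats), dim=-1)        # (B, D)
        s = F.normalize(self.sem_proj(class_semantics), dim=-1)  # (C, D)
        logits = v @ s.t() / temperature                         # similarity to every class
        return F.cross_entropy(logits, labels)                   # alignment loss

loss = VisualSemanticAlignment()(torch.randn(32, 2048),          # batch of visual features
                                 torch.randn(20, 65),            # 20 class attribute vectors
                                 torch.randint(0, 20, (32,)))
print(float(loss))
```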

    Agricultural Disease Named Entity Recognition with Pointer Network Based on RoFormer Pre-trained Model | Open Access
    WANG Tong, WANG Chunshan, LI Jiuxi, ZHU Huaji, MIAO Yisheng, WU Huarui
    2024, 6(2):  85-94.  doi:10.12133/j.smartag.SA202311021

    [Objective] With the development of agricultural informatization, a large amount of information about agricultural diseases exists in the form of text. However, due to problems such as nested entities and confusion of entity types, traditional named entity recognition (NER) methods often face challenges of low accuracy when processing agricultural disease text. To address this issue, this study proposes a new agricultural disease NER method called RoFormer-PointerNet, which combines the RoFormer pre-trained model with the PointerNet baseline model. The aim of this method is to improve the accuracy of entity recognition in agricultural disease text, providing more accurate data support for intelligent analysis, early warning, and prevention of agricultural diseases. [Methods] This method first utilized the RoFormer pre-trained model to perform deep vectorization of the input agricultural disease text. This step was a crucial foundation for the subsequent entity extraction task. As an advanced natural language processing model, the RoFormer pre-trained model's unique rotary position embedding approach endowed it with powerful capabilities in capturing textual positional information. In agricultural disease text, due to the diversity of terminology and the existence of polysemy, traditional entity recognition methods often faced challenges in confusing entity types. However, through its unique positional embedding mechanism, the RoFormer model was able to incorporate more positional information into the vector representation, effectively enriching the feature information of words. This characteristic enabled the model to more accurately distinguish between different entity types in subsequent entity extraction tasks, reducing the possibility of type confusion. After completing the vectorized representation of the text, this study further employed a pointer network for entity extraction. The pointer network is a sequence labeling approach that utilizes head and tail pointers to annotate entities within sentences. This labeling method is more flexible than traditional sequence labeling methods, as it is not restricted by fixed entity structures, enabling the accurate extraction of all types of entities within sentences, including complex entities with nested relationships. In agricultural disease text, entity extraction often faces the challenge of nesting, such as when multiple different entity types are nested within a single disease symptom description. By introducing the pointer network, this study effectively addressed the issue of entity nesting, improving the accuracy and completeness of entity extraction. [Results and Discussions] To validate the performance of the RoFormer-PointerNet method, this study constructed an agricultural disease dataset, which comprised 2 867 annotated corpora and a total of 10 282 entities, covering eight entity types: disease names, crop names, disease characteristics, pathogens, infected areas, disease factors, prevention and control methods, and disease stages. In comparative experiments with other pre-trained models such as Word2Vec, BERT, and RoBERTa, RoFormer-PointerNet demonstrated superiority in precision, recall, and F1-Score, achieving 87.49%, 85.76% and 86.62%, respectively. This result demonstrated the effectiveness of the RoFormer pre-trained model.
Additionally, to verify the advantage of RoFormer-PointerNet in mitigating the issue of nested entities, this study compared it with the widely used bidirectional long short-term memory neural network (BiLSTM) and conditional random field (CRF) models combined with the RoFormer pre-trained model as decoding methods. RoFormer-PointerNet outperformed the RoFormer-BiLSTM, RoFormer-CRF, and RoFormer-BiLSTM-CRF models by 4.8%, 5.67% and 3.87%, respectively. The experimental results indicated that RoFormer-PointerNet significantly outperforms other models in entity recognition performance, confirming the effectiveness of the pointer network model in addressing nested entity issues. To validate the superiority of the RoFormer-PointerNet method in agricultural disease NER, a comparative experiment was conducted with eight mainstream NER models such as BiLSTM-CRF, BERT-BiLSTM-CRF, and W2NER. The experimental results showed that the RoFormer-PointerNet method achieved precision, recall, and F1-Score of 87.49%, 85.76% and 86.62%, respectively in the agricultural disease dataset, reaching the optimal level among similar methods. This result further verified the superior performance of the RoFormer-PointerNet method in agricultural disease NER tasks. [Conclusions] The agricultural disease NER method RoFormer-PointerNet, proposed in this study and based on the RoFormer pre-trained model, demonstrates significant advantages in addressing issues such as nested entities and type confusion during the entity extraction process. This method effectively identifies entities in Chinese agricultural disease texts, enhancing the accuracy of entity recognition and providing robust data support for intelligent analysis, early warning, and prevention of agricultural diseases. This research outcome holds significant importance for promoting the development of agricultural informatization and intelligence.
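
    The head/tail pointer idea described above can be sketched as follows: per-token start and end logits are predicted for each of the eight entity types, and spans are decoded by pairing each start with the nearest matching end, which is what lets nested entities overlap. This is an illustrative sketch, not the authors' code; the decoding rule and threshold are assumptions.

```python
import torch
import torch.nn as nn

class PointerSpanHead(nn.Module):
    """Start (head) and end (tail) pointer logits per token and entity type,
    computed on top of the RoFormer token embeddings."""
    def __init__(self, hidden_size=768, n_types=8):
        super().__init__()
        self.start = nn.Linear(hidden_size, n_types)
        self.end = nn.Linear(hidden_size, n_types)

    def forward(self, token_embeddings):              # (L, hidden_size), one sentence
        return self.start(token_embeddings), self.end(token_embeddings)

def decode_spans(start_logits, end_logits, threshold=0.5):
    """Pair each predicted start with the nearest following end of the same type."""
    starts = (start_logits.sigmoid() > threshold).nonzero().tolist()  # [pos, type]
    ends = (end_logits.sigmoid() > threshold).nonzero().tolist()
    spans = []
    for s_pos, s_type in starts:
        for e_pos, e_type in ends:                    # ends come in ascending position order
            if e_type == s_type and e_pos >= s_pos:
                spans.append((s_type, s_pos, e_pos))
                break
    return spans
```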

    Fast Extracting Method for Strawberry Leaf Age and Canopy Width Based on Instance Segmentation Technology | Open Access
    FAN Jiangchuan, WANG Yuanqiao, GOU Wenbo, CAI Shuangze, GUO Xinyu, ZHAO Chunjiang
    2024, 6(2):  95-106.  doi:10.12133/j.smartag.SA202310014

    [Objective] There is a growing demand among plant cultivators and breeders for efficient methods to acquire plant phenotypic traits at high throughput, facilitating the establishment of mappings from phenotypes to genotypes. By integrating mobile phenotyping platforms with improved instance segmentation techniques, researchers have achieved a significant advancement in the automation and accuracy of phenotypic data extraction. Addressing the need for rapid extraction of leaf age and canopy width phenotypes of strawberry plants cultivated in controlled environments, this study introduces a novel high-throughput phenotyping extraction approach leveraging a mobile phenotyping platform and instance segmentation technology. [Methods] Data acquisition was conducted using a compact mobile phenotyping platform equipped with an array of sensors, including an RGB sensor and edge control computers, capable of capturing overhead images of potted strawberry plants in greenhouses. Targeted adjustments to the network structure were made to develop an enhanced mask region-based convolutional neural network (Mask R-CNN) model for processing strawberry plant image data and rapidly extracting plant phenotypic information. The model employed a split-attention network (ResNeSt) backbone with a group attention module, replacing the original backbone to improve the precision and efficiency of image feature extraction. During training, the Mosaic method, suitable for instance segmentation data augmentation, was adopted to expand the dataset of strawberry images. Additionally, the original cross-entropy classification loss function was replaced with a binary cross-entropy loss function to achieve better detection accuracy of plants and leaves. Based on this, post-processing was applied to the model outputs: the positional relationship between leaf and plant masks was used to count the number of leaves, and segmentation masks together with image calibration against ground-truth values were employed to calculate the canopy width of each plant. [Results and Discussions] This research conducted a thorough evaluation and comparison of the performance of the improved Mask R-CNN model, underpinned by the ResNeSt-101 backbone network. The model achieved a commendable mask accuracy of 80.1% and a detection box accuracy of 89.6%. It efficiently estimated the age of strawberry leaves, with a high plant detection rate of 99.3% and a leaf count accuracy of 98.0%. This accuracy marked a significant improvement over the original Mask R-CNN model and met the precise needs of phenotypic data extraction. The method also displayed notable accuracy in measuring the canopy widths of strawberry plants, with errors falling below 5% in about 98.1% of cases, highlighting its effectiveness in phenotypic dimension evaluation. Moreover, the model operated at a speed of 12.9 frames per second (FPS) on edge devices, effectively balancing accuracy and operational efficiency. This speed proved adequate for real-time applications, enabling rapid phenotypic data extraction even on devices with limited computational capabilities. [Conclusions] This study successfully deployed a mobile phenotyping platform combined with instance segmentation techniques to analyze image data and extract various phenotypic indicators of strawberry plants. Notably, the method demonstrates remarkable robustness.
The seamless fusion of mobile platforms and advanced image processing methods not only enhances efficiency but also signifies a shift towards data-driven decision-making in agriculture.
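
    The post-processing step described above, counting leaves from the leaf/plant mask relationship and measuring canopy width from the plant mask plus image calibration, might look roughly like the sketch below. The overlap threshold and the bounding-box definition of canopy width are assumptions made for illustration.

```python
import numpy as np

def count_leaves(plant_mask, leaf_masks, min_overlap=0.5):
    """Count the leaves belonging to one plant: a detected leaf is assigned to
    the plant if most of its mask lies inside the plant mask."""
    count = 0
    for leaf in leaf_masks:
        overlap = np.logical_and(leaf, plant_mask).sum()
        if overlap / max(leaf.sum(), 1) > min_overlap:
            count += 1
    return count

def canopy_width_cm(plant_mask, cm_per_pixel):
    """Canopy width as the larger side of the plant mask's bounding box,
    converted to centimetres with a calibration factor assumed known."""
    ys, xs = np.nonzero(plant_mask)
    if xs.size == 0:
        return 0.0
    return max(xs.ptp() + 1, ys.ptp() + 1) * cm_per_pixel
```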

    Transplant Status Detection Algorithm of Cabbage in the Field Based on Improved YOLOv8s | Open Access
    WU Xiaoyan, GUO Wei, ZHU Yiping, ZHU Huaji, WU Huarui
    2024, 6(2):  107-117.  doi:10.12133/j.smartag.SA202401008

    [Objective] Currently, the lack of computerized systems to monitor the quality of cabbage transplants is a notable shortcoming in the agricultural industry, where transplanting operations play a crucial role in determining the overall yield and quality of the crop. To address this problem, a lightweight and efficient algorithm was developed to monitor the status of cabbage transplants in a natural environment. [Methods] First, a cabbage image dataset was established. Cabbage images in the natural environment were collected, the collected image data were filtered, and the transplanting status of the cabbage was defined as normal seedling (upright and intact seedling), buried seedling (whose stems and leaves were buried by the soil) or exposed seedling (whose roots were exposed). The dataset was manually categorized and labelled using a graphical image annotation tool (LabelImg) so that the corresponding XML files could be generated, and was pre-processed with data augmentation methods such as flipping, cropping, blurring and random brightness adjustment to eliminate the scale and position differences between the cabbages in the test and training sets and to reduce data imbalance. Then, a cabbage transplantation state detection model based on YOLOv8s (You Only Look Once Version 8s) was designed. To address the problem that light and soil strongly influence the identification of the transplantation state of cabbage in the natural environment, a multi-scale attention mechanism was embedded to increase the algorithm's attention to the target region and improve the network's attention to target features at different scales, thereby improving the model's detection efficiency and target recognition accuracy and reducing the missed detection rate. By combining this with deformable convolution, more useful target information was captured to improve the model's target recognition and convergence, and the additional model complexity introduced by the C3-layer convolution was reduced, further lowering the overall model complexity. Because the localization performance of the algorithm remained unsatisfactory, the focal extended intersection over union loss (Focal-EIoU Loss) was introduced to solve the problem of violent oscillation of the loss value caused by low-quality samples; the influence weight of high-quality samples on the loss value was increased while the influence of low-quality samples was suppressed, so as to improve the convergence speed and localization accuracy of the algorithm. [Results and Discussions] Eventually, the algorithm was put through a stringent testing phase, yielding a remarkable recognition accuracy of 96.2% for the task of cabbage transplantation state detection. This was an improvement of 2.8% over the widely used YOLOv8s. Moreover, when benchmarked against other prominent target detection models, the algorithm emerged as a clear winner. It showcased notable enhancements of 3% and 8.9% in detection performance compared to YOLOv3-tiny. Simultaneously, it also achieved a 3.7% increase in the recall rate, a metric that reflects the proportion of actual targets the algorithm successfully identifies. On a comparative note, the algorithm outperformed YOLOv5 in terms of recall rate by 1.1%, 2% and 1.5%, respectively.
When pitted against the robust faster region-based convolutional neural network (Faster R-CNN), the algorithm demonstrated a significant boost in recall rate by 20.8% and 11.4%, resulting in an overall improvement of 13%. A similar trend was observed when the algorithm was compared to the single shot multibox detector (SSD) model, with a notable 9.4% and 6.1% improvement in recall rate. The final experimental results show that when the enhanced model was compared with YOLOv7-tiny, the recognition accuracy was increased by 3% and the recall rate by 3.5%. These impressive results validated the superiority of the algorithm in terms of accuracy and localization ability within the target area. The algorithm effectively eliminates interference factors such as soil and background impurities, thereby enhancing its performance and making it an ideal choice for tasks such as cabbage transplantation state recognition. [Conclusions] The experimental results show that the proposed cabbage transplantation state detection method can meet the accuracy and real-time requirements for the identification of cabbage transplantation state, and the detection accuracy and localization accuracy of the improved model perform better when the target is small and there are weeds and other interferences in the background. Therefore, the method proposed in this study can improve the efficiency of cabbage transplantation quality measurement, reduce time and labor, and improve the automation of field transplantation quality surveys.
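
    The Focal-EIoU loss referred to above combines an IoU term with centre-distance and width/height penalties, and down-weights low-quality boxes through an IoU-based focal factor. The sketch below is a generic PyTorch illustration of that formulation, not the code used in the paper; the gamma value is an assumption.

```python
import torch

def focal_eiou_loss(pred, target, gamma=0.5, eps=1e-7):
    """pred, target: (N, 4) boxes as (x1, y1, x2, y2)."""
    lt, rb = torch.max(pred[:, :2], target[:, :2]), torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    # smallest enclosing box, centre distance, width/height differences
    enc = torch.max(pred[:, 2:], target[:, 2:]) - torch.min(pred[:, :2], target[:, :2])
    cw, ch = enc[:, 0], enc[:, 1]
    rho2 = (((pred[:, :2] + pred[:, 2:]) - (target[:, :2] + target[:, 2:])) ** 2).sum(1) / 4
    dw = (pred[:, 2] - pred[:, 0]) - (target[:, 2] - target[:, 0])
    dh = (pred[:, 3] - pred[:, 1]) - (target[:, 3] - target[:, 1])
    eiou = (1 - iou) + rho2 / (cw ** 2 + ch ** 2 + eps) \
           + dw ** 2 / (cw ** 2 + eps) + dh ** 2 / (ch ** 2 + eps)
    return (iou.detach().clamp(min=eps) ** gamma * eiou).mean()  # focal re-weighting
```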

    Grading Method of Fresh Cut Rose Flowers Based on Improved YOLOv8s | Open Access
    ZHANG Yuyu, BING Shuying, JI Yuanhao, YAN Beibei, XU Jinpu
    2024, 6(2):  118-127.  doi:10.12133/j.smartag.SA202401005

    [Objective] The fresh cut rose industry has shown a positive growth trend in recent years, demonstrating sustained development. Considering that the current fresh cut rose grading process relies on simple manual grading, which results in low efficiency and accuracy, a new model named Flower-YOLOv8s was proposed for grading detection of fresh cut roses. [Methods] The flower head of a single rose against a uniform background was selected as the primary detection target. Fresh cut roses were categorized into four distinct grades: A, B, C, and D. These grades were determined based on factors such as color, size, and freshness, ensuring a comprehensive and objective grading system. A novel dataset containing 778 images, specifically tailored for fresh cut rose grading and detection, was constructed. This dataset served as the foundation for the subsequent experiments and analysis. To further enhance the performance of the YOLOv8s model, two cutting-edge attention mechanisms, the convolutional block attention module (CBAM) and the spatial attention module (SAM), were introduced separately for comparison experiments. These modules were integrated into the backbone network of the YOLOv8s model to enhance its ability to focus on salient features and suppress irrelevant information. The SAM module was then selected and optimized by reducing the number of convolution kernels, incorporating a depthwise separable convolution module, and reducing the number of input channels, which improved the module's efficiency and helped to reduce the overall computational complexity of the model. The convolution layer (Conv) in the C2f module was replaced by depthwise separable convolution (DWConv) and combined with the optimized SAM (Optimized-SAM), which was introduced into the C2f structure, giving rise to the Flower-YOLOv8s model. Precision, recall and F1 score were used as evaluation indicators. [Results and Discussions] Ablation results showed that for the Flower-YOLOv8s model proposed in this study, namely YOLOv8s+DWConv+Optimized-SAM, the recall rate was 95.4%, which was 3.8% higher, and the average accuracy 0.2% higher, than that of YOLOv8s with DWConv alone. When compared to the baseline model YOLOv8s, the Flower-YOLOv8s model exhibited a remarkable 2.1% increase in accuracy, reaching a peak of 97.4%. Furthermore, mAP was augmented by 0.7%, demonstrating the model's superior performance across various evaluation metrics and proving the effectiveness of adding Optimized-SAM. From the overall experimental results, the number of parameters of Flower-YOLOv8s was reduced by 2.26 M compared with the baseline model YOLOv8s, and the inference time was also reduced from 15.6 to 5.7 ms. Therefore, the Flower-YOLOv8s model was superior to the baseline model in terms of accuracy rate, average accuracy, number of parameters, detection time and model size. The performance of the Flower-YOLOv8s network was compared with the two-stage target detection algorithms Fast R-CNN and Faster R-CNN and the one-stage target detection models SSD, YOLOv3, YOLOv5s and YOLOv8s to verify its superiority under the same conditions and on the same dataset. The average precision values of the Flower-YOLOv8s model proposed in this study were 2.6%, 19.4%, 6.5%, 1.7%, 1.9% and 0.7% higher than those of Fast R-CNN, Faster R-CNN, SSD, YOLOv3, YOLOv5s and YOLOv8s, respectively. Compared with YOLOv8s, which had a higher recall rate, Flower-YOLOv8s reduced model size, inference time and parameter count by 4.5 MB, 9.9 ms and 2.26 M, respectively.
Notably, the Flower-YOLOv8s model achieved these improvements while simultaneously reducing model parameters and computational complexity. [Conclusions] The Flower-YOLOv8s model not only demonstrated superior detection accuracy but also exhibited a reduction in model parameters and computational complexity. This lightweight yet powerful model is highly suitable for real-time applications, making it a promising candidate for flower grading and detection tasks in the agricultural and horticultural industries.
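
    The two building blocks named above, depthwise separable convolution and a lightweight spatial attention module, can be sketched in PyTorch as follows. This is a generic illustration rather than the paper's Optimized-SAM; the abstract does not specify the exact kernel counts and channel reductions the authors applied, so the settings here are assumptions.

```python
import torch
import torch.nn as nn

class DWConv(nn.Module):
    """Depthwise separable convolution: per-channel spatial filtering followed
    by a 1x1 pointwise mix, cutting parameters versus a standard convolution."""
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        self.depthwise = nn.Conv2d(c_in, c_in, k, padding=k // 2, groups=c_in)
        self.pointwise = nn.Conv2d(c_in, c_out, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class SpatialAttention(nn.Module):
    """Lightweight spatial attention: one small conv over the channel-wise
    mean and max maps yields a per-pixel gate."""
    def __init__(self, k=3):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, k, padding=k // 2)

    def forward(self, x):
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(s))

x = torch.randn(1, 64, 80, 80)
print(SpatialAttention()(DWConv(64, 64)(x)).shape)   # torch.Size([1, 64, 80, 80])
```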

    HI-FPN: A Hierarchical Interactive Feature Pyramid Network for Accurate Wheat Lodging Localization Across Multiple Growth Periods | Open Access
    PANG Chunhui, CHEN Peng, XIA Yi, ZHANG Jun, WANG Bing, ZOU Yan, CHEN Tianjiao, KANG Chenrui, LIANG Dong
    2024, 6(2):  128-139.  doi:10.12133/j.smartag.SA202310002

    [Objective] Wheat lodging is one of the key issues threatening stable and high yields. Lodging detection technology based on deep learning is generally limited to identifying lodging at a single growth stage of wheat, while lodging may occur at various stages of the growth cycle. Moreover, the morphological characteristics of lodging vary significantly as the growth period progresses, posing a challenge to the feature-capturing ability of deep learning models. The aim of this study is to explore a deep learning-based method for detecting wheat lodging boundaries across multiple growth stages, to achieve automatic and accurate monitoring of wheat lodging. [Methods] A model called Lodging2Former was proposed, which integrates an innovative hierarchical interactive feature pyramid network (HI-FPN) on top of the advanced segmentation model Mask2Former. The key focus of this network design lies in enhancing the fusion and interaction between feature maps at adjacent hierarchical levels, enabling the model to effectively integrate feature information at different scales. Building upon this, even in complex field backgrounds, the Lodging2Former model significantly enhances the ability to recognize and capture wheat lodging features across multiple growth stages. [Results and Discussions] The Lodging2Former model demonstrated superiority in mean average precision (mAP) compared to several mainstream algorithms such as the mask region-based convolutional neural network (Mask R-CNN), segmenting objects by locations (SOLOv2), and Mask2Former. When applied to the scenario of detecting lodging in mixed growth stage wheat, the model achieved mAP values of 79.5%, 40.2%, and 43.4% at thresholds of 0.5, 0.75, and 0.5 to 0.95, respectively. Compared to Mask2Former, the performance of the improved model was enhanced by 1.3% to 4.3%. Compared to SOLOv2, a gain of 9.9% to 30.7% in mAP was achieved; and compared to the classic Mask R-CNN, a significant improvement of 24.2% to 26.4% was obtained. Furthermore, regardless of the IoU threshold standard, Lodging2Former exhibited the best detection performance, demonstrating good robustness and adaptability in the face of potential influencing factors such as changes in the field environment. [Conclusions] The experimental results indicated that the proposed HI-FPN network could effectively utilize contextual semantics and detailed information in images. By extracting rich multi-scale features, it enabled the Lodging2Former model to more accurately detect lodging areas of wheat across different growth stages, confirming the potential and value of HI-FPN in detecting lodging in multi-growth-stage wheat.
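
    The adjacent-level fusion idea attributed to HI-FPN above can be illustrated, very loosely, by a pyramid block in which each level is summed with its downsampled finer neighbour and upsampled coarser neighbour before a 3×3 convolution re-mixes the result. The sketch below is a generic interpretation for illustration only, not the actual HI-FPN architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdjacentLevelFusion(nn.Module):
    """Each pyramid level exchanges information with its neighbours (finer
    level downsampled, coarser level upsampled) before a 3x3 conv re-mix."""
    def __init__(self, channels=256, n_levels=4):
        super().__init__()
        self.mix = nn.ModuleList(nn.Conv2d(channels, channels, 3, padding=1)
                                 for _ in range(n_levels))

    def forward(self, feats):                       # list of (B, C, Hi, Wi), fine to coarse
        fused = []
        for i, f in enumerate(feats):
            out = f
            if i > 0:                               # add the finer neighbour, downsampled
                out = out + F.adaptive_max_pool2d(feats[i - 1], f.shape[-2:])
            if i < len(feats) - 1:                  # add the coarser neighbour, upsampled
                out = out + F.interpolate(feats[i + 1], size=f.shape[-2:], mode="nearest")
            fused.append(self.mix[i](out))
        return fused

feats = [torch.randn(1, 256, s, s) for s in (64, 32, 16, 8)]
print([f.shape[-1] for f in AdjacentLevelFusion()(feats)])   # [64, 32, 16, 8]
```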

    Three-Dimensional Dynamic Growth and Yield Simulation Model of Daylily Plants | Open Access
    ZHANG Yue, LI Weijia, HAN Zhiping, ZHANG Kun, LIU Jiawen, HENKE Michael
    2024, 6(2):  140-153.  doi:10.12133/j.smartag.SA202310011

    [Objective] The daylily, a perennial herb in the lily family, boasts a rich nutritional profile. Given its economic importance, enhancing its yield is a crucial objective. However, current research on daylily cultivation is limited, especially regarding three-dimensional dynamic growth simulation of daylily plants. In order to establish a technological foundation for improved cultivation management, growth dynamics prediction, and the development of plant variety types in daylily crops, this study introduces an innovative three-dimensional dynamic growth and yield simulation model for daylily plants. [Methods] The open-source GroIMP software platform was used to simulate and visualize three-dimensional scenes. With Datong daylily, the primary cultivated variety of daylily in the Datong area, as the research subject, a field experiment was conducted from March to September 2022, which covered the growth season of daylily. Through actual cultivation experiment measurements, morphological data and leaf photosynthetic physiological parameters of daylily leaves, flower stems, flower buds, and other organs were collected. The three-dimensional modeling technology of the functional-structural plant model (FSPM) platform was employed to establish the Cloud Cover-based solar radiation models (CSRMs) and the Farquhar, von Caemmerer and Berry model (FvCB model) suitable for daylily. Moreover, based on the source-sink relationship of daylily, the carbon allocation model of daylily photosynthetic products was developed. By using the β growth function, the growth simulation model of daylily organs was constructed, and the daily morphological data of daylily during the growth period were calculated, achieving the three-dimensional dynamic growth and yield simulation of daylily plants. Finally, the model was validated with measured data. [Results and Discussions] The coefficient of determination (R2) between the measured and simulated outdoor surface solar radiation was 0.87, accompanied by a Root Mean Squared Error (RMSE) of 28.52 W/m2. For the simulated model of each organ of the daylily plant, the R2 of the measured against the predicted values ranged from 0.896 to 0.984, with an RMSE varying between 1.4 and 17.7 cm. The R2 of the average flower bud yield simulation was 0.880, accompanied by an RMSE of 0.5 g. The overall F-value spanned from 82.244 to 1 168.533, while the Sig. value was consistently below the 0.05 significance level, suggesting a robust fit and statistical significance for the aforementioned models. Subsequently, a thorough examination of the light interaction, temperature influences, and photosynthetic attributes of daylily leaves throughout their growth cycle was carried out. The findings revealed that leaf growth played a pivotal role in the early phase of daylily development, followed by the contribution of leaf and flower stem growth in the middle stage, and finally the growth of daylily flower buds, which is the crucial period for yield formation, in the later stages. Analyzing the photosynthetic traits of daylily leaves comprehensively, it was observed that the photosynthetic rate was relatively low in the early spring as the new leaves were initially emerging and reached a plateau during the summer. Considering real-world climate conditions, the actual net photosynthetic rate was marginally lower than the rate verified under optimal conditions, with the simulated net assimilation rate typically ranging from 2 to 4 μmol CO2/(m2·s).
[Conclusions] The three-dimensional dynamic growth model of daylily plants proposed in this study can faithfully articulate the growth laws and morphological traits of daylily plants across the three primary growth stages. This model not only illustrates the three-dimensional dynamic growth of daylily plants but also effectively mimics the yield data of daylily flower buds. The simulation outcomes concur with actual conditions, demonstrating a high level of reliability.
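
    The β growth function mentioned in the methods is commonly written in the form of Yin et al. (2003), where organ size rises sigmoidally to a final value reached at time t_e, with the fastest growth at time t_m. The sketch below illustrates that standard form; the parameter values are illustrative, and the paper's exact parameterisation is not given in the abstract.

```python
import numpy as np

def beta_growth(t, w_max, t_e, t_m):
    """Beta growth function: w_max is the final organ size, t_e the time it is
    reached, t_m the time of maximum growth rate (0 < t_m < t_e). Growth stops
    once t_e is passed."""
    t = np.clip(t, 0.0, t_e)
    return w_max * (1 + (t_e - t) / (t_e - t_m)) * (t / t_e) ** (t_e / (t_e - t_m))

# Illustrative leaf reaching 60 cm at day 40 with fastest growth around day 25
days = np.arange(0, 61)
length = beta_growth(days, w_max=60.0, t_e=40.0, t_m=25.0)
print(round(float(length[40]), 1))   # 60.0
```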