
Smart Agriculture, 2023, Vol. 5, Issue (3): 17-34. doi: 10.12133/j.smartag.SA202306012

Special Issue: Monitoring Technology of Crop Information

The Key Issues and Evaluation Methods for Constructing Agricultural Pest and Disease Image Datasets: A Review

GUAN Bolun, ZHANG Liping, ZHU Jingbo, LI Runmei, KONG Juanjuan, WANG Yan, DONG Wei

  1. Institute of Agricultural Economy and Information, Anhui Academy of Agricultural Sciences, Hefei 230001, China
  • Received: 2023-06-13; Online: 2023-09-30
  • Corresponding author: DONG Wei, E-mail:
  • About author: GUAN Bolun, E-mail: aaasguanbolun@163.com
  • Supported by:
    General Program of National Natural Science Foundation of China (32171888); Anhui Province Financial Agricultural Scientific and Technological Achievements Transformation Project (2022ZH001); Anhui Academy of Agricultural Sciences Research Platform Project (2023YL014)


[Significance] Scientific datasets of agricultural pests and diseases are the foundation for monitoring and early warning of agricultural pests and diseases. They are of great significance for the development of agricultural pest control and are an important component of smart agriculture. As deep learning has proven important for the intelligent monitoring of agricultural pests and diseases, it has become clear that dataset quality affects the effectiveness of image recognition algorithms, and the construction of high-quality agricultural pest and disease datasets is attracting growing attention from scholars in this field. In image recognition tasks, recognition performance depends both on improvements to the algorithm and on the quality of the dataset: the same recognition algorithm learns different features from datasets of different quality, so its recognition performance also differs. To propose indicators for measuring the quality of agricultural pest and disease datasets, this article analyzes existing datasets and, taking the challenges of constructing agricultural pest and disease image datasets as its starting point, reviews how such datasets are constructed.

[Progress] First, pest and disease datasets are divided into two categories: private datasets and public datasets. Private datasets, which are not publicly available, are characterized by high annotation quality, high image quality, and a large number of inter-class samples. Public datasets cover many categories but generally have lower image quality and poorer annotation quality. Second, the problems encountered in constructing datasets are summarized: category imbalance at the dataset level, difficulty of feature extraction at the sample level, and difficulty in gauging an adequate dataset size at the usage level.
Specifically, these problems include imbalanced inter-class and intra-class samples, selection bias, multi-scale targets, dense targets, uneven data distribution, uneven image quality, insufficient dataset size, and limited dataset availability. Their main causes are analyzed from two key aspects of dataset construction, image acquisition and image annotation, and algorithmic improvement strategies and suggestions for addressing these issues are summarized. Data collection devices can be divided into handheld devices, drone platforms, and fixed collection devices. Handheld collection is flexible and convenient but inefficient, and it requires considerable photographic skill. Drone platforms are suited to collecting data over contiguous areas, but the fine details they capture are often not clear enough. Fixed devices collect data more efficiently, but their shooting scenes are usually fixed. Image annotation is divided into rectangular annotation and polygonal annotation, and rectangular annotation is generally used more often in image recognition and detection. Images in which the target is hard to separate from the background are difficult to label, and improper annotation can introduce additional noise or lead to incomplete feature extraction by the algorithm. In response to the problems at these three levels, the article concludes by summarizing evaluation methods for data distribution consistency, dataset size, and image annotation quality.

[Conclusions and Prospects] Based on the practical needs of agricultural pest and disease image recognition, the following suggestions are proposed for future research on and development of high-quality agricultural pest and disease image datasets:
(1) Construct agricultural pest and disease datasets in combination with practical usage scenarios.
To enable algorithms to extract richer target features, image data can be collected from multiple perspectives and environments. According to actual needs, data categories can be divided scientifically and reasonably from the perspective of algorithmic feature extraction, avoiding unreasonable inter-class and intra-class distances, so as to construct a dataset whose class division meets the task requirements and whose feature distribution is balanced.
(2) Balance the relationship between datasets and algorithms. When improving algorithms, consider whether categories and features are sufficiently distributed in the dataset and whether the dataset size matches the model, so as to improve algorithm accuracy, robustness, and practicality. Comparative experiments on algorithm improvements should be conducted on the same standard evaluation dataset, so that improvements to pest and disease image recognition algorithms are measured consistently. Further research is needed on the correlation between the scale of agricultural pest and disease image data and algorithm performance, on the relationship between annotation methods and algorithms for pest and disease images that are difficult to annotate, and on integrating recognition algorithms for blurred, dense, and occluded targets, with the aim of proposing evaluation indicators for agricultural pest and disease datasets.
(3) Enhance the use value of datasets. Datasets can serve not only image recognition research but also research on other business needs. Identifying, collecting, and annotating target images is a challenging task in constructing pest and disease datasets. When collecting image data, attention can also be paid to collecting information about the surrounding environment and the host, so as to construct multimodal agricultural pest and disease datasets and fully realize their value.
To let researchers concentrate on business innovation, it is necessary to innovate the organizational form of data collection, develop a big data platform for agricultural pests and diseases, explore the correlations among multimodal data, improve the accessibility and convenience of data, and provide efficient services for application deployment and business innovation.
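The abstract names evaluation targets (class balance, data distribution consistency, and annotation quality among them) but does not give the formulas the review covers. The sketch below is therefore only an illustration that substitutes common generic measures: an imbalance ratio for class balance, the Jensen-Shannon divergence between the class-label distributions of two splits for distribution consistency, and the mean intersection-over-union (IoU) between two annotators' rectangular boxes for annotation agreement. All function names, the `(x_min, y_min, x_max, y_max)` box format, and the assumption that the two annotators' boxes are already paired one-to-one are hypothetical; dataset-size evaluation is not covered.

```python
import math
from collections import Counter


def imbalance_ratio(labels):
    """Largest class count divided by smallest class count (1.0 means perfectly balanced)."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must be non-empty")
    return max(counts.values()) / min(counts.values())


def js_divergence(labels_a, labels_b):
    """Jensen-Shannon divergence (log base 2, so the result lies in [0, 1]) between
    the class-label distributions of two splits, e.g. training vs. test images."""
    if not labels_a or not labels_b:
        raise ValueError("both label lists must be non-empty")
    classes = set(labels_a) | set(labels_b)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p = {c: count_a[c] / len(labels_a) for c in classes}
    q = {c: count_b[c] / len(labels_b) for c in classes}
    m = {c: 0.5 * (p[c] + q[c]) for c in classes}  # mixture distribution

    def kl(x, y):
        # Classes with zero probability in x contribute nothing; y (the mixture m)
        # is positive wherever x is, so the logarithm is always defined.
        return sum(x[c] * math.log2(x[c] / y[c]) for c in classes if x[c] > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def box_iou(a, b):
    """IoU of two axis-aligned rectangles given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_annotator_iou(boxes_a, boxes_b):
    """Mean IoU between two annotators, assuming box i in each list marks the same target."""
    if len(boxes_a) != len(boxes_b) or not boxes_a:
        raise ValueError("expected two equal-length, non-empty lists of paired boxes")
    return sum(box_iou(a, b) for a, b in zip(boxes_a, boxes_b)) / len(boxes_a)


if __name__ == "__main__":
    # Hypothetical toy labels and boxes; not data from the article.
    train = ["aphid"] * 80 + ["leaf_rust"] * 20
    test = ["aphid"] * 50 + ["leaf_rust"] * 50
    print("imbalance ratio (train):", imbalance_ratio(train))
    print("JS divergence, train vs. test:", round(js_divergence(train, test), 4))
    annotator_1 = [(10, 10, 50, 50), (60, 60, 90, 100)]
    annotator_2 = [(12, 8, 52, 48), (60, 65, 88, 100)]
    print("mean annotator IoU:", round(mean_annotator_iou(annotator_1, annotator_2), 3))
```

Jensen-Shannon divergence is used here rather than plain KL divergence because it is symmetric and stays finite when a class appears in only one split; a dataset builder might, for example, flag a train/test split whose divergence exceeds a chosen threshold. Matching unpaired boxes (e.g. by Hungarian assignment) and polygon-level agreement would need additional code.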

Key words: agricultural pests, data set, deep learning, monitoring and warning, data acquisition, data annotations, data set evaluation