
Smart Agriculture, 2023, Vol. 5, Issue (3): 17-34. doi: 10.12133/j.smartag.SA202306012

• Special Issue--Crop Information Monitoring Technology •

农业病虫害图像数据集构建关键问题及评价方法综述

管博伦, 张立平, 朱静波, 李闰枚, 孔娟娟, 汪焱, 董伟

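  1. Institute of Agricultural Economy and Information, Anhui Academy of Agricultural Sciences, Hefei 230001, Anhui, China
  • Received: 2023-06-13 Published: 2023-09-30
  • Funding:
    General Program of the National Natural Science Foundation of China (32171888); Anhui Province Financial Agricultural Scientific and Technological Achievements Transformation Project (2022ZH001); Anhui Academy of Agricultural Sciences Research Program Project (2023YL014)
  • About the author:
    GUAN Bolun, research interests: data mining, machine learning, and agricultural informatization. E-mail:
  • Corresponding author:
    DONG Wei, Ph.D., associate researcher, research interest: agricultural informatization. E-mail: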

The Key Issues and Evaluation Methods for Constructing Agricultural Pest and Disease Image Datasets: A Review

GUAN Bolun, ZHANG Liping, ZHU Jingbo, LI Runmei, KONG Juanjuan, WANG Yan, DONG Wei

  1. Institute of Agricultural Economy and Information, Anhui Academy of Agricultural Sciences, Hefei 230001, China
  • Received: 2023-06-13 Online: 2023-09-30
  • Supported by:
    General Program of National Natural Science Foundation of China (32171888); Anhui Province Financial Agricultural Scientific and Technological Achievements Transformation Project (2022ZH001); Anhui Academy of Agricultural Sciences Research Platform Project (2023YL014)

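Abstract (Chinese version):

[Objective/Significance] Scientific datasets of agricultural pests and diseases are the foundation of pest and disease monitoring and early warning and an important component of smart agriculture, and they are of great significance for pest and disease control. As the effectiveness of deep learning in intelligent pest and disease monitoring and early warning has become evident, the construction of high-quality agricultural pest and disease datasets has gradually drawn the attention of experts and scholars. To support the construction of high-quality, evenly distributed agricultural pest and disease image datasets and to improve the accuracy and robustness of detection models, this paper takes the challenges of constructing such image datasets as its starting point and presents a comprehensive review of agricultural pest and disease dataset construction. [Progress] The problems faced in constructing agricultural pest and disease image datasets are summarized at the dataset level, the data sample level, and the usage level: inter-class and intra-class sample imbalance, selection bias, multi-scale targets, dense targets, uneven data distribution, uneven image quality, insufficient dataset size, and dataset availability. The main causes of these problems are analyzed from the two aspects of image acquisition and annotation methods, algorithm improvement strategies and suggestions are summarized, and finally dataset evaluation methods are reviewed. [Conclusions/Prospects] In light of the practical needs of agricultural pest and disease image recognition, the following suggestions for constructing high-quality agricultural pest and disease image datasets are offered: (1) Construct datasets in light of actual usage scenarios. Collect image data from multiple viewpoints and in multiple environments, divide data categories scientifically and reasonably from the perspective of algorithm feature extraction, and build datasets that are balanced in both sample counts and feature distribution. (2) Balance the relationship between datasets and algorithms. Study the relationship between dataset characteristics and algorithm performance, taking full account of the categories and distribution in the dataset and of a dataset size matched to the model, to improve algorithm accuracy, robustness, and practicality. Investigate in depth the relationship between the scale of pest and disease image data and model performance, annotation methods for pest and disease image data, recognition algorithms for blurred, dense, and occluded targets, and evaluation indicators for high-quality agricultural pest and disease datasets, so as to further raise the level of intelligence in agricultural pest and disease management. (3) Enhance the use value of datasets. Build multimodal agricultural pest and disease datasets, innovate the organizational form of data collection, develop a data middle platform, mine the correlations among multimodal data, and improve the convenience of data use, providing efficient services for application deployment and business innovation.

Keywords (Chinese version): agricultural pests and diseases, dataset, deep learning, monitoring and early warning, data acquisition, data annotation, dataset evaluation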

Abstract:

[Significance] Scientific datasets of agricultural pests and diseases are the foundation for pest and disease monitoring and early warning. They are of great significance for pest and disease control and are an important component of smart agriculture. As deep learning has proven its value in the intelligent monitoring of agricultural pests and diseases, and because dataset quality affects the effectiveness of image recognition algorithms, the construction of high-quality agricultural pest and disease datasets is attracting growing attention from scholars in this field. In image recognition tasks, recognition performance depends on the one hand on the improvement strategy of the algorithm and on the other hand on the quality of the dataset. The same recognition algorithm learns different features from datasets of different quality, so its recognition performance varies accordingly. To propose dataset evaluation indices for measuring the quality of agricultural pest and disease datasets, this article analyzes existing datasets and, taking the challenges faced in constructing agricultural pest and disease image datasets as its starting point, reviews the construction of such datasets. [Progress] First, pest and disease datasets are divided into two categories: private datasets and public datasets. Private datasets are characterized by high annotation quality, high image quality, and a large number of inter-class samples, but they are not publicly available. Public datasets are characterized by many categories, low image quality, and poor annotation quality. Second, the problems faced in dataset construction are summarized at three levels: imbalanced categories at the dataset level, difficulty in feature extraction at the data sample level, and difficulty in determining dataset size at the usage level.
These include imbalanced inter-class and intra-class samples, selection bias, multi-scale targets, dense targets, uneven data distribution, uneven image quality, insufficient dataset size, and dataset availability. The main causes of these problems are analyzed from two key aspects of dataset construction, image acquisition and annotation methods, and algorithm improvement strategies and suggestions for addressing them are summarized. Collection devices can be divided into handheld devices, drone platforms, and fixed collection devices. Handheld collection is flexible and convenient, but it is inefficient and demands good photography skills. Drone-based collection is suitable for contiguous areas, but the fine details it captures are not clear enough. Fixed-device collection is more efficient, but the shooting scene is often relatively fixed. Image annotation is divided into rectangular annotation and polygonal annotation. In image recognition and detection, rectangular annotation is more commonly used. Images in which the target is difficult to separate from the background are difficult to label. Improper annotation can introduce more noise or lead to incomplete feature extraction by the algorithm. In response to the problems at the above three levels, evaluation methods for data distribution consistency, dataset size, and image annotation quality are summarized at the end of the article. [Conclusions and Prospects] Based on the actual needs of agricultural pest and disease image recognition, the following suggestions for future research on constructing high-quality agricultural pest and disease image datasets are proposed: (1) Construct agricultural pest and disease datasets in light of practical usage scenarios.
To enable the algorithm to extract richer target features, image data can be collected from multiple perspectives and in multiple environments to construct a dataset. According to actual needs, data categories can be divided scientifically and reasonably from the perspective of algorithm feature extraction, avoiding unreasonable inter-class and intra-class distances, so as to construct a dataset whose classification meets task requirements and whose feature distribution is balanced. (2) Balance the relationship between datasets and algorithms. When improving algorithms, consider whether the categories and features in the dataset are sufficiently distributed, as well as a dataset size that matches the model, to improve algorithm accuracy, robustness, and practicality. Comparative experiments on algorithm improvements should be conducted on the same standard evaluation dataset in order to improve pest and disease image recognition algorithms. Further research should examine the correlation between the scale of agricultural pest and disease image data and algorithm performance, study annotation methods and algorithms for pest and disease images that are difficult to annotate, integrate recognition algorithms for blurred, dense, and occluded targets, and propose evaluation indicators for agricultural pest and disease datasets. (3) Enhance the use value of datasets. Datasets can be used not only for image recognition research but also for research serving other business needs. Identifying, collecting, and annotating target images is a challenging part of constructing pest and disease datasets. When collecting image data, attention can also be paid to collecting surrounding environmental information and host information. In this way a multimodal agricultural pest and disease dataset can be constructed, fully leveraging the value of the dataset.
To let researchers focus on business innovation research, it is necessary to innovate the organizational form of data collection, develop a big data platform for agricultural diseases and pests, explore the correlations among multimodal data, improve the accessibility and convenience of data, and provide efficient services for application deployment and business innovation.
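
The inter-class sample imbalance named above as a dataset-level problem can be quantified with a few lines of code. The following is a minimal illustrative sketch, not the evaluation method used in the article: the function name `imbalance_summary`, the choice of the largest-to-smallest class ratio, and the normalized Shannon entropy are all assumptions made for this example.

```python
import math
from collections import Counter


def imbalance_summary(labels):
    """Summarize class balance for a list of per-image class labels.

    Hypothetical helper for illustration; the two statistics below are
    common choices, not indices prescribed by the article.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    num_classes = len(counts)

    # Ratio of the most frequent class to the least frequent class.
    # 1.0 means every class has the same number of samples.
    imbalance_ratio = max(counts.values()) / min(counts.values())

    # Shannon entropy of the class distribution, divided by its maximum
    # (log of the number of classes). 1.0 means perfectly balanced.
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    normalized_entropy = entropy / math.log(num_classes) if num_classes > 1 else 1.0

    return {
        "counts": dict(counts),
        "imbalance_ratio": imbalance_ratio,
        "normalized_entropy": normalized_entropy,
    }


if __name__ == "__main__":
    # Made-up labels, used only to show the call.
    example = ["aphid"] * 8 + ["rust"] * 2
    print(imbalance_summary(example))
```

A high imbalance ratio or a normalized entropy well below 1 would suggest that some classes need more samples, resampling, or reweighting before a model is trained.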

Key words: agricultural pests, data set, deep learning, monitoring and warning, data acquisition, data annotation, data set evaluation
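
The abstract lists image annotation quality among the aspects to evaluate and notes that rectangular annotation is the most common form. One widely used way to compare two rectangular annotations of the same target, for example from two annotators, is intersection over union (IoU). The sketch below is an illustration under stated assumptions, not the article's evaluation method: the function name `box_iou` and the `(x_min, y_min, x_max, y_max)` box format are choices made for this example.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two axis-aligned rectangles.

    Each box is assumed to be (x_min, y_min, x_max, y_max) in pixels.
    Returns a value in [0, 1]; 1.0 means the boxes coincide exactly.
    """
    # Corners of the overlapping region, if there is one.
    inter_x1 = max(box_a[0], box_b[0])
    inter_y1 = max(box_a[1], box_b[1])
    inter_x2 = min(box_a[2], box_b[2])
    inter_y2 = min(box_a[3], box_b[3])

    # Clamp at zero so that disjoint boxes give zero overlap.
    inter_w = max(0.0, inter_x2 - inter_x1)
    inter_h = max(0.0, inter_y2 - inter_y1)
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two made-up annotations of the same target.
    print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

Averaging this score over targets that were labeled twice gives a rough indication of how consistently a dataset was annotated. Low values may point to targets that are hard to separate from the background.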