
Smart Agriculture ›› 2022, Vol. 4 ›› Issue (4): 84-104.doi: 10.12133/j.smartag.SA202210004


Detection of On-Tree Peaches Based on Improved YOLOv5s and Multi-Modal Images


  1. College of Information and Computer Science, Anhui Agricultural University, Hefei 230036, Anhui, China
    2. Key Laboratory of Agricultural Sensors, Ministry of Agriculture and Rural Affairs, Hefei 230036, Anhui, China
    3. Anhui Provincial Key Laboratory of Smart Agricultural Technology and Equipment, Hefei 230036, Anhui, China
  • Received: 2022-10-30 Online: 2022-12-30

Multi-Class on-Tree Peach Detection Using Improved YOLOv5s and Multi-Modal Images

LUO Qing1,2,3(), RAO Yuan1,2,3(), JIN Xiu1,2,3, JIANG Zhaohui1,2,3, WANG Tan1,2,3, WANG Fengyi1,2,3, ZHANG Wu1,2,3   

  1. College of Information and Computer Science, Anhui Agricultural University, Hefei 230036, China
    2.Key Laboratory of Agricultural Sensors, Ministry of Agriculture and Rural Affairs, Hefei 230036, China
    3.Anhui Provincial Key Laboratory of Smart Agricultural Technology and Equipment, Hefei 230036, China
  • Received: 2022-10-30 Online: 2022-12-30
  • About author:LUO Qing (1997-), male, graduate student, research interest: smart agriculture. E-mail: tsing.omg@gmail.com
  • Supported by:
    The Anhui Provincial Key Laboratory of Smart Agricultural Technology and Equipment(APKLSATE2021X004);The International Cooperation Project of Ministry of Agriculture and Rural Affairs(125A0607);The Key Research and Development Plan of Anhui Province(201904a06020056);The Natural Science Major Project for Anhui Provincial University(2022AH040125);The Natural Science Foundation of Anhui Province, China(2008085MF203)

Abstract:

Accurate detection of fruits such as peaches is a prerequisite for mechanized, intelligent agronomic management. However, detecting peaches in orchards, especially bagged peaches, has long been challenging due to uneven illumination and severe occlusion. This study proposed an accurate multi-class peach detection method oriented toward mechanical harvesting, based on an improved YOLOv5s and multi-modal visual data. Specifically, an RGB-D dataset of naked and bagged peaches with multi-class labels was constructed, comprising 4127 sets of pixel-aligned color, depth, and infrared images acquired by a consumer-grade RGB-D camera. Subsequently, an improved lightweight YOLOv5s (small depth) model was proposed by introducing a direction-aware and position-sensitive attention mechanism, which captures long-range dependencies along one spatial direction while preserving precise positional information along the other, improving peach detection accuracy. Meanwhile, by decomposing the convolution operation into a convolution in the depth direction and convolutions along the width and height directions, depthwise separable convolution was used to reduce the model's computation and its training and inference time while maintaining detection accuracy. Experimental results showed that, under complex illumination and severe occlusion, the improved YOLOv5s model using multi-modal visual data achieved a mean average precision (mAP) of 98.6% and 88.9% on naked and bagged peaches, respectively, 5.3% and 16.5% higher than using RGB images alone, and 2.8% and 6.2% higher than YOLOv5s. For bagged peach detection, the mAP of the improved YOLOv5s was 16.3%, 8.1%, and 4.5% higher than that of YOLOX-Nano, PP-YOLO-Tiny, and EfficientDet-D0, respectively. In addition, both the multi-modal images and the improvements to YOLOv5s contributed to the accurate detection of naked and bagged peaches in natural orchards, and the proposed improved YOLOv5s model also outperformed conventional methods in detecting Fuji apples and kiwifruits in public datasets, verifying its good generalization ability. Finally, on a mainstream mobile hardware platform, the improved YOLOv5s model achieved a detection speed of 19 images per second with five-channel multi-modal images, enabling real-time peach detection. These results demonstrate the application potential of the improved YOLOv5s network and multi-modal visual data with multi-class labels for realizing visual intelligence in automated fruit harvesting systems.
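The direction-aware, position-sensitive attention described above factorizes global pooling into two 1-D pooling operations, one per spatial axis. The following is a minimal NumPy sketch of only that directional pooling step, assuming a C×H×W feature array; a full coordinate-style attention block would additionally apply shared 1×1 convolutions and sigmoid gating, which are omitted here:

```python
import numpy as np

def coordinate_pool(feature):
    """Direction-aware pooling: average along one spatial axis at a
    time, so long-range dependencies along one direction are captured
    while positional information along the other is preserved.
    feature: array of shape (C, H, W)."""
    pool_h = feature.mean(axis=2)  # (C, H): one value per row
    pool_w = feature.mean(axis=1)  # (C, W): one value per column
    return pool_h, pool_w

# Toy feature map with 2 channels, height 4, width 3.
x = np.arange(2 * 4 * 3, dtype=float).reshape(2, 4, 3)
h_vec, w_vec = coordinate_pool(x)
print(h_vec.shape, w_vec.shape)  # (2, 4) (2, 3)
```

Because each pooled vector keeps its full length along one axis, the attention weights derived from it can still localize targets along that axis, which is what helps with heavily occluded fruit.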

Keywords: multi-class detection, YOLOv5s, multi-modal visual data, mechanical harvesting, deep learning

Abstract:

Accurate peach detection is a prerequisite for automated agronomic management, e.g., peach mechanical harvesting. However, due to uneven illumination and ubiquitous occlusion, it is challenging to detect peaches, especially when they are bagged in orchards. To this end, an accurate multi-class peach detection method was proposed in this paper by improving YOLOv5s and using multi-modal visual data for mechanical harvesting. An RGB-D dataset with multi-class annotations of naked and bagged peaches was constructed, including 4127 multi-modal image sets of pixel-aligned color, depth, and infrared images acquired with a consumer-level RGB-D camera. Subsequently, an improved lightweight YOLOv5s (small depth) model was put forward by introducing a direction-aware and position-sensitive attention mechanism, which could capture long-range dependencies along one spatial direction and preserve precise positional information along the other, helping the network accurately detect peach targets. Meanwhile, depthwise separable convolution was employed to reduce the model computation by decomposing the convolution operation into a convolution in the depth direction and convolutions in the width and height directions, which helped to speed up the training and inference of the network while maintaining accuracy. Comparison experiments demonstrated that the improved YOLOv5s using multi-modal visual data, with 5.05 M model parameters, recorded a detection mAP of 98.6% and 88.9% on naked and bagged peaches in complex illumination and severe occlusion environments, 5.3% and 16.5% higher than using RGB images alone, and 2.8% and 6.2% higher than YOLOv5s. Compared with other networks in detecting bagged peaches, the improved YOLOv5s performed best in terms of mAP, which was 16.3%, 8.1% and 4.5% higher than YOLOX-Nano, PP-YOLO-Tiny, and EfficientDet-D0, respectively. In addition, the proposed improved YOLOv5s model offered better results than other methods in detecting Fuji apples and Hayward kiwifruits, verifying its effectiveness on different fruit detection tasks. Further investigation revealed the contribution of each imaging modality, as well as of the proposed improvements to YOLOv5s, to the favorable detection of both naked and bagged peaches in natural orchards. Additionally, on a popular mobile hardware platform, it was found that the improved YOLOv5s model could perform 19 detections per second with the considered five-channel multi-modal images, offering real-time peach detection. These promising results demonstrate the potential of the improved YOLOv5s and multi-modal visual data with multi-class annotations to achieve visual intelligence in automated fruit harvesting systems.
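The computational saving from depthwise separable convolution can be illustrated with a simple parameter count: a standard k×k convolution mixes all input and output channels at once, while the separable form uses a per-channel k×k depthwise convolution followed by a 1×1 pointwise convolution. A minimal sketch (the layer sizes below are illustrative, not taken from the paper's architecture):

```python
def conv_params(c_in, c_out, k):
    # Parameters of a standard k x k convolution (bias omitted).
    return c_in * c_out * k * k

def dws_conv_params(c_in, c_out, k):
    # Depthwise separable convolution: a k x k filter applied per
    # input channel, then a 1 x 1 pointwise convolution that mixes
    # channels.
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise

# Example: a 3 x 3 layer with 128 input and 128 output channels.
std = conv_params(128, 128, 3)      # 147456 parameters
dws = dws_conv_params(128, 128, 3)  # 1152 + 16384 = 17536 parameters
print(round(std / dws, 1))          # roughly an 8.4x reduction
```

The same decomposition reduces multiply-accumulate operations proportionally, which is what makes the improved model practical on mobile hardware.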

Key words: multi-class detection, YOLOv5s, multi-modal visual data, mechanical harvesting, deep learning

CLC Number: