
Smart Agriculture ›› 2024, Vol. 6 ›› Issue (5): 139-152. DOI: 10.12133/j.smartag.SA202404002

• Technology and Method •


MSH-YOLOv8: Mushroom Small Object Detection Method with Scale Reconstruction and Fusion

YE Dapeng1,2, JING Jun1, ZHANG Zhide1,2, LI Huihuang1, WU Haoyu3, XIE Limin1,2()   

  1. College of Mechanical and Electrical Engineering, Fujian Agriculture and Forestry University, Fuzhou 350002, China
    2. Fujian Key Laboratory of Agricultural Information Sensing Technology, Fuzhou 350002, China
    3. School of Future Technology, Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou 350002, China
  • Received: 2024-03-30  Online: 2024-09-30
  • Foundation item: Fujian Province Forestry Science and Technology Project (2023FKJ01)
  • About author:
    YE Dapeng, research interests: agricultural bio-environment monitoring and control, performance design and testing technology for mountain agricultural machinery. E-mail:
  • Corresponding author:
    XIE Limin, Ph.D., lecturer, research interests: nonlinear system dynamics, robot motion control. E-mail:


Abstract:

[Objective] Traditional object detection algorithms applied in the agricultural field, such as those used for crop growth monitoring and harvesting, often suffer from insufficient accuracy. This is particularly problematic for small crops like mushrooms, whose recognition and detection are more challenging. Small object detection technology promises to address these issues, potentially enhancing the precision, efficiency, and economic benefits of agricultural production management. However, achieving high accuracy in small object detection remains a significant challenge, especially when varying image sizes and changing target scales must be handled at the same time. Although the YOLO series models excel in speed and in large object detection, they still fall short on small objects. To maintain high accuracy under changes in image size and target scale, a novel detection model, Multi-Strategy Handling YOLOv8 (MSH-YOLOv8), was proposed.

[Methods] The proposed MSH-YOLOv8 model builds upon YOLOv8 with several enhancements aimed at improving sensitivity to small-scale targets and overall detection performance. First, an additional detection head was added to increase the model's sensitivity to small objects. To reduce computational redundancy and improve feature extraction, the Swin Transformer detection structure was introduced into the input module of the head network, creating what was termed the "Swin Head (SH)". Moreover, the C2f_Deformable Convolution v4 (C2f_DCNv4) structure, which contains deformable convolutions, and the Swin Transformer encoder structure, termed "Swinstage", were integrated to reconstruct the YOLOv8 backbone network. This optimization enhanced feature propagation and extraction, increasing the network's ability to handle targets with significant scale variation. Additionally, the normalization-based attention module (NAM) was employed to improve accuracy without compromising detection speed or computational complexity. To further improve training efficacy and convergence speed, the original CIoU loss function was replaced with the wise intersection over union (WIoU) loss. Experiments were conducted on the open Fungi dataset with mushrooms as the research subject. Approximately 200 images with resolutions of roughly 600×800 pixels were selected as the main research material, along with 50 images each at roughly 200×400 and 1 000×1 200 pixels to ensure representative and generalizable image sizes. During data augmentation, a generative adversarial network (GAN) was used for resolution reconstruction of low-resolution images, preserving semantic quality as much as possible. In the post-processing phase, dynamic resolution training, multi-scale testing, soft non-maximum suppression (Soft-NMS), and weighted boxes fusion (WBF) were applied to enhance small object detection under varying scales.
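As a concrete illustration of one of the components above, the following minimal PyTorch sketch shows the channel branch of a NAM-style attention block, which reuses the scale factors (gamma) of a BatchNorm layer as per-channel importance weights. It follows the general published NAM formulation rather than the authors' exact implementation; the class name and tensor shapes are illustrative.

```python
# Minimal sketch (not the authors' code) of a NAM-style channel-attention block:
# the BatchNorm scale factors (gamma) are reused as per-channel importance weights.
import torch
import torch.nn as nn


class NAMChannelAttention(nn.Module):
    """Channel branch of a NAM-like attention module."""

    def __init__(self, channels: int):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels, affine=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        x = self.bn(x)
        # Normalised |gamma| serves as a measure of each channel's importance.
        w = self.bn.weight.abs() / self.bn.weight.abs().sum()
        x = x * w.view(1, -1, 1, 1)
        return torch.sigmoid(x) * residual


if __name__ == "__main__":
    feat = torch.randn(2, 64, 80, 80)        # a backbone/neck feature map
    out = NAMChannelAttention(64)(feat)      # same shape, channels re-weighted
    print(out.shape)                         # torch.Size([2, 64, 80, 80])
```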
[Results and Discussions] The improved MSH-YOLOv8 achieved an average precision of 98.49% at an intersection over union threshold of 0.50 (AP50) and an AP@50-95 of 75.29%, with the small object metric APs reaching 39.73%. Compared with the mainstream YOLOv8, these metrics improved by 2.34%, 4.06% and 8.55%, respectively; compared with the advanced transformer prediction heads YOLOv5 (TPH-YOLOv5) model, the improvements were 2.14%, 2.76% and 6.89%, respectively. The ensemble model, MSH-YOLOv8-ensemble, showed even larger gains, with AP50 and APs reaching 99.14% and 40.59%, respectively, an increase of 4.06% and 8.55% over YOLOv8. These results indicate the robustness and enhanced performance of MSH-YOLOv8, particularly for small objects under varying conditions. Applying the methodology to the Alibaba Cloud Tianchi datasets "Tomato Detection" and "Apple Detection" yielded the MSH-YOLOv8-t and MSH-YOLOv8-a models (collectively referred to as MSH-YOLOv8). Visual comparison of detection results showed that MSH-YOLOv8 markedly improved the recognition of dense and blurry small-scale tomatoes and apples, indicating strong cross-dataset generalization and effective recognition of small-scale targets. Beyond the quantitative improvements, qualitative assessments showed that the model handles complex scenarios involving occlusion, varying lighting conditions, and different crop growth stages, demonstrating its practical applicability in real-world agricultural settings where such challenges are common.

[Conclusions] The MSH-YOLOv8 improvement method proposed in this study effectively enhances the detection accuracy of small mushroom targets under varying image sizes and target scales. The approach leverages multiple strategies to optimize both the architecture and the training process, resulting in a robust model capable of high-precision small object detection. Its application to other datasets, such as tomato and apple detection, further underscores its generalizability and potential for broader use in agricultural monitoring and management tasks.
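To make the post-processing ensembling step concrete, the following sketch shows how predictions from several detectors could be fused with WBF. It assumes the open-source ensemble-boxes package rather than any implementation specified by the paper; the function name fuse_detections, the model weights, and the thresholds are illustrative placeholders.

```python
# Illustrative sketch of the WBF ensembling step described above, using the
# open-source "ensemble-boxes" package; weights and thresholds are placeholders.
from ensemble_boxes import weighted_boxes_fusion


def fuse_detections(per_model_outputs, image_size,
                    weights=None, iou_thr=0.55, skip_box_thr=0.01):
    """per_model_outputs: one (boxes_xyxy, scores, labels) tuple per detector,
    with boxes in absolute pixels for an image of size (width, height)."""
    w, h = image_size
    boxes_list, scores_list, labels_list = [], [], []
    for boxes, scores, labels in per_model_outputs:
        # WBF expects boxes normalised to [0, 1].
        boxes_list.append([[x1 / w, y1 / h, x2 / w, y2 / h]
                           for x1, y1, x2, y2 in boxes])
        scores_list.append(list(scores))
        labels_list.append(list(labels))
    fused_boxes, fused_scores, fused_labels = weighted_boxes_fusion(
        boxes_list, scores_list, labels_list,
        weights=weights, iou_thr=iou_thr, skip_box_thr=skip_box_thr)
    # Convert fused boxes back to pixel coordinates.
    fused_boxes = [[x1 * w, y1 * h, x2 * w, y2 * h]
                   for x1, y1, x2, y2 in fused_boxes]
    return fused_boxes, fused_scores, fused_labels
```

Soft-NMS can be applied to each model's raw predictions before fusion in the same spirit, decaying rather than discarding the scores of overlapping boxes.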

Key words: image size, small object detection, feature extraction, multi-scale detection, model ensemble

CLC number: