
Smart Agriculture ›› 2024, Vol. 6 ›› Issue (5): 139-152. DOI: 10.12133/j.smartag.SA202404002

• Technology and Method •


MSH-YOLOv8: Mushroom Small Object Detection Method with Scale Reconstruction and Fusion

YE Dapeng1,2, JING Jun1, ZHANG Zhide1,2, LI Huihuang1, WU Haoyu3, XIE Limin1,2()   

  1. College of Mechanical and Electrical Engineering, Fujian Agriculture and Forestry University, Fuzhou 350002, China
    2. Fujian Key Laboratory of Agricultural Information Sensing Technology, Fuzhou 350002, China
    3. School of Future Technology, Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou 350002, China
  • Received: 2024-03-30  Online: 2024-09-30
  • Foundation item: Fujian Province Forestry Science and Technology Project (2023FKJ01)
  • About author:
    YE Dapeng, research interests: agricultural bio-environment monitoring and control, performance design and testing technology for mountain agricultural machinery. E-mail:
  • Corresponding author:
    XIE Limin, Ph.D., lecturer, research interests: nonlinear system dynamics, robot motion control. E-mail:


Abstract:

[Objective] Traditional object detection algorithms applied in the agricultural field, such as those used for crop growth monitoring and harvesting, often suffer from insufficient accuracy. This is particularly problematic for small crops like mushrooms, whose recognition and detection are more challenging. Small object detection technology promises to address these issues, potentially enhancing the precision, efficiency, and economic benefits of agricultural production management. However, achieving high accuracy in small object detection remains a significant challenge, especially when varying image sizes and changing target scales must be handled at the same time. Although the YOLO series models excel in speed and in large object detection, they still fall short on small objects. To maintain high accuracy under changes in image size and target scale, a novel detection model, Multi-Strategy Handling YOLOv8 (MSH-YOLOv8), was proposed.

[Methods] The proposed MSH-YOLOv8 model builds upon YOLOv8 with several enhancements aimed at improving sensitivity to small-scale targets and overall detection performance. First, an additional detection head was added to increase the model's sensitivity to small objects. To reduce computational redundancy and improve feature extraction, the Swin Transformer detection structure was introduced into the input module of the head network, creating what was termed the "Swin Head (SH)". Moreover, the C2f_Deformable Convolution v4 (C2f_DCNv4) structure, which contains deformable convolutions, and the Swin Transformer encoder structure, termed "Swinstage", were integrated to reconstruct the YOLOv8 backbone network. This optimization enhanced feature propagation and extraction, increasing the network's ability to handle targets with significant scale variation. Additionally, the normalization-based attention module (NAM) was employed to improve accuracy without compromising detection speed or computational complexity. To further improve training efficacy and convergence speed, the original CIoU loss function was replaced with the wise intersection over union (WIoU) loss. Experiments were conducted on the open Fungi dataset with mushrooms as the research subject. Approximately 200 images with resolutions of roughly 600×800 pixels were selected as the main research material, along with 50 images each at roughly 200×400 and 1 000×1 200 pixels to ensure representative and generalizable image sizes. During data augmentation, a generative adversarial network (GAN) was used for resolution reconstruction of low-resolution images, preserving semantic quality as much as possible. In the post-processing phase, dynamic resolution training, multi-scale testing, soft non-maximum suppression (Soft-NMS), and weighted boxes fusion (WBF) were applied to enhance small object detection under varying scales.
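As a concrete illustration of one of the components above, the following minimal PyTorch sketch shows the channel branch of a NAM-style attention block, which reuses the scale factors (gamma) of a BatchNorm layer as per-channel importance weights. It follows the general published NAM formulation rather than the authors' exact implementation; the class name and tensor shapes are illustrative.

```python
# Minimal sketch (not the authors' code) of a NAM-style channel-attention block:
# the BatchNorm scale factors (gamma) are reused as per-channel importance weights.
import torch
import torch.nn as nn


class NAMChannelAttention(nn.Module):
    """Channel branch of a NAM-like attention module."""

    def __init__(self, channels: int):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels, affine=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        x = self.bn(x)
        # Normalised |gamma| serves as a measure of each channel's importance.
        w = self.bn.weight.abs() / self.bn.weight.abs().sum()
        x = x * w.view(1, -1, 1, 1)
        return torch.sigmoid(x) * residual


if __name__ == "__main__":
    feat = torch.randn(2, 64, 80, 80)        # a backbone/neck feature map
    out = NAMChannelAttention(64)(feat)      # same shape, channels re-weighted
    print(out.shape)                         # torch.Size([2, 64, 80, 80])
```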
[Results and Discussions] The improved MSH-YOLOv8 achieved an average precision of 98.49% at an intersection over union threshold of 0.50 (AP50) and an AP@50-95 of 75.29%, with the small object metric APs reaching 39.73%. Compared with the mainstream YOLOv8, these metrics improved by 2.34%, 4.06% and 8.55%, respectively; compared with the advanced transformer prediction heads YOLOv5 (TPH-YOLOv5) model, the improvements were 2.14%, 2.76% and 6.89%, respectively. The ensemble model, MSH-YOLOv8-ensemble, showed even larger gains, with AP50 and APs reaching 99.14% and 40.59%, respectively, an increase of 4.06% and 8.55% over YOLOv8. These results indicate the robustness and enhanced performance of MSH-YOLOv8, particularly for small objects under varying conditions. Applying the methodology to the Alibaba Cloud Tianchi datasets "Tomato Detection" and "Apple Detection" yielded the MSH-YOLOv8-t and MSH-YOLOv8-a models (collectively referred to as MSH-YOLOv8). Visual comparison of detection results showed that MSH-YOLOv8 markedly improved the recognition of dense and blurry small-scale tomatoes and apples, indicating strong cross-dataset generalization and effective recognition of small-scale targets. Beyond the quantitative improvements, qualitative assessments showed that the model handles complex scenarios involving occlusion, varying lighting conditions, and different crop growth stages, demonstrating its practical applicability in real-world agricultural settings where such challenges are common.

[Conclusions] The MSH-YOLOv8 improvement method proposed in this study effectively enhances the detection accuracy of small mushroom targets under varying image sizes and target scales. The approach leverages multiple strategies to optimize both the architecture and the training process, resulting in a robust model capable of high-precision small object detection. Its application to other datasets, such as tomato and apple detection, further underscores its generalizability and potential for broader use in agricultural monitoring and management tasks.
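To make the post-processing ensembling step concrete, the following sketch shows how predictions from several detectors could be fused with WBF. It assumes the open-source ensemble-boxes package rather than any implementation specified by the paper; the function name fuse_detections, the model weights, and the thresholds are illustrative placeholders.

```python
# Illustrative sketch of the WBF ensembling step described above, using the
# open-source "ensemble-boxes" package; weights and thresholds are placeholders.
from ensemble_boxes import weighted_boxes_fusion


def fuse_detections(per_model_outputs, image_size,
                    weights=None, iou_thr=0.55, skip_box_thr=0.01):
    """per_model_outputs: one (boxes_xyxy, scores, labels) tuple per detector,
    with boxes in absolute pixels for an image of size (width, height)."""
    w, h = image_size
    boxes_list, scores_list, labels_list = [], [], []
    for boxes, scores, labels in per_model_outputs:
        # WBF expects boxes normalised to [0, 1].
        boxes_list.append([[x1 / w, y1 / h, x2 / w, y2 / h]
                           for x1, y1, x2, y2 in boxes])
        scores_list.append(list(scores))
        labels_list.append(list(labels))
    fused_boxes, fused_scores, fused_labels = weighted_boxes_fusion(
        boxes_list, scores_list, labels_list,
        weights=weights, iou_thr=iou_thr, skip_box_thr=skip_box_thr)
    # Convert fused boxes back to pixel coordinates.
    fused_boxes = [[x1 * w, y1 * h, x2 * w, y2 * h]
                   for x1, y1, x2, y2 in fused_boxes]
    return fused_boxes, fused_scores, fused_labels
```

Soft-NMS can be applied to each model's raw predictions before fusion in the same spirit, decaying rather than discarding the scores of overlapping boxes.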

Key words: image size, small object detection, feature extraction, multi-scale detection, model ensemble

CLC number: