
Smart Agriculture


Detection Method for Log-Cultivated Shiitake Mushrooms Based on Improved RT-DETR

WANG Fengyun1, WANG Xuanyu2, AN Lei3, FENG Wenjie1

  1. Shandong Academy of Agricultural Sciences, Jinan 250100, Shandong, China
    2. Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250300, Shandong, China
    3. Dongying Hekou District Administrative Examination and Approval Service Bureau, Dongying 257200, Shandong, China
  • Received: 2025-06-29 Online: 2025-09-11
  • Foundation items:
    Natural Science Foundation of Shandong Province (ZR2022MC067); National Key Research and Development Program of China (2021YFB3901303); Key Research and Development Program of Shandong Province (Major Science and Technology Innovation Project) (2022CXGC010610); Agricultural Science and Technology Innovation Project of Shandong Academy of Agricultural Sciences (CXGC2024A08)
  • About the author:

    WANG Fengyun, M.S., research fellow; research interest: smart agriculture. E-mail:

  • Corresponding author:
    FENG Wenjie, research fellow; research interest: agricultural informatization. E-mail:

Detection Method for Log-Cultivated Shiitake Mushrooms Based on Improved RT-DETR

WANG Fengyun1, WANG Xuanyu2, AN Lei3, FENG Wenjie1

  1. Shandong Academy of Agricultural Sciences, Jinan 250100, Shandong, China
    2. Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250300, China
    3. Dongying Hekou District Administrative Examination and Approval Service Bureau, Dongying 257200, China
  • Received: 2025-06-29 Online: 2025-09-11
  • Foundation items: Natural Science Foundation of Shandong Province (ZR2022MC067); National Key Research and Development Program of China (2021YFB3901303); Key Research and Development Program of Shandong Province (2022CXGC010610); Agricultural Science and Technology Innovation Project of Shandong Academy of Agricultural Sciences (CXGC2024A08)
  • Corresponding author:
    FENG Wenjie, E-mail:

Abstract:

[Objective/Significance] With the deepening application of computer vision and automation technology in factory-based shiitake mushroom production, stages such as substrate mixing, bagging, sterilization, and inoculation have been largely automated, while harvesting and grading still rely heavily on manual labor and have become the key bottleneck constraining industrial efficiency. To raise the level of intelligence at the harvesting stage, a high-precision, lightweight object detection model is urgently needed. [Methods] A harvest evaluation model for shiitake mushrooms, FSE-DETR, was proposed based on an improved RT-DETR (Real-Time DEtection TRansformer). The model introduced the FasterNet Block into the backbone to reduce computational complexity, and designed a small object feature fusion network (SFFN) at the feature encoding stage, in which space-to-depth convolution (SPDConv) preserves fine-grained spatial information and the cross stage partial omni-kernel module (CSPOmniKernel) performs multi-scale feature extraction and global context modeling; in addition, the efficient intersection over union (EIoU) loss function was adopted to improve bounding-box localization accuracy and convergence speed. [Results and Discussions] FSE-DETR outperformed mainstream models such as Faster R-CNN (Faster Region-based Convolutional Neural Network), YOLOv7, YOLOv8m, and YOLOv12m in both detection accuracy and model efficiency, and remained more stable for small targets, dense occlusion, and low-light conditions. The final model achieved an accuracy of 95.8%, a recall of 93.1%, and a mean average precision of 95.3%, with 19.1 M parameters and 53.6 GFLOPs, demonstrating strong practicality and deployment potential. [Conclusions] FSE-DETR achieves a lightweight and efficient design while maintaining high detection accuracy, and can provide reliable technical support for harvest evaluation in factory-based shiitake mushroom production.
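The FasterNet Block mentioned above cuts computation mainly through partial convolution (PConv), which convolves only a fraction of the input channels and passes the rest through unchanged. A minimal NumPy sketch of that idea (the channel ratio, naive loops, and tensor layout here are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def partial_conv(x: np.ndarray, weight: np.ndarray, ratio: float = 0.25) -> np.ndarray:
    """PConv sketch: 3x3 convolution over only the first `ratio` of channels.

    The remaining channels are copied through untouched, so FLOPs scale with
    ratio**2 relative to a full convolution over all channels.
    """
    c, h, w = x.shape
    cp = int(c * ratio)                       # channels that get convolved
    out = x.copy()                            # untouched channels pass through
    pad = np.pad(x[:cp], ((0, 0), (1, 1), (1, 1)))
    for o in range(cp):                       # naive 3x3 convolution
        acc = np.zeros((h, w), dtype=x.dtype)
        for i in range(cp):
            for dy in range(3):
                for dx in range(3):
                    acc += weight[o, i, dy, dx] * pad[i, dy:dy + h, dx:dx + w]
        out[o] = acc
    return out

x = np.random.rand(8, 6, 6).astype(np.float32)
w = np.random.rand(2, 2, 3, 3).astype(np.float32)
y = partial_conv(x, w, ratio=0.25)
print(y.shape)  # (8, 6, 6)
```

In FasterNet this is followed by pointwise (1×1) convolutions that mix information back across all channels.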

Keywords: shiitake mushroom, harvest evaluation, FSE-DETR, deep learning, object detection

Abstract:

[Objective] Shiitake mushroom is one of the most important edible and medicinal fungi in China, and its factory-based cultivation has become a major production model. Although mixing, bagging, sterilization, and inoculation have been largely automated, harvesting and grading still depend heavily on manual labor, which leads to high labor intensity, low efficiency, and inconsistency caused by subjective judgment, thereby restricting large-scale production. Furthermore, the clustered growth pattern of shiitake mushrooms, the high proportion of small targets, severe occlusion, and complex illumination conditions present additional challenges to automated detection. Traditional object detection models often struggle to balance accuracy, robustness, and lightweight efficiency in such environments. Therefore, there is an urgent need for a high-precision and lightweight detection model capable of supporting intelligent evaluation in mushroom harvesting. [Methods] To address these challenges, this study proposed an improved real-time detection model named FSE-DETR, based on the RT-DETR framework. In the backbone, the FasterNet Block was introduced to replace the original HGNetv2 structure. By combining partial convolution (PConv) for efficient channel reduction and pointwise convolution (PWConv) for rapid feature integration, the FasterNet Block reduced redundant computation and parameter size while maintaining effective multi-scale feature extraction, thereby improving both efficiency and deployment feasibility. In the encoder, a small object feature fusion network (SFFN) was designed to enhance the recognition of immature mushrooms and other small targets. This network first applied space-to-depth convolution (SPDConv), which rearranged spatial information into channel dimensions without discarding fine-grained details such as edges and textures. 
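The space-to-depth rearrangement described above can be sketched as follows; this is a generic illustration of SPDConv's lossless downsampling step (block size and tensor layout are assumptions), with the follow-up convolution omitted:

```python
import numpy as np

def space_to_depth(x: np.ndarray, block: int = 2) -> np.ndarray:
    """Rearrange spatial blocks into channels: (C, H, W) -> (C*block^2, H/block, W/block).

    Every pixel is kept (none are discarded, unlike strided convolution or
    pooling), so fine-grained detail such as edges and textures survives
    the downsampling.
    """
    c, h, w = x.shape
    assert h % block == 0 and w % block == 0
    x = x.reshape(c, h // block, block, w // block, block)
    x = x.transpose(0, 2, 4, 1, 3)            # (C, block, block, H/b, W/b)
    return x.reshape(c * block * block, h // block, w // block)

feat = np.arange(2 * 4 * 4, dtype=np.float32).reshape(2, 4, 4)
out = space_to_depth(feat, block=2)
print(out.shape)  # (8, 2, 2)
```

Because the operation is a pure permutation, the multiset of feature values is preserved exactly; only their layout changes.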
The processed features were then passed through the cross stage partial omni-kernel (CSPOmniKernel) module, which divided feature maps into two parts: one path preserved original information, while the other path underwent multi-scale convolutional operations including 1×1, asymmetric large-kernel, and frequency-domain transformations, before being recombined. This design enabled the model to capture both local structural cues and global semantic context simultaneously, improving its robustness under occlusion and scale variation. For bounding box regression, the Efficient Intersection over Union (EIoU) loss function was adopted to replace GIoU. Unlike GIoU, EIoU explicitly penalized differences in center distance, aspect ratio, and scale between predicted and ground-truth boxes, resulting in more precise localization and faster convergence during training. The dataset was constructed from images collected in mushroom cultivation facilities using fixed-position RGB cameras under diverse illumination conditions, including direct daylight, low-light, and artificial lighting, to ensure realistic coverage. Four mushroom categories were annotated: immature mushrooms, flower mushrooms, smooth cap mushrooms, and defective mushrooms, following industrial grading standards. To address the limited size of raw data and prevent overfitting, extensive augmentation strategies such as horizontal and vertical flipping, random rotation, Gaussian and salt-and-pepper noise addition, and synthetic occlusion were applied. The augmented dataset consisted of 4 000 images, which were randomly divided into training, validation, and test sets at a ratio of 7:2:1, ensuring balanced distribution across all categories. [Results and Discussions] Experimental evaluation was conducted under consistent hardware and hyperparameter settings. 
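The EIoU penalties described above (center distance, width, and height, each normalized by the smallest enclosing box) can be written out directly. A minimal sketch for a single box pair in (x1, y1, x2, y2) format, following the standard EIoU formulation rather than the paper's training code:

```python
import numpy as np

def eiou_loss(pred: np.ndarray, gt: np.ndarray) -> float:
    """EIoU = 1 - IoU + center-distance term + width term + height term."""
    # intersection and IoU
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_p + area_g - inter + 1e-9)

    # smallest enclosing box
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])

    # squared center distance, normalized by the enclosing-box diagonal
    dx = (pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2
    dy = (pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2
    dist = (dx**2 + dy**2) / (cw**2 + ch**2 + 1e-9)

    # explicit width and height penalties (absent from GIoU)
    dw = ((pred[2] - pred[0]) - (gt[2] - gt[0])) ** 2 / (cw**2 + 1e-9)
    dh = ((pred[3] - pred[1]) - (gt[3] - gt[1])) ** 2 / (ch**2 + 1e-9)

    return 1.0 - iou + dist + dw + dh

box = np.array([0.0, 0.0, 2.0, 2.0])
print(eiou_loss(box, box))  # ~0 for a perfect match
```

Unlike GIoU, the width and height terms give a nonzero gradient even when one box encloses the other, which is what drives the faster convergence noted above.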
The ablation study revealed that FasterNet effectively reduced parameters and computation while slightly improving accuracy, SFFN significantly enhanced the detection of small and occluded mushrooms, and EIoU improved bounding box regression. When integrated, these improvements enabled the final model to achieve an accuracy of 95.8%, a recall of 93.1%, and a mAP50 of 95.3%, with 19.1 M parameters and a computational cost of 53.6 GFLOPs, thus achieving a favorable balance between precision and efficiency. Compared with mainstream detection models including Faster R-CNN, YOLOv7, YOLOv8m, and YOLOv12m, FSE-DETR consistently outperformed them in terms of accuracy, robustness, and model efficiency. Notably, the mAP for immature and defective mushrooms increased by 2.4 and 2.5 percentage points, respectively, compared with the baseline RT-DETR, demonstrating the effectiveness of the SFFN module for small-object detection. Visualization analysis further confirmed that FSE-DETR maintained stable detection performance under different illumination and occlusion conditions, effectively reducing missed detections, false positives, and repeated recognition, while other models exhibited noticeable deficiencies. These results verified the superior robustness and reliability of the proposed model in practical mushroom factory environments. [Conclusions] In summary, the proposed FSE-DETR model integrated the FasterNet Block, Small Object Feature Fusion Network, and EIoU loss into the RT-DETR framework, achieving state-of-the-art accuracy while maintaining lightweight characteristics. The model showed strong adaptability to small targets, occlusion, and complex illumination, making it a reliable solution for intelligent mushroom harvest evaluation.
With its balance of precision and efficiency, FSE-DETR demonstrates great potential for deployment in real-world factory production and provides a valuable reference for developing high-performance, lightweight detection models for other agricultural applications.
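The precision and recall figures reported above follow the usual detection bookkeeping: a prediction counts as a true positive when it matches a previously unmatched ground-truth box at IoU ≥ 0.5. A minimal sketch of that matching (greedy and without confidence ordering, a simplifying assumption rather than the paper's evaluation code):

```python
import numpy as np

def iou(a, b):
    """IoU for two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / (union + 1e-9)

def precision_recall(preds, gts, thr=0.5):
    """Greedy one-to-one matching at IoU >= thr; returns (precision, recall)."""
    matched = set()
    tp = 0
    for p in preds:
        best, best_iou = None, thr
        for i, g in enumerate(gts):
            if i in matched:
                continue
            v = iou(p, g)
            if v >= best_iou:
                best, best_iou = i, v
        if best is not None:          # true positive: claims this ground truth
            matched.add(best)
            tp += 1
    fp = len(preds) - tp              # unmatched predictions
    fn = len(gts) - tp                # missed ground truths
    return tp / (tp + fp + 1e-9), tp / (tp + fn + 1e-9)

gts = [[0, 0, 2, 2], [3, 3, 5, 5]]
preds = [[0, 0, 2, 2], [10, 10, 12, 12]]   # one hit, one false positive
p, r = precision_recall(preds, gts)
print(round(p, 2), round(r, 2))  # 0.5 0.5
```

mAP50 then averages, per class, the area under the precision–recall curve obtained by sweeping the confidence threshold with this same matching rule.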

Key words: shiitake mushroom, harvest evaluation, FSE-DETR, deep learning, object detection

CLC number: