
Smart Agriculture


Detection Method for Log-Cultivated Shiitake Mushrooms Based on Improved RT-DETR

WANG Fengyun1, WANG Xuanyu2, AN Lei3, FENG Wenjie1

  1. Shandong Academy of Agricultural Sciences, Jinan 250100, China
    2. Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250300, China
    3. Dongying Hekou District Administrative Examination and Approval Service Bureau, Dongying 257200, China
  • Received: 2025-06-29  Online: 2025-09-11
  • Foundation items: Natural Science Foundation of Shandong Province (ZR2022MC067); National Key Research and Development Program of China (2021YFB3901303); Key Technology Research and Development Program of Shandong Province (2022CXGC010610); Agricultural Scientific and Technological Innovation Project of Shandong Academy of Agricultural Sciences (CXGC2024A08)
  • Corresponding author: FENG Wenjie, E-mail:

Abstract:

[Objective] Shiitake mushroom is one of the most important edible and medicinal fungi in China, and its factory-based cultivation has become a major production model. Although mixing, bagging, sterilization, and inoculation have been largely automated, harvesting and grading still depend heavily on manual labor, which leads to high labor intensity, low efficiency, and inconsistency caused by subjective judgment, thereby restricting large-scale production. Furthermore, the clustered growth pattern of shiitake mushrooms, the high proportion of small targets, severe occlusion, and complex illumination conditions present additional challenges to automated detection. Traditional object detection models often struggle to balance accuracy, robustness, and lightweight efficiency in such environments. Therefore, there is an urgent need for a high-precision and lightweight detection model capable of supporting intelligent evaluation in mushroom harvesting. [Methods] To address these challenges, this study proposed an improved real-time detection model named FSE-DETR, based on the RT-DETR framework. In the backbone, the FasterNet Block was introduced to replace the original HGNetv2 structure. By combining partial convolution (PConv) for efficient channel reduction and pointwise convolution (PWConv) for rapid feature integration, the FasterNet Block reduced redundant computation and parameter size while maintaining effective multi-scale feature extraction, thereby improving both efficiency and deployment feasibility. In the encoder, a small object feature fusion network (SFFN) was designed to enhance the recognition of immature mushrooms and other small targets. This network first applied space-to-depth convolution (SPDConv), which rearranged spatial information into channel dimensions without discarding fine-grained details such as edges and textures. 
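The FasterNet Block (PConv plus pointwise convolutions) and the space-to-depth convolution described above can be sketched in PyTorch as follows. This is an illustrative sketch, not the authors' implementation: the channel split ratio, hidden expansion factor, and normalization/activation choices are assumptions.

```python
import torch
import torch.nn as nn


class PConv(nn.Module):
    """Partial convolution: a 3x3 conv applied to only the first
    1/n_div of the channels; the remaining channels pass through
    untouched, which cuts redundant computation."""

    def __init__(self, dim, n_div=4):
        super().__init__()
        self.dim_conv = dim // n_div
        self.dim_keep = dim - self.dim_conv
        self.conv = nn.Conv2d(self.dim_conv, self.dim_conv, 3,
                              padding=1, bias=False)

    def forward(self, x):
        x1, x2 = torch.split(x, [self.dim_conv, self.dim_keep], dim=1)
        return torch.cat([self.conv(x1), x2], dim=1)


class FasterNetBlock(nn.Module):
    """PConv followed by two pointwise (1x1) convs for rapid
    channel-wise feature integration, with a residual connection."""

    def __init__(self, dim, expand=2):
        super().__init__()
        hidden = dim * expand
        self.pconv = PConv(dim)
        self.pw = nn.Sequential(
            nn.Conv2d(dim, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, dim, 1, bias=False),
        )

    def forward(self, x):
        return x + self.pw(self.pconv(x))


class SPDConv(nn.Module):
    """Space-to-depth conv: fold each 2x2 spatial block into the
    channel dimension (no fine-grained detail is discarded), then
    fuse with a non-strided conv instead of downsampling."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch * 4, out_ch, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = nn.functional.pixel_unshuffle(x, 2)  # (B, 4C, H/2, W/2)
        return self.act(self.bn(self.conv(x)))
```

SPDConv halves spatial resolution while quadrupling channels before the conv, so edge and texture cues from small targets survive the downsampling step rather than being averaged away.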
The processed features were then passed through the cross stage partial omni-kernel (CSPOmniKernel) module, which divided feature maps into two parts: one path preserved original information, while the other path underwent multi-scale convolutional operations including 1×1, asymmetric large-kernel, and frequency-domain transformations, before being recombined. This design enabled the model to capture both local structural cues and global semantic context simultaneously, improving its robustness under occlusion and scale variation. For bounding box regression, the Efficient Intersection over Union (EIoU) loss function was adopted to replace GIoU. Unlike GIoU, EIoU explicitly penalized differences in center distance, aspect ratio, and scale between predicted and ground-truth boxes, resulting in more precise localization and faster convergence during training. The dataset was constructed from images collected in mushroom cultivation facilities using fixed-position RGB cameras under diverse illumination conditions, including direct daylight, low-light, and artificial lighting, to ensure realistic coverage. Four mushroom categories were annotated: immature mushrooms, flower mushrooms, smooth cap mushrooms, and defective mushrooms, following industrial grading standards. To address the limited size of raw data and prevent overfitting, extensive augmentation strategies such as horizontal and vertical flipping, random rotation, Gaussian and salt-and-pepper noise addition, and synthetic occlusion were applied. The augmented dataset consisted of 4 000 images, which were randomly divided into training, validation, and test sets at a ratio of 7:2:1, ensuring balanced distribution across all categories. [Results and Discussions] Experimental evaluation was conducted under consistent hardware and hyperparameter settings. 
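The EIoU regression loss described above penalizes, in addition to the IoU term, the center distance and the width/height differences normalized by the enclosing box. A minimal sketch for axis-aligned boxes in (x1, y1, x2, y2) format, assuming one predicted box per ground-truth box (not the authors' code):

```python
import torch


def eiou_loss(pred, target, eps=1e-7):
    """EIoU = 1 - IoU + rho^2(centers)/c^2 + dw^2/Cw^2 + dh^2/Ch^2,
    where c, Cw, Ch are the diagonal, width, and height of the
    smallest box enclosing both pred and target. Shapes: (N, 4)."""
    # Intersection and union.
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Smallest enclosing box.
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps  # squared diagonal

    # Center-distance penalty.
    pcx = (pred[:, 0] + pred[:, 2]) / 2
    pcy = (pred[:, 1] + pred[:, 3]) / 2
    tcx = (target[:, 0] + target[:, 2]) / 2
    tcy = (target[:, 1] + target[:, 3]) / 2
    rho2 = (pcx - tcx) ** 2 + (pcy - tcy) ** 2

    # Width and height penalties.
    pw, ph = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    tw, th = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    w_term = (pw - tw) ** 2 / (cw ** 2 + eps)
    h_term = (ph - th) ** 2 / (ch ** 2 + eps)

    return 1 - iou + rho2 / c2 + w_term + h_term
```

Because the width and height mismatches are penalized directly (rather than through an aspect-ratio proxy as in CIoU, or only through the enclosing area as in GIoU), gradients stay informative even when the predicted box already overlaps the target, which is what speeds up convergence.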
The ablation study revealed that FasterNet effectively reduced parameters and computation while slightly improving accuracy, SFFN significantly enhanced the detection of small and occluded mushrooms, and EIoU improved bounding box regression. When integrated, these improvements enabled the final model to achieve an accuracy of 95.8%, a recall of 93.1%, and a mAP50 of 95.3%, with a model size of 19.1 M and a computational cost of 53.6 GFLOPs, achieving a favorable balance between precision and efficiency. Compared with mainstream detection models including Faster R-CNN, YOLOv7, YOLOv8m, and YOLOv12m, FSE-DETR consistently outperformed them in accuracy, robustness, and model efficiency. Notably, the mAP for immature and defective mushrooms increased by 2.4 and 2.5 percentage points, respectively, compared with the baseline RT-DETR, demonstrating the effectiveness of the SFFN module for small-object detection. Visualization analysis further confirmed that FSE-DETR maintained stable detection performance under different illumination and occlusion conditions, effectively reducing missed detections, false positives, and repeated recognition, while other models exhibited noticeable deficiencies. These results verified the superior robustness and reliability of the proposed model in practical mushroom factory environments. [Conclusions] In summary, the proposed FSE-DETR model integrated the FasterNet Block, the small object feature fusion network (SFFN), and the EIoU loss into the RT-DETR framework, achieving state-of-the-art accuracy while maintaining lightweight characteristics. The model showed strong adaptability to small targets, occlusion, and complex illumination, making it a reliable solution for intelligent mushroom harvest evaluation.
With its balance of precision and efficiency, FSE-DETR demonstrates great potential for deployment in real-world factory production and provides a valuable reference for developing high-performance, lightweight detection models for other agricultural applications.

Key words: shiitake mushroom, harvest evaluation, FSE-DETR, deep learning, object detection

CLC Number: