
Smart Agriculture


Detection Method for Log-Cultivated Shiitake Mushrooms Based on Improved RT-DETR

WANG Fengyun1, WANG Xuanyu2, AN Lei3, FENG Wenjie1

  1. Shandong Academy of Agricultural Sciences, Jinan 250100, Shandong, China
    2. Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250300, Shandong, China
    3. Dongying Hekou District Administrative Examination and Approval Service Bureau, Dongying 257200, Shandong, China
  • Received: 2025-06-29 Online: 2025-09-11
  • Foundation items:
    Natural Science Foundation of Shandong Province (ZR2022MC067); National Key Research and Development Program of China (2021YFB3901303); Key Research and Development Program of Shandong Province (Major Science and Technology Innovation Project) (2022CXGC010610); Agricultural Science and Technology Innovation Project of Shandong Academy of Agricultural Sciences (CXGC2024A08)
  • About the author:

    WANG Fengyun, M.S., research fellow; research interest: smart agriculture. E-mail:

  • Corresponding author:
    FENG Wenjie, research fellow; research interest: agricultural informatization. E-mail:

Detection Method for Log-Cultivated Shiitake Mushrooms Based on Improved RT-DETR

WANG Fengyun1, WANG Xuanyu2, AN Lei3, FENG Wenjie1

  1. Shandong Academy of Agricultural Sciences, Jinan 250100, Shandong, China
    2. Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250300, China
    3. Dongying Hekou District Administrative Examination and Approval Service Bureau, Dongying 257200, China
  • Received: 2025-06-29 Online: 2025-09-11
  • Foundation items: Natural Science Foundation of Shandong Province (ZR2022MC067); National Key Research and Development Program of China (2021YFB3901303); Key Research and Development Program of Shandong Province (2022CXGC010610); Agricultural Science and Technology Innovation Project of Shandong Academy of Agricultural Sciences (CXGC2024A08)
  • Corresponding author:
    FENG Wenjie, E-mail:

Abstract:

[Objective/Significance] With the deepening application of computer vision and automation technology in factory-based shiitake mushroom production, stages such as substrate mixing, bagging, sterilization, and inoculation have been largely automated, while harvesting and grading still rely heavily on manual labor and have become the key bottleneck constraining industrial efficiency. To raise the level of intelligence at the harvesting stage, a high-precision, lightweight object detection model is urgently needed. [Methods] A harvest evaluation model for shiitake mushrooms, FSE-DETR, was proposed based on an improved RT-DETR (Real-Time DEtection TRansformer). The model introduced the FasterNet Block into the backbone to reduce computational complexity, and designed a small object feature fusion network (SFFN) at the feature encoding stage, in which space-to-depth convolution (SPDConv) preserves fine-grained spatial information and the cross stage partial omni-kernel module (CSPOmniKernel) performs multi-scale feature extraction and global context modeling; in addition, the efficient intersection over union (EIoU) loss function was adopted to improve bounding-box localization accuracy and convergence speed. [Results and Discussions] FSE-DETR outperformed mainstream models such as Faster R-CNN (Faster Region-based Convolutional Neural Network), YOLOv7, YOLOv8m, and YOLOv12m in both detection accuracy and model efficiency, and remained more stable for small targets, dense occlusion, and low-light conditions. The final model achieved an accuracy of 95.8%, a recall of 93.1%, and a mean average precision of 95.3%, with 19.1 M parameters and 53.6 GFLOPs, demonstrating strong practicality and deployment potential. [Conclusions] FSE-DETR achieves a lightweight and efficient design while maintaining high detection accuracy, and can provide reliable technical support for harvest evaluation in factory-based shiitake mushroom production.
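The FasterNet Block mentioned above cuts computation mainly through partial convolution (PConv), which convolves only a fraction of the input channels and passes the rest through unchanged. A minimal NumPy sketch of that idea (the channel ratio, naive loops, and tensor layout here are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def partial_conv(x: np.ndarray, weight: np.ndarray, ratio: float = 0.25) -> np.ndarray:
    """PConv sketch: 3x3 convolution over only the first `ratio` of channels.

    The remaining channels are copied through untouched, so FLOPs scale with
    ratio**2 relative to a full convolution over all channels.
    """
    c, h, w = x.shape
    cp = int(c * ratio)                       # channels that get convolved
    out = x.copy()                            # untouched channels pass through
    pad = np.pad(x[:cp], ((0, 0), (1, 1), (1, 1)))
    for o in range(cp):                       # naive 3x3 convolution
        acc = np.zeros((h, w), dtype=x.dtype)
        for i in range(cp):
            for dy in range(3):
                for dx in range(3):
                    acc += weight[o, i, dy, dx] * pad[i, dy:dy + h, dx:dx + w]
        out[o] = acc
    return out

x = np.random.rand(8, 6, 6).astype(np.float32)
w = np.random.rand(2, 2, 3, 3).astype(np.float32)
y = partial_conv(x, w, ratio=0.25)
print(y.shape)  # (8, 6, 6)
```

In FasterNet this is followed by pointwise (1×1) convolutions that mix information back across all channels.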

Keywords: shiitake mushroom, harvest evaluation, FSE-DETR, deep learning, object detection

Abstract:

[Objective] Shiitake mushroom is one of the most important edible and medicinal fungi in China, and its factory-based cultivation has become a major production model. Although mixing, bagging, sterilization, and inoculation have been largely automated, harvesting and grading still depend heavily on manual labor, which leads to high labor intensity, low efficiency, and inconsistency caused by subjective judgment, thereby restricting large-scale production. Furthermore, the clustered growth pattern of shiitake mushrooms, the high proportion of small targets, severe occlusion, and complex illumination conditions present additional challenges to automated detection. Traditional object detection models often struggle to balance accuracy, robustness, and lightweight efficiency in such environments. Therefore, there is an urgent need for a high-precision and lightweight detection model capable of supporting intelligent evaluation in mushroom harvesting. [Methods] To address these challenges, this study proposed an improved real-time detection model named FSE-DETR, based on the RT-DETR framework. In the backbone, the FasterNet Block was introduced to replace the original HGNetv2 structure. By combining partial convolution (PConv) for efficient channel reduction and pointwise convolution (PWConv) for rapid feature integration, the FasterNet Block reduced redundant computation and parameter size while maintaining effective multi-scale feature extraction, thereby improving both efficiency and deployment feasibility. In the encoder, a small object feature fusion network (SFFN) was designed to enhance the recognition of immature mushrooms and other small targets. This network first applied space-to-depth convolution (SPDConv), which rearranged spatial information into channel dimensions without discarding fine-grained details such as edges and textures. 
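The space-to-depth rearrangement described above can be sketched as follows; this is a generic illustration of SPDConv's lossless downsampling step (block size and tensor layout are assumptions), with the follow-up convolution omitted:

```python
import numpy as np

def space_to_depth(x: np.ndarray, block: int = 2) -> np.ndarray:
    """Rearrange spatial blocks into channels: (C, H, W) -> (C*block^2, H/block, W/block).

    Every pixel is kept (none are discarded, unlike strided convolution or
    pooling), so fine-grained detail such as edges and textures survives
    the downsampling.
    """
    c, h, w = x.shape
    assert h % block == 0 and w % block == 0
    x = x.reshape(c, h // block, block, w // block, block)
    x = x.transpose(0, 2, 4, 1, 3)            # (C, block, block, H/b, W/b)
    return x.reshape(c * block * block, h // block, w // block)

feat = np.arange(2 * 4 * 4, dtype=np.float32).reshape(2, 4, 4)
out = space_to_depth(feat, block=2)
print(out.shape)  # (8, 2, 2)
```

Because the operation is a pure permutation, the multiset of feature values is preserved exactly; only their layout changes.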
The processed features were then passed through the cross stage partial omni-kernel (CSPOmniKernel) module, which divided feature maps into two parts: one path preserved original information, while the other path underwent multi-scale convolutional operations including 1×1, asymmetric large-kernel, and frequency-domain transformations, before being recombined. This design enabled the model to capture both local structural cues and global semantic context simultaneously, improving its robustness under occlusion and scale variation. For bounding box regression, the Efficient Intersection over Union (EIoU) loss function was adopted to replace GIoU. Unlike GIoU, EIoU explicitly penalized differences in center distance, aspect ratio, and scale between predicted and ground-truth boxes, resulting in more precise localization and faster convergence during training. The dataset was constructed from images collected in mushroom cultivation facilities using fixed-position RGB cameras under diverse illumination conditions, including direct daylight, low-light, and artificial lighting, to ensure realistic coverage. Four mushroom categories were annotated: immature mushrooms, flower mushrooms, smooth cap mushrooms, and defective mushrooms, following industrial grading standards. To address the limited size of raw data and prevent overfitting, extensive augmentation strategies such as horizontal and vertical flipping, random rotation, Gaussian and salt-and-pepper noise addition, and synthetic occlusion were applied. The augmented dataset consisted of 4 000 images, which were randomly divided into training, validation, and test sets at a ratio of 7:2:1, ensuring balanced distribution across all categories. [Results and Discussions] Experimental evaluation was conducted under consistent hardware and hyperparameter settings. 
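The EIoU penalties described above (center distance, width, and height, each normalized by the smallest enclosing box) can be written out directly. A minimal sketch for a single box pair in (x1, y1, x2, y2) format, following the standard EIoU formulation rather than the paper's training code:

```python
import numpy as np

def eiou_loss(pred: np.ndarray, gt: np.ndarray) -> float:
    """EIoU = 1 - IoU + center-distance term + width term + height term."""
    # intersection and IoU
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_p + area_g - inter + 1e-9)

    # smallest enclosing box
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])

    # squared center distance, normalized by the enclosing-box diagonal
    dx = (pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2
    dy = (pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2
    dist = (dx**2 + dy**2) / (cw**2 + ch**2 + 1e-9)

    # explicit width and height penalties (absent from GIoU)
    dw = ((pred[2] - pred[0]) - (gt[2] - gt[0])) ** 2 / (cw**2 + 1e-9)
    dh = ((pred[3] - pred[1]) - (gt[3] - gt[1])) ** 2 / (ch**2 + 1e-9)

    return 1.0 - iou + dist + dw + dh

box = np.array([0.0, 0.0, 2.0, 2.0])
print(eiou_loss(box, box))  # ~0 for a perfect match
```

Unlike GIoU, the width and height terms give a nonzero gradient even when one box encloses the other, which is what drives the faster convergence noted above.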
The ablation study revealed that FasterNet effectively reduced parameters and computation while slightly improving accuracy, SFFN significantly enhanced the detection of small and occluded mushrooms, and EIoU improved bounding box regression. When integrated, these improvements enabled the final model to achieve an accuracy of 95.8%, a recall of 93.1%, and a mAP50 of 95.3%, with 19.1 M parameters and a computational cost of 53.6 GFLOPs, thus achieving a favorable balance between precision and efficiency. Compared with mainstream detection models including Faster R-CNN, YOLOv7, YOLOv8m, and YOLOv12m, FSE-DETR consistently outperformed them in terms of accuracy, robustness, and model efficiency. Notably, the mAP for immature and defective mushrooms increased by 2.4 and 2.5 percentage points, respectively, compared with the baseline RT-DETR, demonstrating the effectiveness of the SFFN module for small-object detection. Visualization analysis further confirmed that FSE-DETR maintained stable detection performance under different illumination and occlusion conditions, effectively reducing missed detections, false positives, and repeated recognition, while other models exhibited noticeable deficiencies. These results verified the superior robustness and reliability of the proposed model in practical mushroom factory environments. [Conclusions] In summary, the proposed FSE-DETR model integrated the FasterNet Block, Small Object Feature Fusion Network, and EIoU loss into the RT-DETR framework, achieving state-of-the-art accuracy while maintaining lightweight characteristics. The model showed strong adaptability to small targets, occlusion, and complex illumination, making it a reliable solution for intelligent mushroom harvest evaluation.
With its balance of precision and efficiency, FSE-DETR demonstrates great potential for deployment in real-world factory production and provides a valuable reference for developing high-performance, lightweight detection models for other agricultural applications.
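The precision and recall figures reported above follow the usual detection bookkeeping: a prediction counts as a true positive when it matches a previously unmatched ground-truth box at IoU ≥ 0.5. A minimal sketch of that matching (greedy and without confidence ordering, a simplifying assumption rather than the paper's evaluation code):

```python
import numpy as np

def iou(a, b):
    """IoU for two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / (union + 1e-9)

def precision_recall(preds, gts, thr=0.5):
    """Greedy one-to-one matching at IoU >= thr; returns (precision, recall)."""
    matched = set()
    tp = 0
    for p in preds:
        best, best_iou = None, thr
        for i, g in enumerate(gts):
            if i in matched:
                continue
            v = iou(p, g)
            if v >= best_iou:
                best, best_iou = i, v
        if best is not None:          # true positive: claims this ground truth
            matched.add(best)
            tp += 1
    fp = len(preds) - tp              # unmatched predictions
    fn = len(gts) - tp                # missed ground truths
    return tp / (tp + fp + 1e-9), tp / (tp + fn + 1e-9)

gts = [[0, 0, 2, 2], [3, 3, 5, 5]]
preds = [[0, 0, 2, 2], [10, 10, 12, 12]]   # one hit, one false positive
p, r = precision_recall(preds, gts)
print(round(p, 2), round(r, 2))  # 0.5 0.5
```

mAP50 then averages, per class, the area under the precision–recall curve obtained by sweeping the confidence threshold with this same matching rule.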

Key words: shiitake mushroom, harvest evaluation, FSE-DETR, deep learning, object detection

CLC number: