
Smart Agriculture


Detection Method for Log-Cultivated Shiitake Mushrooms Based on Improved RT-DETR

WANG Fengyun1, WANG Xuanyu2, AN Lei3, FENG Wenjie1

  1. Shandong Academy of Agricultural Sciences, Jinan 250100, China
    2. Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250300, China
    3. Dongying Hekou District Administrative Examination and Approval Service Bureau, Dongying 257200, China
  • Received: 2025-06-29  Online: 2025-09-11
  • Foundation items: Natural Science Foundation of Shandong Province (ZR2022MC067); National Key Research and Development Program of China (2021YFB3901303); Key Technology Research and Development Program of Shandong Province (2022CXGC010610); Agricultural Scientific and Technological Innovation Project of Shandong Academy of Agricultural Sciences (CXGC2024A08)
  • Corresponding author: FENG Wenjie, E-mail:

Abstract:

[Objective] Shiitake mushroom is one of the most important edible and medicinal fungi in China, and its factory-based cultivation has become a major production model. Although mixing, bagging, sterilization, and inoculation have been largely automated, harvesting and grading still depend heavily on manual labor, which leads to high labor intensity, low efficiency, and inconsistency caused by subjective judgment, thereby restricting large-scale production. Furthermore, the clustered growth pattern of shiitake mushrooms, the high proportion of small targets, severe occlusion, and complex illumination conditions present additional challenges to automated detection. Traditional object detection models often struggle to balance accuracy, robustness, and lightweight efficiency in such environments. Therefore, there is an urgent need for a high-precision and lightweight detection model capable of supporting intelligent evaluation in mushroom harvesting. [Methods] To address these challenges, this study proposed an improved real-time detection model named FSE-DETR, based on the RT-DETR framework. In the backbone, the FasterNet Block was introduced to replace the original HGNetv2 structure. By combining partial convolution (PConv) for efficient channel reduction and pointwise convolution (PWConv) for rapid feature integration, the FasterNet Block reduced redundant computation and parameter size while maintaining effective multi-scale feature extraction, thereby improving both efficiency and deployment feasibility. In the encoder, a small object feature fusion network (SFFN) was designed to enhance the recognition of immature mushrooms and other small targets. This network first applied space-to-depth convolution (SPDConv), which rearranged spatial information into channel dimensions without discarding fine-grained details such as edges and textures. 
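The FasterNet Block (PConv plus pointwise convolutions) and the space-to-depth convolution described above can be sketched in PyTorch as follows. This is an illustrative sketch, not the authors' implementation: the channel split ratio, hidden expansion factor, and normalization/activation choices are assumptions.

```python
import torch
import torch.nn as nn


class PConv(nn.Module):
    """Partial convolution: a 3x3 conv applied to only the first
    1/n_div of the channels; the remaining channels pass through
    untouched, which cuts redundant computation."""

    def __init__(self, dim, n_div=4):
        super().__init__()
        self.dim_conv = dim // n_div
        self.dim_keep = dim - self.dim_conv
        self.conv = nn.Conv2d(self.dim_conv, self.dim_conv, 3,
                              padding=1, bias=False)

    def forward(self, x):
        x1, x2 = torch.split(x, [self.dim_conv, self.dim_keep], dim=1)
        return torch.cat([self.conv(x1), x2], dim=1)


class FasterNetBlock(nn.Module):
    """PConv followed by two pointwise (1x1) convs for rapid
    channel-wise feature integration, with a residual connection."""

    def __init__(self, dim, expand=2):
        super().__init__()
        hidden = dim * expand
        self.pconv = PConv(dim)
        self.pw = nn.Sequential(
            nn.Conv2d(dim, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, dim, 1, bias=False),
        )

    def forward(self, x):
        return x + self.pw(self.pconv(x))


class SPDConv(nn.Module):
    """Space-to-depth conv: fold each 2x2 spatial block into the
    channel dimension (no fine-grained detail is discarded), then
    fuse with a non-strided conv instead of downsampling."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch * 4, out_ch, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        x = nn.functional.pixel_unshuffle(x, 2)  # (B, 4C, H/2, W/2)
        return self.act(self.bn(self.conv(x)))
```

SPDConv halves spatial resolution while quadrupling channels before the conv, so edge and texture cues from small targets survive the downsampling step rather than being averaged away.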
The processed features were then passed through the cross stage partial omni-kernel (CSPOmniKernel) module, which divided feature maps into two parts: one path preserved original information, while the other path underwent multi-scale convolutional operations including 1×1, asymmetric large-kernel, and frequency-domain transformations, before being recombined. This design enabled the model to capture both local structural cues and global semantic context simultaneously, improving its robustness under occlusion and scale variation. For bounding box regression, the Efficient Intersection over Union (EIoU) loss function was adopted to replace GIoU. Unlike GIoU, EIoU explicitly penalized differences in center distance, aspect ratio, and scale between predicted and ground-truth boxes, resulting in more precise localization and faster convergence during training. The dataset was constructed from images collected in mushroom cultivation facilities using fixed-position RGB cameras under diverse illumination conditions, including direct daylight, low-light, and artificial lighting, to ensure realistic coverage. Four mushroom categories were annotated: immature mushrooms, flower mushrooms, smooth cap mushrooms, and defective mushrooms, following industrial grading standards. To address the limited size of raw data and prevent overfitting, extensive augmentation strategies such as horizontal and vertical flipping, random rotation, Gaussian and salt-and-pepper noise addition, and synthetic occlusion were applied. The augmented dataset consisted of 4 000 images, which were randomly divided into training, validation, and test sets at a ratio of 7:2:1, ensuring balanced distribution across all categories. [Results and Discussions] Experimental evaluation was conducted under consistent hardware and hyperparameter settings. 
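The EIoU regression loss described above penalizes, in addition to the IoU term, the center distance and the width/height differences normalized by the enclosing box. A minimal sketch for axis-aligned boxes in (x1, y1, x2, y2) format, assuming one predicted box per ground-truth box (not the authors' code):

```python
import torch


def eiou_loss(pred, target, eps=1e-7):
    """EIoU = 1 - IoU + rho^2(centers)/c^2 + dw^2/Cw^2 + dh^2/Ch^2,
    where c, Cw, Ch are the diagonal, width, and height of the
    smallest box enclosing both pred and target. Shapes: (N, 4)."""
    # Intersection and union.
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Smallest enclosing box.
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps  # squared diagonal

    # Center-distance penalty.
    pcx = (pred[:, 0] + pred[:, 2]) / 2
    pcy = (pred[:, 1] + pred[:, 3]) / 2
    tcx = (target[:, 0] + target[:, 2]) / 2
    tcy = (target[:, 1] + target[:, 3]) / 2
    rho2 = (pcx - tcx) ** 2 + (pcy - tcy) ** 2

    # Width and height penalties.
    pw, ph = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    tw, th = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    w_term = (pw - tw) ** 2 / (cw ** 2 + eps)
    h_term = (ph - th) ** 2 / (ch ** 2 + eps)

    return 1 - iou + rho2 / c2 + w_term + h_term
```

Because the width and height mismatches are penalized directly (rather than through an aspect-ratio proxy as in CIoU, or only through the enclosing area as in GIoU), gradients stay informative even when the predicted box already overlaps the target, which is what speeds up convergence.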
The ablation study revealed that FasterNet effectively reduced parameters and computation while slightly improving accuracy, SFFN significantly enhanced the detection of small and occluded mushrooms, and EIoU improved bounding box regression. When integrated, these improvements enabled the final model to achieve an accuracy of 95.8%, a recall of 93.1%, and a mAP50 of 95.3%, with a model size of 19.1 M and a computational cost of 53.6 GFLOPs, achieving a favorable balance between precision and efficiency. Compared with mainstream detection models including Faster R-CNN, YOLOv7, YOLOv8m, and YOLOv12m, FSE-DETR consistently outperformed them in accuracy, robustness, and model efficiency. Notably, the mAP for immature and defective mushrooms increased by 2.4 and 2.5 percentage points, respectively, compared with the baseline RT-DETR, demonstrating the effectiveness of the SFFN module for small-object detection. Visualization analysis further confirmed that FSE-DETR maintained stable detection performance under different illumination and occlusion conditions, effectively reducing missed detections, false positives, and repeated recognition, while other models exhibited noticeable deficiencies. These results verified the superior robustness and reliability of the proposed model in practical mushroom factory environments. [Conclusions] In summary, the proposed FSE-DETR model integrated the FasterNet Block, the small object feature fusion network (SFFN), and the EIoU loss into the RT-DETR framework, achieving state-of-the-art accuracy while maintaining lightweight characteristics. The model showed strong adaptability to small targets, occlusion, and complex illumination, making it a reliable solution for intelligent mushroom harvest evaluation.
With its balance of precision and efficiency, FSE-DETR demonstrates great potential for deployment in real-world factory production and provides a valuable reference for developing high-performance, lightweight detection models for other agricultural applications.

Key words: shiitake mushroom, harvest evaluation, FSE-DETR, deep learning, object detection

CLC Number: