
Smart Agriculture ›› 2026, Vol. 8 ›› Issue (1): 167-177. doi: 10.12133/j.smartag.SA202509005

• Information Processing and Decision Making •

Object Detection Method of Maize Ears Within Canopy Based on CornYOLO

GAO Guangfu1, WANG Qilei2, SONG Liwen1, FENG Haikuan3, SHI Lei1, YANG Hao3, LIU Yang1, YUE Jibo1

  1. College of Information and Management Science, Henan Agricultural University, Zhengzhou 450002, China
    2. Henan Jinyuan Seed Industry Co., Ltd., Zhengzhou 450002, China
    3. Key Laboratory of Quantitative Remote Sensing in Agriculture, Ministry of Agriculture and Rural Affairs / Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
  • Received: 2025-09-02  Online: 2026-01-30
  • Foundation items: Natural Science Foundation of Henan (252300421839); National Natural Science Foundation (42101362)
  • About author:

    GAO Guangfu, E-mail:

  • Corresponding author:
    YUE Jibo, E-mail:

Abstract:

[Objective] As a major grain crop, maize plays a critical role in global food security. The maize ear is a key phenotypic trait, providing essential information on the plant's physiological and agronomic status: its morphological characteristics, size, and color effectively reflect growth status and potential yield. Accurately acquiring images of maize ears in the field across different growth stages is therefore crucial for breeding research and yield prediction. Traditional field detection of maize ears relies heavily on manual labor, which is inefficient and labor-intensive and struggles to meet the high-throughput demands of modern precision breeding programs. Efficient, automated detection technologies that operate reliably under real-world field conditions are urgently needed. To meet the need for efficient acquisition of maize ear phenotypic traits in field breeding work, the objective of this research was to develop a robust object detection solution suitable for large-scale field environments. An improved CornYOLO model based on the YOLO11n (You Only Look Once) architecture was designed to enhance the detection accuracy and efficiency of maize ears in complex field environments. [Methods] Image data were acquired using an unmanned ground vehicle (UGV) equipped with a high-resolution panoramic camera, which traversed multiple experimental plots under varying lighting and growth conditions. A dataset containing 1 152 annotated samples was constructed, covering diverse ear morphologies and occlusion scenarios. Dynamic data augmentation techniques were applied during training to enhance the model's generalization capability. Three key enhancements were introduced to the YOLO11n detection framework.
First, a cross stage partial network with dynamic pointwise spatial attention (C2PDA) module was designed to replace the cross stage partial with pointwise spatial attention (C2PSA) module in the YOLO11 backbone network. This module enhanced spatial discriminability and channel sensitivity in feature representation through the collaborative integration of a dynamic channel weighting mechanism and position-aware modeling, significantly improving the model's performance in identifying maize ears under challenging field conditions such as occlusion by stems and leaves and multi-scale target distribution. Second, the spatial pyramid pooling-fast (SPPF) module in the original model was replaced with a feature refinement module (FRM) to optimize multi-scale feature fusion. The FRM operates via directional feature decomposition and an adaptive attention mechanism: it captures fine-grained spatial structural information through horizontal and vertical bidirectional pooling and combines spatial-channel cooperative attention for dynamic feature calibration, thereby improving recognition accuracy across varying ear sizes and complex backgrounds. Finally, the unified intersection over union (UIoU) loss function was introduced to optimize bounding box regression accuracy. UIoU emphasizes weight allocation among prediction boxes of different qualities: it adaptively adjusts the weight of each prediction box's loss term based on the IoU value or a monotonic function of it, assigning higher weights to lower-quality predictions to prioritize their optimization while reducing weights for high-quality boxes to prevent over-optimization. [Results and Discussions] Experimental results demonstrate that CornYOLO achieved a mAP@50 of 89.3% on the validation set, with the F1-Score increasing by 2.5 percentage points.
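The bidirectional pooling at the heart of the FRM described above can be sketched minimally in NumPy. The mean pooling, additive fusion, and sigmoid gate here are illustrative assumptions, not the published module design:

```python
import numpy as np

def bidirectional_pool_attention(x: np.ndarray) -> np.ndarray:
    """Toy sketch of horizontal/vertical bidirectional pooling attention.

    x: feature map of shape (C, H, W). The real FRM adds learnable
    spatial-channel attention; this sketch only shows the directional
    decomposition and gating idea.
    """
    horiz = x.mean(axis=2, keepdims=True)   # pool along width  -> (C, H, 1)
    vert = x.mean(axis=1, keepdims=True)    # pool along height -> (C, 1, W)
    # Fuse the two directional descriptors and squash to (0, 1) as a gate.
    gate = 1.0 / (1.0 + np.exp(-(horiz + vert)))  # broadcasts to (C, H, W)
    return x * gate  # recalibrated feature map, same shape as input
```

Because the gate lies in (0, 1), each position is attenuated according to its row and column context, which is the intuition behind capturing fine-grained spatial structure in both directions.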
Compared to widely used lightweight models including YOLOv3-tiny, YOLOv5n, YOLOv8n, YOLOv10n, real-time detection transformer (RT-DETR), and YOLO13n, CornYOLO showed significantly superior detection performance in complex field environments, with mAP@50 improvements of 2.2, 1.9, 1.8, 5.7, 12.6, and 2.4 percentage points, respectively. These results validate that CornYOLO can efficiently and accurately extract maize ear images under field conditions, providing a technical foundation for precise phenotypic evaluation and yield prediction. Ablation studies were also conducted. Introducing the C2PDA module improved the model's mAP@50 by 0.5 percentage points and the F1-Score by 0.5 percentage points. Incorporating the FRM module further enhanced multi-scale detection performance and increased the F1-Score by 1.5 percentage points; however, integrating the two modules produced a small number of low-quality detection boxes, and the original loss function was inefficient at optimizing such boxes, so mAP@50 did not improve. To address this issue, the UIoU loss function was introduced. By dynamically adjusting weight assignments based on prediction quality, it significantly improved the regression performance for low-quality detection boxes, thereby enhancing the localization accuracy and convergence stability of the model in dense target scenarios. The final CornYOLO model exhibited excellent overall performance: compared to the original YOLO11n, the F1-Score increased by 2.5 percentage points and mAP@50 improved by 1.1 percentage points. These results demonstrate that CornYOLO effectively enhances the detection of maize ears in complex field environments relative to the baseline YOLO11n model.
[Conclusions] The CornYOLO model proposed in this study incorporates three key components, C2PDA, FRM, and UIoU, which together enhance convergence and localization performance in dense and occluded scenes and enable the model to identify maize ears effectively and precisely under practical conditions, thereby providing reliable technical support for phenotypic analysis and yield prediction in maize breeding. Future work will focus on extending the model to other crop types and further optimizing inference efficiency for real-time deployment on mobile platforms.
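The quality-based weighting idea behind the UIoU loss described in the abstract can be illustrated with a minimal Python sketch. The specific (2 − IoU)^γ weight below is an assumed monotone-decreasing choice for illustration only, not the published UIoU formulation:

```python
def iou(a: tuple, b: tuple) -> float:
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def weighted_box_loss(pred: tuple, gt: tuple, gamma: float = 1.0) -> float:
    """Quality-aware IoU loss sketch: low-IoU (low-quality) predictions
    receive larger weights so their regression is prioritized, while
    high-IoU boxes are down-weighted to avoid over-optimization."""
    q = iou(pred, gt)
    weight = (2.0 - q) ** gamma  # monotone-decreasing in IoU (assumption)
    return weight * (1.0 - q)    # weighted IoU regression loss
```

For a perfect prediction (IoU = 1) the loss vanishes, while a disjoint prediction (IoU = 0) receives double weight, so the optimizer concentrates on exactly the low-quality boxes that motivated the switch away from the original loss.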

Key words: maize ears, unmanned ground vehicle, YOLO11, panoramic camera, object detection

CLC Number: