
Smart Agriculture

CGG-Based Segmentation and Counting of Densely Distributed Rice Seeds in Seedling Trays

OUYANG Meng1, ZOU Rong1, CHEN Jin1, LI Yaoming2, CHEN Yuhang1, YAN Hao1

  1. School of Mechanical Engineering, Jiangsu University, Zhenjiang 212013, China
  2. School of Agricultural Engineering, Jiangsu University, Zhenjiang 212013, China
  • Received: 2025-07-21  Online: 2025-10-09
  • Foundation items: the National Natural Science Foundation of China (31871528)
  • About author:
    OUYANG Meng, E-mail:
  • Corresponding author:
    ZOU Rong, E-mail:

Abstract:

[Objective] The precise quantification of rice seeds within individual cavities of seedling trays is a critical operational parameter for optimizing seeding efficiency and fine-tuning the performance of air-vibration precision seeders. Accuracy in this task directly impacts resource utilization, seedling uniformity, and ultimately crop yield. However, the operational environment presents significant challenges, including complex backgrounds, seed overlap, variations in lighting and seed orientation, and the inherent difficulty of distinguishing individual seeds within dense clusters. These factors often lead to suboptimal performance in existing automated detection systems, manifesting as low detection accuracy and an inability to achieve robust, precise instance segmentation of individual rice seeds. To address these persistent limitations and advance the state of the art in precision seeding monitoring, a novel, integrated framework for rice seed instance segmentation is proposed. The core innovation lies in the synergistic combination of a cross-modal grounding generation network (CGG) with a pretrained model, designed to leverage complementary information from the visual and textual domains.

[Methods] The proposed methodology aimed to bridge the gap between visual perception and semantic understanding in the specific context of rice seed detection. The CGG-pretrained model framework achieved this through deep joint alignment of visual features extracted from seedling tray images and textual features derived from contextual knowledge. This cross-modal grounding enabled collaborative learning, in which the visual processing stream (responsible for object localization and pixel-level segmentation) was continuously informed and refined by the semantic understanding stream (responsible for interpreting context and relationships). Specifically, the visual backbone network processed input imagery to generate feature maps, while the pretrained language model component used contextual embeddings to generate semantically rich textual representations. The CGG module acted as the fusion engine, establishing explicit correspondences between specific regions in the image (potential seeds or clusters) and relevant semantic concepts or descriptors provided by the pretrained model. This bidirectional interaction significantly enhanced the model's ability to disambiguate overlapping seeds, resolve occlusions, and accurately delineate individual seed boundaries under challenging conditions. Key technical innovations validated through rigorous ablation studies included: (1) the strategic use of the BLIP model for generating high-quality pseudo-labels from unlabeled or weakly labeled image data, facilitating more effective semi-supervised learning and reducing the annotation burden; and (2) the application of BERT-based word embeddings to capture deep semantic relationships and contextual nuances within textual descriptors related to seeds and seeding environments. Crucially, the ablation experiments demonstrated a pronounced synergistic effect when these two core improvements were combined, yielding a segmentation accuracy improvement exceeding 3 percentage points over a baseline model lacking these integrations.
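The abstract provides no code; the following minimal PyTorch sketch illustrates only the general fusion pattern described above, in which visual region features are grounded in BERT-derived text embeddings through cross-attention. The module name CrossModalGrounding, all dimensions, the prompt text, and the choice of cross-attention as the fusion mechanism are assumptions of this sketch, not details taken from the paper.

    # Minimal illustrative sketch (not the authors' implementation): visual region
    # features are grounded in BERT text embeddings via cross-attention.
    # Assumes PyTorch and Hugging Face Transformers are installed.
    import torch
    import torch.nn as nn
    from transformers import BertModel, BertTokenizer

    class CrossModalGrounding(nn.Module):
        """Fuse visual region features with text embeddings (hypothetical module)."""

        def __init__(self, visual_dim=256, text_dim=768, hidden=256):
            super().__init__()
            self.visual_proj = nn.Linear(visual_dim, hidden)   # project backbone features
            self.text_proj = nn.Linear(text_dim, hidden)       # project BERT features
            # Image tokens act as queries; text tokens supply keys and values.
            self.cross_attn = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
            self.norm = nn.LayerNorm(hidden)

        def forward(self, visual_feats, text_feats):
            q = self.visual_proj(visual_feats)                 # (B, n_regions, hidden)
            kv = self.text_proj(text_feats)                    # (B, n_tokens, hidden)
            grounded, _ = self.cross_attn(q, kv, kv)           # ground regions in text
            return self.norm(q + grounded)                     # residual fusion

    # Toy usage: encode a seed-related prompt with BERT, fuse with dummy region features.
    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    bert = BertModel.from_pretrained("bert-base-uncased")
    tokens = tokenizer("densely overlapping rice seeds in a tray cavity", return_tensors="pt")
    with torch.no_grad():
        text_feats = bert(**tokens).last_hidden_state          # (1, n_tokens, 768)
    visual_feats = torch.randn(1, 100, 256)                    # stand-in for backbone output
    fused = CrossModalGrounding()(visual_feats, text_feats)
    print(fused.shape)                                         # torch.Size([1, 100, 256])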
[Results and Discussions] Comprehensive experimental evaluation demonstrated the superior performance of the proposed CGG model against established benchmarks. Under the standard intersection over union (IoU) threshold of 0.5, the model achieved a mean average precision (mAP) of 90.7% for bounding box detection (denoted mAP50^bb) and an outstanding 91.4% mAP for instance segmentation (denoted mAP50^seg). These results represented a statistically significant improvement over leading contemporary models, including Mask R-CNN and Mask2Former, highlighting the efficacy of the cross-modal grounding approach in accurately localizing and segmenting individual rice seeds. Further validation in realistic seeding trial scenarios, involving direct comparison with meticulous manual annotations, confirmed the model's practical robustness. The CGG model attained the highest accuracy in two critical operational metrics: (1) precision in segmenting individual seed instances (single-seed segmentation accuracy), and (2) accuracy in determining the exact seed count per cavity, averaging 88% for per-cavity quantification. Moreover, the model exhibited superior performance in minimizing estimation errors for cavity seed counts, evidenced by its significantly lower error metrics: a root mean square error (RMSE) of 16.8 seeds, a mean absolute error (MAE) of 13.7 seeds, and a mean absolute percentage error (MAPE) of 2.46%. These error values were markedly lower than those recorded by the comparison models, underscoring the CGG model's enhanced reliability in practical counting tasks. The discussion attributed these performance gains to the model's ability to leverage semantic context to resolve ambiguities inherent in visual-only approaches, particularly in the dense, overlapping seed scenarios common in precision seeding trays.

[Conclusions] The developed CGG-pretrained model integration presents a significant advancement in automated monitoring for precision rice seeding. The model successfully addresses the core challenges of low detection accuracy and imprecise instance segmentation in complex environments. Its high accuracy in both individual seed segmentation and per-cavity seed count quantification, coupled with low error rates, demonstrates strong potential for practical deployment. Importantly, the model enables real-time detection of rice seeds during the image analysis stage. This capability provides a quantifiable, data-driven basis for immediate operational decisions, most notably the targeted precision reseeding of empty or under-seeded cavities identified during the seeding process. By ensuring optimal seed placement and density from the outset, the technology contributes directly to improved resource efficiency (reduced seed waste), enhanced seedling uniformity, and potentially higher crop yields. Consequently, this research offers valuable and robust technical support for the ongoing advancement of smart agriculture systems and the development of next-generation intelligent seeding machinery, paving the way for more automated, efficient, and sustainable rice production. Future work will focus on further optimizing inference speed for higher-throughput seeding lines and on exploring generalization to other crop types and seeding mechanisms.
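As a small worked complement to the error metrics quoted above, the following NumPy sketch shows how RMSE, MAE, and MAPE are computed for per-cavity seed counts. The counts in the usage example are invented for demonstration; only the metric definitions correspond to the abstract.

    # Worked illustration of the counting-error metrics reported above (RMSE, MAE,
    # MAPE). The per-cavity counts below are invented for demonstration; only the
    # metric definitions correspond to the abstract.
    import numpy as np

    def count_errors(predicted, actual):
        """Return RMSE, MAE, and MAPE for predicted vs. manually counted seeds."""
        err = predicted - actual
        return {
            "RMSE": float(np.sqrt(np.mean(err ** 2))),             # root mean square error
            "MAE": float(np.mean(np.abs(err))),                    # mean absolute error
            "MAPE": float(np.mean(np.abs(err) / actual) * 100.0),  # mean absolute % error
        }

    # Hypothetical counts for five cavities (not data from the paper).
    actual = np.array([520.0, 498.0, 540.0, 515.0, 505.0])
    predicted = np.array([531.0, 486.0, 552.0, 508.0, 497.0])
    print(count_errors(predicted, actual))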

Key words: rice seeding, smart agriculture, instance segmentation, pretrained model, visual-language model
