
Smart Agriculture


CGG-Based Segmentation and Counting of Densely Distributed Rice Seeds in Seedling Trays

OUYANG Meng1, ZOU Rong1, CHEN Jin1, LI Yaoming2, CHEN Yuhang1, YAN Hao1

  1. School of Mechanical Engineering, Jiangsu University, Zhenjiang 212013, China
  2. School of Agricultural Engineering, Jiangsu University, Zhenjiang 212013, China
  • Received: 2025-07-21 Online: 2025-10-09
  • Foundation items: the National Natural Science Foundation of China (31871528)
  • About author: OUYANG Meng, master's student; research interest: object instance segmentation. E-mail:
  • Corresponding author: ZOU Rong, Ph.D., Associate Professor; research interests: deep learning, artificial intelligence, and machine vision. E-mail:


Abstract:

[Objective] The precise quantification of rice seeds within individual cavities of seedling trays is a critical operational parameter for optimizing seeding efficiency and fine-tuning the performance of air-vibration precision seeders. Accuracy in this task directly affects resource utilization, seedling uniformity, and ultimately crop yield. However, the operational environment presents significant challenges, including complex backgrounds, seed overlap, variations in lighting and seed orientation, and the inherent difficulty of distinguishing individual seeds within dense clusters. These factors often degrade existing automated detection systems, which exhibit low detection accuracy and cannot achieve robust, precise instance segmentation of individual rice seeds. To address these persistent limitations and advance the state of the art in precision seeding monitoring, a novel, integrated framework for rice seed instance segmentation is proposed. The core innovation lies in the synergistic combination of a Caption Grounding and Generation (CGG) network with a pretrained model, designed to leverage complementary information from the visual and textual domains. [Methods] The proposed methodology aimed to bridge the gap between visual perception and semantic understanding in the specific context of rice seed detection. The CGG-pretrained model framework achieved this through deep joint alignment of visual features extracted from seedling-tray images and textual features derived from contextual knowledge. This cross-modal grounding enabled collaborative learning, in which the visual processing stream (handling object localization and pixel-level segmentation) was continuously informed and refined by the semantic understanding stream (interpreting context and relationships). 
Specifically, the visual backbone network processed the input imagery to generate feature maps, while the pretrained language model component, using contextual embeddings, generated semantically rich textual representations. The CGG module acted as the fusion engine, establishing explicit correspondences between specific regions in the image (potential seeds or clusters) and relevant semantic concepts or descriptors provided by the pretrained model. This bidirectional interaction significantly enhanced the model's ability to disambiguate overlapping seeds, resolve occlusions, and accurately delineate individual seed boundaries under challenging conditions. Key technical innovations validated through rigorous ablation studies included: (1) the strategic use of the Bootstrapping Language-Image Pre-training (BLIP) model to generate high-quality pseudo-labels from unlabeled or weakly labeled image data, enabling more effective semi-supervised learning and reducing the annotation burden; and (2) the application of word embeddings from Bidirectional Encoder Representations from Transformers (BERT) to capture deep semantic relationships and contextual nuances in textual descriptors related to seeds and seeding environments. Crucially, the ablation experiments demonstrated a pronounced synergistic effect when these two core improvements were combined, yielding a segmentation accuracy improvement of more than 3 percentage points over a baseline lacking these integrations. [Results and Discussions] Comprehensive experimental evaluation demonstrated the superior performance of the proposed CGG model against established benchmarks. At the standard intersection-over-union (IoU) threshold of 0.5, the model achieved a mean average precision (mAP) of 90.7% for bounding-box detection (mAP50bb) and 91.4% for instance segmentation (mAP50seg). 
These results represented a statistically significant improvement over leading contemporary models, including Mask R-CNN and Mask2Former, highlighting the efficacy of the cross-modal grounding approach in accurately localizing and segmenting individual rice seeds. Further validation in realistic seeding trial scenarios, involving direct comparison with meticulous manual annotations, confirmed the model's practical robustness. The CGG model attained the highest accuracy on two critical operational metrics: (1) precision in segmenting individual seed instances (single-seed segmentation accuracy), and (2) accuracy in determining the exact seed count per cavity, for which it achieved an average accuracy of 88%. Moreover, the model minimized estimation errors for cavity seed counts, as evidenced by its significantly lower error metrics: a root mean square error (RMSE) of 16.8 seeds, a mean absolute error (MAE) of 13.7 seeds, and a mean absolute percentage error (MAPE) of 2.46%. These error values were markedly lower than those of the comparison models, underscoring the CGG model's enhanced reliability in practical counting tasks. The discussion contextualized these results, attributing the performance gains to the model's ability to leverage semantic context to resolve ambiguities inherent in visual-only approaches, particularly in the dense, overlapping seed scenarios common in precision seeding trays. [Conclusions] The developed CGG-pretrained model integration represents a significant advance in automated monitoring for precision rice seeding. The model successfully addresses the core challenges of low detection accuracy and imprecise instance segmentation in complex environments. Its high accuracy in both individual seed segmentation and per-cavity seed count quantification, coupled with low error rates, demonstrates strong potential for practical deployment. 
Importantly, the model enables real-time detection of rice seeds during the image analysis stage. This functionality provides a quantifiable, data-driven basis for making immediate operational decisions, most notably enabling the targeted precision reseeding of empty or under-seeded cavities identified during the seeding process. By ensuring optimal seed placement and density from the outset, the technology contributes directly to improved resource efficiency (reducing seed waste), enhanced seedling uniformity, and potentially higher crop yields. Consequently, this research offers valuable and robust technical support for the ongoing advancement of smart agriculture systems and the development of next-generation intelligent seeding machinery, paving the way for more automated, efficient, and sustainable rice production. Future work will focus on further optimizing inference speed for higher-throughput seeding lines and exploring generalization to other crop types and seeding mechanisms.
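The "deep joint alignment" of image and text features described in the Methods can be illustrated with a contrastive (InfoNCE-style) objective of the kind used by BLIP-like vision-language models. The sketch below is a minimal, self-contained illustration under that assumption, not the paper's implementation; the function names and toy features are hypothetical.

```python
import math

def _normalize(v):
    # L2-normalise a feature vector
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def _mean_diag_nll(logits):
    # Mean negative log-probability of the diagonal entries
    # (the matched image-text pairs) under a row-wise softmax.
    loss = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract max for numerical stability
        lse = m + math.log(sum(math.exp(x - m) for x in row))
        loss += -(row[i] - lse)
    return loss / len(logits)

def info_nce_alignment_loss(img_feats, txt_feats, temperature=0.07):
    """Symmetric InfoNCE loss over image/text feature rows.

    Row i of img_feats and row i of txt_feats form a matched pair;
    all other rows in the batch act as negatives.
    """
    img = [_normalize(v) for v in img_feats]
    txt = [_normalize(v) for v in txt_feats]
    # Pairwise cosine similarities scaled by temperature.
    logits = [[sum(a * b for a, b in zip(u, w)) / temperature for w in txt]
              for u in img]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the image-to-text and text-to-image directions.
    return 0.5 * (_mean_diag_nll(logits) + _mean_diag_nll(logits_t))
```

With perfectly aligned features the loss approaches zero; permuting the text rows against the image rows drives it up, which is the gradient signal that pulls matched image regions and descriptors together.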

Key words: rice seeding, smart agriculture, instance segmentation, pretrained model, vision-language model
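The evaluation quantities reported in the abstract (IoU-thresholded box overlap, and RMSE/MAE/MAPE over per-cavity seed counts) have standard definitions that can be computed as follows. This is an illustrative sketch with hypothetical function names and toy inputs, not code from the paper.

```python
import math

def box_iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def count_error_metrics(pred_counts, true_counts):
    """RMSE, MAE and MAPE (%) of predicted vs. ground-truth seed counts."""
    errs = [p - t for p, t in zip(pred_counts, true_counts)]
    n = len(errs)
    rmse = math.sqrt(sum(e * e for e in errs) / n)
    mae = sum(abs(e) for e in errs) / n
    mape = 100.0 * sum(abs(e) / t for e, t in zip(errs, true_counts)) / n
    return rmse, mae, mape
```

A detection counts as a true positive at mAP50 when its `box_iou` against an unmatched ground-truth box is at least 0.5; the count metrics are then accumulated per cavity across the tray.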
