
Smart Agriculture ›› 2025, Vol. 7 ›› Issue (3): 131-142. doi: 10.12133/j.smartag.SA202502008

• Information Processing and Decision Making •


U-Net Greenhouse Sweet Cherry Image Segmentation Method Integrating PDE Plant Temporal Image Contrastive Learning and GCN Skip Connections

HU Lingyan1, GUO Ruiya1, GUO Zhanjun2, XU Guohui1, GAI Rongli1, WANG Zumin1(), ZHANG Yumeng1, JU Bowen1, NIE Xiaoyu1()   

  1. School of Information Engineering, Dalian University, Dalian 116622, China
    2. Dalian Modern Agricultural Production Development Service Center, Dalian 116021, China
  • Received: 2025-02-11  Online: 2025-05-30
  • Foundation items: Key Project of the Liaoning Provincial Science and Technology Plan (2022020655-JH1/109); Dalian Science and Technology Innovation Fund Project (2022JJ12SN052)
  • About author:

    HU Lingyan, Ph.D., associate professor, research interests: smart agriculture, intelligent sensing and data analysis. E-mail:

  • Corresponding author:
    WANG Zumin, Ph.D., professor, research interests: Internet of Things technology and digital agriculture. E-mail:
    NIE Xiaoyu, M.S., assistant experimentalist, research interests: smart agriculture and plant pest and disease diagnosis. E-mail:


Abstract:

[Objective] In plant phenotyping feature extraction, the accurate delineation of small-target boundaries and the recovery of spatial details during upsampling have long been significant obstacles. To address these limitations, an improved U-Net architecture was proposed for greenhouse sweet cherry image segmentation. [Methods] Taking temporal phenotypic images of sweet cherries as the research subject, a U-Net segmentation model was employed to delineate the specific organ regions of the plant. The architecture was referred to as the U-Net integrating a self-supervised contrastive learning method for plant time-series images with priori distance embedding (PDE) pre-training and graph convolutional network (GCN) skip connections. To accelerate model convergence, the pre-trained weights derived from the PDE plant temporal image contrastive learning method were transferred to the semantic segmentation task. Concurrently, a GCN local feature fusion layer was incorporated as a skip connection to optimize feature fusion, providing robust technical support for the image segmentation task. Pre-training with the PDE plant temporal image contrastive learning method required the construction of image pairs corresponding to different phenological periods, and a classification distance loss function incorporating prior knowledge was employed to construct an Encoder with adjusted parameters. The pre-trained weights were then transferred and applied to the semantic segmentation task, enabling the network to accurately learn the semantic information and detailed textures of the various sweet cherry organs. The Encoder module performed multi-scale feature extraction through convolutional and pooling layers, hierarchically processing the semantic information embedded in the input image to construct representations that progress from low-level texture features to high-level semantic features. This allowed consistent extraction of semantic features from images across various scales and abstraction of the underlying information, enhancing feature discriminability and optimizing the modeling of complex targets. The Decoder module performed upsampling operations, which fused features from diverse scales and restored the original image resolution, enabling the network to reconstruct spatial details and significantly improving the efficiency of model optimization. At the interface between the Encoder and Decoder modules, a GCN layer designed for local feature fusion was integrated as a skip connection, enabling the network to better capture the local features of multi-scale images. [Results and Discussions] The model was rigorously assessed using accuracy, precision, recall, and F1-Score. The improved U-Net model achieved superior performance in semantic segmentation of sweet cherry images, with an accuracy of up to 0.955 0. Ablation experiments further showed that the proposed method attained a precision of 0.932 8, a recall of 0.927 4, and an F1-Score of 0.912 8.
The accuracy of the improved U-Net was 0.069 9, 0.028 8, and 0.042 higher than that of the original U-Net, the U-Net with the PDE plant temporal image contrastive learning method, and the U-Net with GCN skip connections, respectively; the F1-Score was 0.078 3, 0.033 8, and 0.043 8 higher, respectively. In comparative experiments against the DeepLabV3, Swin Transformer, and Segment Anything Model segmentation methods, the proposed model surpassed these models by 0.022 2, 0.027 6, and 0.042 2 in accuracy; 0.063 7, 0.147 1, and 0.107 7 in precision; 0.035 2, 0.065 4, and 0.050 8 in recall; and 0.076 8, 0.127 5, and 0.103 4 in F1-Score. [Conclusions] The PDE plant temporal image contrastive learning method and GCN techniques were combined to develop an enhanced U-Net architecture specifically designed for sweet cherry plant phenotyping analysis. The results demonstrate that the proposed method effectively addresses the boundary blurring and detail loss associated with small targets in complex orchard scenarios. It enables precise segmentation of the primary organs and background regions in sweet cherry images, thereby improving the segmentation accuracy of the original model. This improvement provides a solid foundation for subsequent crop modeling research and holds significant practical importance for the advancement of agricultural intelligence.
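The Methods describe pre-training on image pairs drawn from different phenological periods with a distance loss that incorporates prior knowledge. Below is a minimal PyTorch sketch of one plausible reading of such an objective: the distance between the embeddings of a pair is pulled toward a prior distance derived from the gap between their growth stages. The linear stage-gap prior, the squared-error form, and all names (PriorDistanceLoss, num_stages, the toy encoder) are illustrative assumptions, not the authors' exact formulation.

```python
# Hedged sketch of a PDE-style pre-training objective (assumptions noted above):
# embeddings of an image pair are regressed toward a prior distance
# proportional to the phenological-stage gap between the two images.
import torch
import torch.nn as nn

class PriorDistanceLoss(nn.Module):
    def __init__(self, num_stages: int):
        super().__init__()
        self.num_stages = num_stages

    def forward(self, z_a, z_b, stage_a, stage_b):
        # z_a, z_b: (B, D) L2-normalised embeddings of an image pair;
        # stage_a, stage_b: (B,) integer phenological-stage indices.
        dist = (z_a - z_b).norm(dim=1)                       # embedding distance
        prior = (stage_a - stage_b).abs().float() / (self.num_stages - 1)
        return ((dist - prior) ** 2).mean()                  # pull distance to prior

# Usage with a hypothetical toy encoder producing normalised embeddings:
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128))
loss_fn = PriorDistanceLoss(num_stages=5)
x_a, x_b = torch.randn(8, 3, 64, 64), torch.randn(8, 3, 64, 64)
z_a = nn.functional.normalize(encoder(x_a), dim=1)
z_b = nn.functional.normalize(encoder(x_b), dim=1)
loss = loss_fn(z_a, z_b, torch.randint(0, 5, (8,)), torch.randint(0, 5, (8,)))
```

After such pre-training, the Encoder weights would be transferred to the segmentation network, as the abstract describes.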

Key words: priori distance embedding, transfer learning, GCN, U-Net, skip connection, plant phenotype
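The GCN skip connection described in the Methods joins the Encoder and Decoder at matching scales. As a hedged sketch of how a GCN layer can refine an encoder feature map before fusion, the code below treats each spatial location as a graph node on a 4-neighbourhood grid and applies one propagation step H' = ReLU(Â H W), with Â = D^{-1/2}(A + I)D^{-1/2}. The dense adjacency, the single GCN layer, fusion by concatenation, and the names (GCNSkip, grid_adjacency) are assumptions for illustration; the paper's exact layer and fusion rule may differ.

```python
# Hedged sketch: a GCN layer applied to a U-Net skip connection, assuming a
# 4-neighbourhood pixel graph over the encoder feature map (see lead-in).
import torch
import torch.nn as nn
import torch.nn.functional as F

def grid_adjacency(h, w):
    """Normalised adjacency A_hat = D^-1/2 (A + I) D^-1/2 for an h*w 4-neighbour grid."""
    n = h * w
    idx = torch.arange(n).reshape(h, w)
    right = torch.stack([idx[:, :-1].flatten(), idx[:, 1:].flatten()])
    down = torch.stack([idx[:-1, :].flatten(), idx[1:, :].flatten()])
    e = torch.cat([right, down], dim=1)
    a = torch.zeros(n, n)
    a[e[0], e[1]] = 1.0
    a = a + a.t() + torch.eye(n)          # undirected edges + self-loops
    d = a.sum(dim=1).rsqrt()              # D^-1/2 as a vector
    return d.unsqueeze(1) * a * d.unsqueeze(0)

class GCNSkip(nn.Module):
    """One GCN propagation step over an encoder feature map used as a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.weight = nn.Linear(channels, channels, bias=False)

    def forward(self, x):                           # x: (B, C, H, W)
        b, c, h, w = x.shape
        a_hat = grid_adjacency(h, w).to(x.device)
        nodes = x.flatten(2).transpose(1, 2)        # (B, H*W, C) node features
        out = F.relu(a_hat @ self.weight(nodes))    # H' = ReLU(A_hat H W)
        return out.transpose(1, 2).reshape(b, c, h, w)

# Illustrative use at one skip connection: refine the encoder feature map
# before concatenating it with the upsampled decoder features.
enc_feat = torch.randn(1, 64, 16, 16)   # hypothetical encoder output
up_feat = torch.randn(1, 64, 16, 16)    # hypothetical upsampled decoder output
fused = torch.cat([up_feat, GCNSkip(64)(enc_feat)], dim=1)  # (1, 128, 16, 16)
```

A dense n x n adjacency is only practical for the small feature maps near the U-Net bottleneck; higher-resolution skip connections would need a sparse graph representation.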

CLC number: