
Smart Agriculture ›› 2025, Vol. 7 ›› Issue (3): 131-142. doi: 10.12133/j.smartag.SA202502008

• Information Processing and Decision Making •

U-Net Greenhouse Sweet Cherry Image Segmentation Method Integrating PDE Plant Temporal Image Contrastive Learning and GCN Skip Connections

HU Lingyan1, GUO Ruiya1, GUO Zhanjun2, XU Guohui1, GAI Rongli1, WANG Zumin1(), ZHANG Yumeng1, JU Bowen1, NIE Xiaoyu1()   

  1. School of Information Engineering, Dalian University, Dalian 116622, China
    2. Dalian Modern Agricultural Production Development Service Center, Dalian 116021, China
  • Received: 2025-02-11 Online: 2025-05-30
  • Foundation items: Key Projects of Liaoning Provincial Science and Technology Plan (2022020655-JH1/109); Dalian Science and Technology Innovation Fund Project (2022JJ12SN052)
  • About author: YANG Xiao, E-mail:

  • Corresponding author:
    WANG Zumin, E-mail:
    NIE Xiaoyu, E-mail:

Abstract:

[Objective] Within the field of plant phenotyping feature extraction, the accurate delineation of small-target boundaries and the adequate recovery of spatial details during upsampling operations have long been recognized as significant obstacles hindering progress. To address these limitations, an improved U-Net architecture was designed for greenhouse sweet cherry image segmentation. [Methods] Taking temporal phenotypic images of sweet cherries as the research subject, the U-Net segmentation model was employed to delineate the specific organ regions of the plant. This architecture was referred to as the U-Net integrating the self-supervised contrastive learning method for plant time-series images with priori distance embedding (PDE) pre-training and graph convolutional network (GCN) skip connections for greenhouse sweet cherry image segmentation. To accelerate model convergence, the pre-trained weights derived from the PDE plant temporal image contrastive learning method were transferred to the segmentation network. Concurrently, a GCN local feature fusion layer was incorporated as a skip connection to optimize feature fusion, thereby providing robust technical support for the image segmentation task. Pre-training with the PDE plant temporal image contrastive learning method required the construction of image pairs corresponding to different phenological periods. A classification distance loss function, which incorporated prior knowledge, was employed to construct an Encoder with adjusted parameters. The pre-trained weights obtained from the PDE plant temporal image contrastive learning method were effectively transferred and applied to the semantic segmentation task, enabling the network to accurately learn the semantic information and detailed textures of the various sweet cherry organs. The Encoder module performed multi-scale feature extraction through convolutional and pooling layers.
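The classification distance loss with a prior described above can be sketched as follows. This is a minimal illustration only: the embedding vectors, the phenological stage-gap prior, and the scaling factor `alpha` are hypothetical choices, not details taken from the paper.

```python
import math

def euclidean(a, b):
    # Euclidean distance between two embedding vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pde_contrastive_loss(z1, z2, stage_gap, alpha=0.5):
    """Toy classification-distance loss: embeddings of images from
    nearby phenological stages are pulled together, while pairs from
    distant stages are pushed toward a prior distance that grows
    with the stage gap (the prior knowledge)."""
    d = euclidean(z1, z2)
    target = alpha * stage_gap   # prior distance implied by the stage gap
    return (d - target) ** 2     # penalize deviation from the prior distance

# A same-stage pair should prefer distance ~0; a distant-stage pair
# is pushed toward a larger target distance.
loss_same = pde_contrastive_loss([0.1, 0.2], [0.1, 0.2], stage_gap=0)
loss_far = pde_contrastive_loss([0.1, 0.2], [0.9, 0.8], stage_gap=3)
```

In this sketch the loss is zero only when the embedding distance matches the stage-gap prior, which is one simple way such prior knowledge could shape an Encoder during pre-training.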
This process enabled the hierarchical processing of the semantic information embedded in the input image, constructing representations that transition from low-level texture features to high-level semantic features. This allowed consistent extraction of semantic features from images across various scales and abstraction of the underlying information, enhancing feature discriminability and optimizing the modeling of complex targets. The Decoder module was employed to conduct upsampling operations, which facilitated the integration of features from diverse scales and the restoration of the original image resolution. This enabled the network to reconstruct spatial details effectively and significantly improved the efficiency of model optimization. At the interface between the Encoder and Decoder modules, a GCN layer designed for local feature fusion was strategically integrated as a skip connection, enabling the network to better capture and learn the local features in multi-scale images. [Results and Discussions] Utilizing a set of evaluation metrics including accuracy, precision, recall, and F1-Score, an in-depth and rigorous assessment of the model's performance was conducted. The findings revealed that the improved U-Net model achieved superior performance in semantic segmentation of sweet cherry images, with an accuracy of up to 0.955 0. Ablation experiment results further revealed that the proposed method attained a precision of 0.932 8, a recall of 0.927 4, and an F1-Score of 0.912 8. The accuracy of the improved U-Net was higher by 0.069 9, 0.028 8, and 0.042 compared with the original U-Net, the U-Net with the PDE plant temporal image contrastive learning method, and the U-Net with GCN skip connections, respectively; meanwhile, the F1-Score was 0.078 3, 0.033 8, and 0.043 8 higher, respectively.
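A GCN layer of the kind used as a skip connection aggregates each node's features with those of its neighbours through a normalised adjacency matrix. The sketch below shows one such graph-convolution step in plain Python; the tiny two-node graph, feature values, and weight matrix are illustrative assumptions, not the paper's configuration.

```python
def gcn_layer(adj, feats, weight):
    """One graph-convolution step: a symmetrically normalised
    adjacency matrix (with self-loops added) mixes each node's
    feature vector with its neighbours', then a linear map and
    ReLU activation are applied."""
    n = len(adj)
    # Add self-loops so each node keeps its own features
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalisation: D^(-1/2) * A_hat * D^(-1/2)
    norm = [[a_hat[i][j] / (deg[i] ** 0.5 * deg[j] ** 0.5)
             for j in range(n)] for i in range(n)]
    f_dim, o_dim = len(weight), len(weight[0])
    # Aggregate neighbouring features, then apply weight and ReLU
    agg = [[sum(norm[i][k] * feats[k][f] for k in range(n))
            for f in range(f_dim)] for i in range(n)]
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(f_dim)))
             for o in range(o_dim)] for i in range(n)]

# Two connected nodes with 1-D features; identity-like weight
out = gcn_layer(adj=[[0, 1], [1, 0]],
                feats=[[1.0], [3.0]],
                weight=[[1.0]])
```

With both nodes connected and self-loops added, each output feature becomes the average of the two inputs, which is the kind of local feature mixing a GCN skip connection can contribute before fusion with decoder features.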
In comparative experiments against the DeepLabV3, Swin Transformer, and Segment Anything Model segmentation methods, the proposed model surpassed these models by 0.022 2, 0.027 6, and 0.042 2 in accuracy; 0.063 7, 0.147 1, and 0.107 7 in precision; 0.035 2, 0.065 4, and 0.050 8 in recall; and 0.076 8, 0.127 5, and 0.103 4 in F1-Score. [Conclusions] The PDE plant temporal image contrastive learning method and GCN techniques were incorporated to develop an advanced U-Net architecture specifically designed and optimized for the analysis of sweet cherry plant phenotypes. The results demonstrate that the proposed method effectively addresses the issues of boundary blurring and detail loss associated with small targets in complex orchard scenarios. It enables the precise segmentation of the primary organs and background regions in sweet cherry images, thereby enhancing the segmentation accuracy of the original model. This improvement provides a solid foundation for subsequent crop modeling research and holds significant practical importance for the advancement of agricultural intelligence.
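Pixel-wise evaluation metrics of the kind reported above (accuracy, precision, recall, F1-Score) can be computed from confusion counts. The sketch below operates on flat binary label lists; the example labels are illustrative only and do not reproduce the paper's results.

```python
def seg_metrics(pred, truth):
    """Pixel-wise binary segmentation metrics from flat 0/1 label
    lists: accuracy, precision, recall, and F1-Score derived from
    true/false positive and negative counts."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    acc = (tp + tn) / len(pred)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, prec, rec, f1

# Toy 4-pixel mask: one true positive, one false positive
acc, prec, rec, f1 = seg_metrics([1, 1, 0, 0], [1, 0, 0, 0])
```

F1-Score is the harmonic mean of precision and recall, so it penalizes a model that trades one heavily against the other, which is why it is reported alongside accuracy in the ablation and comparison experiments.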

Key words: priori distance embedding, transfer learning, GCN, U-Net, skip connection, plant phenotype

CLC Number: