
Smart Agriculture


Identification Method of Kale Leaf Ball Based on Improved UperNet

ZHU Yiping1,2, WU Huarui1,2,3,4, GUO Wang2,3,4, WU Xiaoyan2

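  1. School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, China
    2. National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
    3. Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
    4. Key Laboratory of Digital Village Technology, Ministry of Agriculture and Rural Affairs, Beijing 100097, China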
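  • Received: 2023-01-17 Published: 2024-03-08
  • About the author:

    ZHU Yiping, research interests: deep learning and computer vision. E-mail:

  • Corresponding author:
    WU Huarui, Ph.D., Research Fellow, research interests: agricultural intelligent systems and intelligent services for agricultural big data. E-mail: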

Identification Method of Kale Leaf Ball Based on Improved UperNet

ZHU Yiping1,2, WU Huarui1,2,3,4, GUO Wang2,3,4, WU Xiaoyan2

  1. School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, China
    2. National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
    3. Information Technology Research Center, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100097, China
    4. Key Laboratory of Digital Village Technology, Ministry of Agriculture and Rural Affairs, Beijing 100097, China
  • Received: 2023-01-17 Online: 2024-03-08
  • Corresponding author:
    WU Huarui, E-mail:
  • Supported by:
    National Key Research and Development Programme (2022YFD1600602); Ministry of Finance and Ministry of Agriculture and Rural Affairs: Funding for the National Modern Agricultural Industry Technology System (CARS-23-D07)

Abstract:

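Objective/Significance: The leaf ball is an important part of kale, and its growth and development are critical to field management. Leaf ball segmentation is hampered by complex field backgrounds, uneven illumination, and similar leaf textures. To address these problems, a semantic segmentation algorithm, UperNet-ESA, is proposed to segment the outer leaves and leaf balls of kale in field scenes quickly and accurately, in support of intelligent field management. Methods: First, the Unified Perceptual Parsing Network (UperNet) was adopted as an efficient semantic segmentation framework, and its backbone was replaced with ConvNeXt so that the model improves segmentation accuracy while keeping model complexity low. Second, the Efficient Channel Attention (ECA) mechanism was integrated into every stage of the feature extraction network to capture more image detail. Finally, a Feature Selection Module (FSM) and a Feature Alignment Module (FAM) were integrated into the feature pyramid framework to predict target boundaries more precisely. Results and Discussions: In experiments on a self-built kale image dataset, the improved UperNet method achieved a mean intersection over union (mIoU) of 92.45%, a mean pixel accuracy (mPA) of 94.32%, and an inference speed of 16.6 f/s, giving the best accuracy-speed balance among the mainstream UNet, PSPNet, and DeepLabV3+ semantic segmentation models compared. Conclusions: The results can serve as a theoretical reference for intelligent monitoring of kale growth and have promising applications in the kale industry.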
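The two accuracy figures quoted above, mIoU and mPA, are usually derived from a pixel-level confusion matrix. The following is a minimal sketch under the assumption that the standard definitions apply (per-class IoU = TP / (TP + FP + FN), per-class pixel accuracy = TP / ground-truth pixels, both averaged over classes). The three class labels and the toy label sequences are hypothetical and are not taken from the paper's dataset.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm):
    """Mean over classes of TP / (TP + FP + FN); classes absent from both maps are skipped."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                          # class-c pixels predicted as something else
        fp = sum(cm[r][c] for r in range(n)) - tp     # other pixels predicted as class c
        denom = tp + fp + fn
        if denom:
            ious.append(tp / denom)
    return sum(ious) / len(ious)


def mean_pixel_accuracy(cm):
    """Mean over classes of TP / (number of ground-truth pixels of that class)."""
    accs = [cm[c][c] / sum(cm[c]) for c in range(len(cm)) if sum(cm[c])]
    return sum(accs) / len(accs)


# Toy example with hypothetical labels: 0 = background, 1 = outer leaf, 2 = leaf ball.
# Real use would flatten the ground-truth and predicted masks into these sequences.
truth = [0, 0, 1, 1, 1, 2, 2, 2, 2, 0]
pred = [0, 0, 1, 1, 2, 2, 2, 2, 1, 0]
cm = confusion_matrix(truth, pred, num_classes=3)
print(f"mIoU = {mean_iou(cm):.4f}, mPA = {mean_pixel_accuracy(cm):.4f}")
```

Because mIoU penalizes both false positives and false negatives for each class, it is typically lower than mPA on the same predictions, which is consistent with the 92.45% and 94.32% reported here.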

Key words: kale, semantic segmentation, leaf ball identification, attention mechanism, feature selection, feature alignment

Abstract:

Objective: Kale is an important bulk vegetable crop worldwide. Its main growth traits are its outer leaves and leaf ball, and the leaf ball traits are crucial both for adjusting water and fertilizer parameters in the field and for the final yield. Under practical field conditions, however, factors such as soil quality, light exposure, leaf overlap, and shading affect kale growth, and the similarity in color and texture between the leaf ball and the outer leaves makes segmentation difficult for existing recognition models. To address this, a semantic segmentation algorithm, UperNet-ESA, was proposed. It uses the morphological features of the leaf ball and outer leaves to segment them quickly and accurately against complex field backgrounds, so that the segmented pixels can be used to determine leaf ball size for intelligent field management.
Methods: UperNet-ESA uses the unified perceptual parsing network (UperNet) as an efficient semantic segmentation framework. Because UperNet integrates semantic information from different scales, it is well suited to extracting crop features in complex environments. The similarity between leaf balls and outer leaves, together with leaf overlap, hinders accurate localization of target contours, and the baseline network had low accuracy. Its backbone, which is responsible for feature extraction, was therefore replaced with ConvNeXt. ConvNeXt combines the strengths of convolutional neural networks (CNNs) and Transformers: it builds on ResNet50 and incorporates design principles from Swin Transformer, and its simple design improves segmentation accuracy with little added model complexity. The ConvNeXt-B version was chosen based on computational complexity and the background characteristics of the kale image dataset. To strengthen the model's feature perception, its four stages were configured with 3, 3, 27, and 3 blocks and 128, 256, 512, and 1024 channels, respectively. Given the visual similarity between leaf balls and outer leaves, an efficient channel attention (ECA) mechanism was integrated into the backbone to improve feature extraction in the leaf ball region. Attention weights were incorporated into the feature maps via residual inversion, and the attention parameters were trained in every block; the resulting attention-weighted feature maps improved the capture of global feature information. Finally, direct pixel-wise addition of up-sampled features and local features misaligns the context of the feature maps and can cause misclassification at kale leaf boundaries. A feature alignment module (FAM) and a feature selection module (FSM) were therefore introduced into the feature pyramid network to refine the extraction of target boundary information and improve segmentation.
Results and Discussions: UperNet-ESA outperformed the mainstream UNet, PSPNet, and DeepLabV3+ models in segmentation accuracy, with mIoU and mPA reaching 92.45% and 94.32%, respectively, and an inference speed of 16.6 f/s. Its mPA was 11.52%, 13.56%, 8.68%, 4.31%, and 6.21% higher than that of UNet, PSPNet, and DeepLabV3+ with ResNet-50, MobileNetV2, and Xception backbones, respectively. Its mIoU was 12.21%, 13.04%, 10.65%, 3.26%, and 7.11% higher than that of the same five models, respectively. The main reason is that the ECA module and the improved feature pyramid network strengthen the discrimination of target features at each stage, which yields effective global contextual information. PSPNet had the fastest inference speed, but its overall accuracy was too low for building a kale segmentation model; the proposed model was faster than all of the remaining models. Overall, the model keeps inference time as short as possible while guaranteeing accuracy, balancing recognition accuracy and recognition speed for kale leaf balls and providing a theoretical basis for intelligent field management.
Conclusions: The experimental results show that UperNet-ESA outperforms the original network and achieves the best accuracy-speed balance among the mainstream semantic segmentation networks compared, laying a foundation for intelligent field management. In future research, the model will be further optimized and the kale dataset will be expanded to cover a greater variety of leaf ball samples, in order to provide a more robust and comprehensive theoretical basis for intelligent kale field management.
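To make the channel attention step in the Methods more concrete, here is a minimal pure-Python sketch of efficient channel attention as described in the ECA-Net literature: global average pooling per channel, a small 1D convolution across neighbouring channels, a sigmoid gate, and channel-wise rescaling. The adaptive kernel-size rule, the nested-list feature map, and the untrained kernel weights are assumptions for illustration. The abstract does not state the kernel size used or exactly where the module sits inside each ConvNeXt block, so this should not be read as the authors' implementation.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive odd kernel size from the ECA-Net paper: |log2(C)/gamma + b/gamma|_odd."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca(feature_map, weights):
    """Apply efficient channel attention to a feature map given as [C][H][W] nested lists.

    weights: 1D convolution kernel of odd length k, shared across all channels.
    Returns the rescaled feature map and the per-channel gates.
    """
    k = len(weights)
    pad = k // 2
    # 1) Squeeze: global average pooling gives one descriptor per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    # 2) Local cross-channel interaction: zero-padded 1D convolution over the channel axis.
    padded = [0.0] * pad + pooled + [0.0] * pad
    logits = [sum(w * padded[c + j] for j, w in enumerate(weights)) for c in range(len(pooled))]
    # 3) Sigmoid gate, then rescale every channel by its attention weight.
    gates = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    out = [[[v * g for v in row] for row in ch] for ch, g in zip(feature_map, gates)]
    return out, gates


# Kernel sizes the adaptive rule would give for the four ConvNeXt-B stage widths.
print([eca_kernel_size(c) for c in (128, 256, 512, 1024)])

# Toy 4-channel, 2x2 feature map with a hypothetical (untrained) kernel.
fmap = [[[float(c + 1)] * 2 for _ in range(2)] for c in range(4)]
scaled, gates = eca(fmap, weights=[0.2, 0.5, 0.2])
print([round(g, 3) for g in gates])
```

The appeal of this design is its cost: the only learnable parameters are the k kernel weights per attention module, which fits the abstract's emphasis on improving accuracy while keeping model complexity low.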

Key words: kale, semantic segmentation, leaf ball identification, attention mechanism, feature selection, feature alignment