
Smart Agriculture ›› 2023, Vol. 5 ›› Issue (4): 137-149. DOI: 10.12133/j.smartag.SA202310003

• Special Topic: Artificial Intelligence and Robotics for Smart Agriculture •

  • About the first author:
    WANG Herong, research interests: cross-disciplinary applications of computer science and intelligent agriculture. E-mail:

  • Corresponding author:
    YU Huihui, PhD, lecturer, research interests: cross-disciplinary applications of artificial intelligence and agriculture. E-mail:

Image Segmentation Method Combined with VoVNetv2 and Shuffle Attention Mechanism for Fish Feeding in Aquaculture

WANG Herong1,3,4,5(), CHEN Yingyi1,3,4,5, CHAI Yingqian1,3,4,5, XU Ling1,3,4,5, YU Huihui2,6()   

  1. National Innovation Center for Digital Fishery, China Agricultural University, Beijing 100083, China
    2. School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China
    3. Key Laboratory of Smart Farming Technologies for Aquatic Animal and Livestock, Ministry of Agriculture and Rural Affairs, Beijing 100083, China
    4. Beijing Engineering and Technology Research Centre for Internet of Things in Agriculture, Beijing 100083, China
    5. College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China
    6. Engineering Research Center for Forestry-oriented Intelligent Information Processing, National Forestry and Grassland Administration, Beijing 100083, China
  • Received: 2023-10-07  Online: 2023-12-30
  • Corresponding author:
    YU Huihui, E-mail:
  • Supported by:
    National Natural Science Foundation of China(62206021); Beijing Digital Agriculture Innovation Consortium Project(BAIC10-2023)

Abstract:

[Objective/Significance] Segmentation of fish school feeding images is a prerequisite for extracting the distribution characteristics of fish schools and quantifying their feeding behavior. In practical aquaculture environments, however, blurred fish boundaries and highly similar targets make segmenting fish school feeding images difficult. [Methods] To address these problems, a segmentation method for fish school feeding images in aquaculture scenarios was proposed. The method first applies data cleaning to reduce poorly labeled samples caused by blurred fish boundaries and related issues. Then, on the basis of Mask R-CNN (Mask Region-based Convolutional Neural Network), the lightweight VoVNetv2 network fused with a shuffle attention mechanism is used as the backbone, forming the fish feeding instance segmentation network SA_VoVNetv2_RCNN. This improves the model's ability to extract key features of fish schools and to focus on salient information, while reducing the number of network parameters. [Results and Discussion] The average segmentation accuracy of the method reached 71.014%, an improvement of 18.258%, 3.982%, and 12.068% over SOLOv2, BlendMask, and CondInst, respectively. To further verify the model's effectiveness for quantifying feeding behavior, validation experiments were conducted on fish schools in a real environment. The results show that the model segments both feeding and non-feeding fish well, alleviating, to a certain extent, the erroneous quantification of feeding behavior caused by low segmentation accuracy. [Conclusions] The proposed SA_VoVNetv2_RCNN network accurately segments feeding and non-feeding fish images and provides decision support for quantifying the feeding behavior of underwater fish schools.

Keywords: deep learning, instance segmentation, Mask R-CNN, attention mechanism, VoVNetv2

Abstract:

[Objective] Intelligent feeding methods are significant for improving breeding efficiency and reducing water quality pollution in current aquaculture. Feeding image segmentation of fish schools is a critical step in extracting the distribution characteristics of fish schools and quantifying their feeding behavior for intelligent feeding method development. However, an applicable approach is lacking because of image challenges caused by blurred boundaries and similar individuals in practical aquaculture environments. In this study, a high-precision segmentation method was proposed for fish school feeding images, providing technical support for the quantitative analysis of fish school feeding behavior. [Methods] The proposed method for fish school feeding image segmentation combined VoVNetv2 with an attention mechanism named shuffle attention. Firstly, a fish feeding segmentation dataset was presented. The dataset was collected at the intensive aquaculture base of Laizhou Mingbo Company in Shandong Province, with Oplegnathus punctatus as the research target. Cameras were used to capture videos of the fish school before, during, and after feeding, and the images were annotated at the pixel level using the Labelme software. According to the distribution characteristics of the feeding and non-feeding stages, the data were classified into two semantic categories: non-occluded and non-aggregated fish (fish1), and occluded or aggregated fish (fish2). In the preprocessing stage, data cleaning and image augmentation were employed to further enhance the quality and diversity of the dataset. Initially, data cleaning rules were established based on the distribution of annotated areas within the dataset; images with outlier annotations were removed, improving the overall quality of the dataset.
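The cleaning rule is only described at a high level; a minimal sketch of one plausible implementation follows, assuming a z-score threshold on per-image annotated area (the cutoff `k` and the `(image_id, area)` input format are hypothetical, not from the paper):

```python
import statistics

def clean_by_area(samples, k=1.5):
    """Drop images whose total annotated area is a statistical outlier.

    samples: list of (image_id, total_annotated_area) pairs.
    k: z-score cutoff (hypothetical; the paper does not state the exact rule).
    """
    areas = [area for _, area in samples]
    mu = statistics.mean(areas)
    sd = statistics.pstdev(areas)
    if sd == 0:  # all areas identical: nothing to drop
        return list(samples)
    return [(i, a) for i, a in samples if abs(a - mu) / sd <= k]
```

Any comparable outlier rule (e.g. interquartile range) would serve the same purpose of removing badly labeled frames before training.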
Subsequently, to reduce the risk of overfitting, five data augmentation techniques (random translation, random flipping, brightness variation, random noise injection, and random point addition) were applied in combination, increasing the diversity of the dataset and expanding it to three times its original size. Finally, the dataset was divided into training and testing sets at a ratio of 8:2, yielding 1 612 training images and 404 testing images, with a total of 116 328 instances of fish1 and 20 924 instances of fish2. Secondly, a fish feeding image segmentation method was proposed. Specifically, VoVNetv2 was used as the backbone network of the Mask R-CNN model to extract image features. VoVNetv2 is a computationally efficient backbone whose one-shot feature aggregation structure effectively fuses features at different levels and extracts diverse feature representations. This facilitates better capture of fish schools of different sizes and shapes in feeding images, enabling accurate identification and segmentation of the targets. To maximize feature mapping with limited resources, the channel attention mechanism in the one-shot aggregation (OSA) module of VoVNetv2 was replaced with the more lightweight and efficient shuffle attention mechanism. This improvement allowed the network to concentrate on the locations of fish in the image, reducing the impact of irrelevant information, such as noise, on the segmentation results. Finally, experiments were conducted on the fish segmentation dataset to test the performance of the proposed method.
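The structure of shuffle attention can be illustrated with a small sketch. The NumPy version below is for illustration only: it keeps the mechanism's shape — grouping channels, gating half of each group with channel (global-pool) statistics and the other half with spatial statistics, then channel-shuffling so information mixes across groups — but omits the learnable scale/bias parameters and group normalization of the actual module:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_shuffle(x, groups):
    """Interleave channels across groups so the grouped branches exchange information."""
    n, c, h, w = x.shape
    x = x.reshape(n, groups, c // groups, h, w)
    x = x.transpose(0, 2, 1, 3, 4)  # swap the group and per-group channel axes
    return x.reshape(n, c, h, w)

def shuffle_attention(x, groups=2):
    """Simplified, parameter-free sketch of the two shuffle attention branches."""
    n, c, h, w = x.shape
    assert c % (2 * groups) == 0
    xg = x.reshape(n * groups, c // groups, h, w)
    x_ch, x_sp = np.split(xg, 2, axis=1)
    # channel branch: gate each channel by its global average-pooled response
    ch_stat = x_ch.mean(axis=(2, 3), keepdims=True)
    x_ch = x_ch * sigmoid(ch_stat)
    # spatial branch: gate each position by its deviation from the spatial mean
    sp_stat = x_sp - x_sp.mean(axis=(2, 3), keepdims=True)
    x_sp = x_sp * sigmoid(sp_stat)
    out = np.concatenate([x_ch, x_sp], axis=1).reshape(n, c, h, w)
    return channel_shuffle(out, groups)
```

In the paper's network this block replaces the channel attention inside each OSA module; the lightweight gating is what allows the parameter reduction reported below.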
[Results and Discussions] The results showed that after data cleaning, the average segmentation accuracy of the Mask R-CNN network reached 63.218%, an improvement of 7.018% over the original dataset. With both data cleaning and augmentation, the network achieved an average segmentation accuracy of 67.284%, an enhancement of 11.084% over the original dataset and of 4.066% over the cleaned dataset alone. These results demonstrated that data preprocessing had a positive effect on segmentation accuracy. Ablation experiments on the backbone network revealed that replacing the ResNet50 backbone of Mask R-CNN with VoVNetv2-39 improved model accuracy by 2.511%. After VoVNetv2 was further improved with the shuffle attention mechanism, model accuracy increased by another 1.219% while the number of parameters decreased by 7.9%, achieving a balance between accuracy and lightweight design. Compared with the classic segmentation networks SOLOv2, BlendMask, and CondInst, the proposed model achieved the highest segmentation accuracy across various target scales; on the fish feeding segmentation dataset, its average segmentation accuracy surpassed BlendMask, CondInst, and SOLOv2 by 3.982%, 12.068%, and 18.258%, respectively. Although the proposed method demonstrated effective segmentation of fish feeding images, it still exhibited certain limitations, such as missed detections, erroneous segmentation, and misclassification. [Conclusions] The proposed instance segmentation network (SA_VoVNetv2_RCNN) effectively achieved accurate segmentation of fish feeding images. It can be used to count the number and pixel quantities of the two fish categories in fish feeding videos, facilitating quantitative analysis of fish feeding behavior.
This technique can therefore provide technical support for the analysis of fish feeding behavior. In future research, the limitations noted above will be addressed to further enhance the accuracy of fish feeding image segmentation.
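As a usage illustration, the per-frame counts and pixel quantities mentioned above could be accumulated from the network's output masks along the following lines (the mask/label input format here is an assumption for illustration, not the paper's actual API):

```python
import numpy as np

def summarize_frame(masks, labels):
    """Count instances and foreground pixels per category for one video frame.

    masks:  list of boolean HxW arrays, one per detected instance.
    labels: parallel list with the paper's two category names, 'fish1' or 'fish2'.
    """
    summary = {"fish1": {"count": 0, "pixels": 0},
               "fish2": {"count": 0, "pixels": 0}}
    for mask, label in zip(masks, labels):
        summary[label]["count"] += 1
        summary[label]["pixels"] += int(mask.sum())
    return summary
```

Tracking these per-frame summaries over a feeding video is one way the segmentation output could feed into a quantitative index of feeding intensity.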

Key words: deep learning, instance segmentation, Mask R-CNN, attention mechanism, VoVNetv2