Welcome to Smart Agriculture

Smart Agriculture ›› 2023, Vol. 5 ›› Issue (4): 137-149.doi: 10.12133/j.smartag.SA202310003

• Special Issue--Artificial Intelligence and Robot Technology for Smart Agriculture • Previous Articles     Next Articles

Image Segmentation Method Combined with VoVNetv2 and Shuffle Attention Mechanism for Fish Feeding in Aquaculture

WANG Herong1,3,4,5(), CHEN Yingyi1,3,4,5, CHAI Yingqian1,3,4,5, XU Ling1,3,4,5, YU Huihui2,6()   

  1. 1. National Innovation Center for Digital Fishery, China Agricultural University, Beijing 100083, China
    2. School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China
    3. Key Laboratory of Smart Farming Technologies for Aquatic Animal and Livestock, Ministry of Agriculture and Rural Affairs, Beijing 100083, China
    4. Beijing Engineering and Technology Research Centre for Internet of Things in Agriculture, Beijing 100083, China
    5. College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China
    6. Engineering Research Center for Forestry-oriented Intelligent Information Processing, National Forestry and Grassland Administration, Beijing 100083, China
  • Received:2023-10-07 Online:2023-12-30
  • corresponding author:
    YU Huihui, E-mail:
  • Supported by:
    National Natural Science Foundation of China(62206021); Beijing Digital Agriculture Innovation Consortium Project(BAIC10-2023)

Abstract:

[Objective] Intelligent feeding methods are significant for improving breeding efficiency and reducing water quality pollution in current aquaculture. Feeding image segmentation of fish schools is a critical step in extracting the distribution characteristics of fish schools and quantifying their feeding behavior for intelligent feeding method development. While, an applicable approach is lacking due to images challenges caused by blurred boundaries and similar individuals in practical aquaculture environment. In this study, a high-precision segmentation method was proposed for fish school feeding images and provides technical support for the quantitative analysis of fish school feeding behavior. [Methods] The novel proposed method for fish school feeding images segmentation combined VoVNetv2 with an attention mechanism named Shuffle Attention. Firstly, a fish feeding segmentation dataset was presented. The dataset was collected at the intensive aquaculture base of Laizhou Mingbo Company in Shandong province, with a focus on Oplegnathus punctatus as the research target. Cameras were used to capture videos of the fish school before, during, and after feeding. The images were annotated at the pixel level using Labelme software. According to the distribution characteristics of fish feeding and non-feeding stage, the data was classified into two semantic categories— non-occlusion and non-aggregation fish (fish1) and occlusion or aggregation fish (fish2). In the preprocessing stage, data cleaning and image augmentation were employed to further enhance the quality and diversity of the dataset. Initially, data cleaning rules were established based on the distribution of annotated areas within the dataset. Images with outlier annotations were removed, resulting in an improvement in the overall quality of the dataset. Subsequently, to prevent the risk of overfitting, five data augmentation techniques (random translation, random flip, brightness variation, random noise injection, random point addition) were applied for mixed augmentation on the dataset, contributing to an increased diversity of the dataset. Through data augmentation operations, the dataset was expanded to three times its original size. Eventually, the dataset was divided into a training dataset and testing dataset at a ratio of 8:2. Thus, the final dataset consisted of 1 612 training images and 404 testing images. In detail, there were a total of 116 328 instances of fish1 and 20 924 instances of fish2. Secondly, a fish feeding image segmentation method was proposed. Specifically, VoVNetv2 was used as the backbone network for the Mask R-CNN model to extract image features. VoVNetv2 is a backbone network with strong computational capabilities. Its unique feature aggregation structure enables effective fusion of features at different levels, extracting diverse feature representations. This facilitates better capturing of fish schools of different sizes and shapes in fish feeding images, achieving accurate identification and segmentation of targets within the images. To maximize feature mappings with limited resources, the experiment replaced the channel attention mechanism in the one-shot aggregation (OSA) module of VoVNetv2 with a more lightweight and efficient attention mechanism named shuffle attention. This improvement allowed the network to concentrate more on the location of fish in the image, thus reducing the impact of irrelevant information, such as noise, on the segmentation results. Finally, experiments were conducted on the fish segmentation dataset to test the performance of the proposed method. [Results and Discussions] The results showed that the average segmentation accuracy of the Mask R-CNN network reached 63.218% after data cleaning, representing an improvement of 7.018% compared to the original dataset. With both data cleaning and augmentation, the network achieved an average segmentation accuracy of 67.284%, indicating an enhancement of 11.084% over the original dataset. Furthermore, there was an improvement of 4.066% compared to the accuracy of the dataset after cleaning alone. These results demonstrated that data preprocessing had a positive effect on improving the accuracy of image segmentation. The ablation experiments on the backbone network revealed that replacing the ResNet50 backbone with VoVNetv2-39 in Mask R-CNN led to a 2.511% improvement in model accuracy. After improving VoVNetv2 through the Shuffle Attention mechanism, the accuracy of the model was further improved by 1.219%. Simultaneously, the parameters of the model decreased by 7.9%, achieving a balance between accuracy and lightweight design. Comparing with the classic segmentation networks SOLOv2, BlendMask and CondInst, the proposed model achieved the highest segmentation accuracy across various target scales. For the fish feeding segmentation dataset, the average segmentation accuracy of the proposed model surpassed BlendMask, CondInst, and SOLOv2 by 3.982%, 12.068%, and 18.258%, respectively. Although the proposed method demonstrated effective segmentation of fish feeding images, it still exhibited certain limitations, such as omissive detection, error segmentation, and false classification. [Conclusions] The proposed instance segmentation algorithm (SA_VoVNetv2_RCNN) effectively achieved accurate segmentation of fish feeding images. It can be utilized for counting the number and pixel quantities of two types of fish in fish feeding videos, facilitating quantitative analysis of fish feeding behavior. Therefore, this technique can provide technical support for the analysis of piscine feeding actions. In future research, these issues will be addressed to further enhance the accuracy of fish feeding image segmentation.

Key words: deep learning, instance segmentation, Mask R-CNN, attention mechanism, VoVNetv2