
Smart Agriculture, 2024, Vol. 6, Issue (4): 18-28. doi: 10.12133/j.smartag.SA202405025

• Special Topic: Innovation and Sustainable Development of Smart Animal Husbandry Technology •

A Method for Automatic Detection of Dairy Cow Lameness from a Top View Based on Spatiotemporal Stream Feature Fusion

DAI Xin1, WANG Junhao1, ZHANG Yi1,2, WANG Xinjie1, LI Yanxing1, DAI Baisheng1, SHEN Weizheng1

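  1. College of Electrical Engineering and Information, Northeast Agricultural University, Harbin 150030, Heilongjiang, China
    2. College of Information Engineering, East University of Heilongjiang, Harbin 150086, Heilongjiang, China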
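  • Received: 2024-05-31  Published: 2024-07-30
  • Funding:
    National Natural Science Foundation of China (32072788); Key Research and Development Program of Heilongjiang Province (2022ZX01A24); National Key Research and Development Program of China (2023YFD2000700); Scientific Research Platform Support Project of East University of Heilongjiang (PTZCXM2404)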
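  • About the author:
    DAI Xin, research interests: smart animal husbandry and intelligent visual perception. E-mail:
  • Corresponding authors:
    DAI Baisheng, Ph.D., associate professor, research interests: smart animal husbandry and intelligent visual perception. E-mail:
    SHEN Weizheng, Ph.D., professor, research interests: smart animal husbandry and digital agriculture. E-mail: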

Automatic Detection Method of Dairy Cow Lameness from Top-view Based on the Fusion of Spatiotemporal Stream Features

DAI Xin1, WANG Junhao1, ZHANG Yi1,2, WANG Xinjie1, LI Yanxing1, DAI Baisheng1, SHEN Weizheng1

  1. College of Electrical Engineering and Information, Northeast Agricultural University, Harbin 150030, China
    2. College of Information Engineering, East University of Heilongjiang, Harbin 150086, China
  • Received: 2024-05-31  Online: 2024-07-30
  • Foundation items: National Natural Science Foundation of China (32072788); Key Research and Development Program of Heilongjiang Province (2022ZX01A24); National Key Research and Development Program of China (2023YFD2000700); Project Supported by Scientific Research Platform of East University of Heilongjiang (PTZCXM2404)
  • About author:
    DAI Xin, E-mail:
  • Corresponding author:
    DAI Baisheng, E-mail:
    SHEN Weizheng, E-mail:

Summary:

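[Objective/Significance] Lameness detection in dairy cows is an important problem that urgently needs to be solved in large-scale dairy farming. Existing methods detect lameness mainly from a side view, which suffers from occlusion that is difficult to eliminate. This study addresses the occlusion problem of the side view. [Methods] A top-view dairy cow lameness detection method based on the fusion of spatiotemporal stream features is proposed. First, a spatial stream feature image sequence is constructed by analyzing the pose changes of lame cows during movement in the depth video stream. A temporal stream feature image sequence is constructed by analyzing the instantaneous speed of the body's forward movement and left-right swaying when a lame cow walks, with optical flow used to capture this instantaneous speed. The spatial stream and temporal stream feature images are then combined into a fused spatiotemporal stream feature image sequence. Second, the Convolutional Block Attention Module (CBAM) is used to improve the PP-TSMv2 (PaddlePaddle-Temporal Shift Module v2) video action classification network, yielding the cow lameness detection model Cow-TSM (Cow-Temporal Shift Module). Finally, lameness detection experiments compare four aspects (input modalities, attention mechanisms, video action classification networks, and existing methods) to examine the strengths and weaknesses of the proposed method. [Results and Discussion] A total of 180 cow image sequences were collected and processed, with a 1:1 ratio of lame to non-lame cow video segments. The proposed model achieved a recognition accuracy of 88.7%, with a model size of 22 M and an offline inference time of 0.046 s. Compared with the mainstream video action classification models TSM, PP-TSM, PP-TSMv2, SlowFast and TimesFormer, it had the best overall performance. With the fused spatiotemporal stream feature images as input, recognition accuracy improved by 12% and 4.1% over the temporal-only and spatial-only modalities, respectively, demonstrating the effectiveness of modality fusion in this study. Ablation experiments with four attention mechanisms, namely channel attention (Squeeze-and-Excitation, SE), selective kernel attention (Selective Kernel, SK), coordinate attention (Coordinate Attention, CA) and CBAM, showed that building the lameness detection model with CBAM works best. Finally, in comparison with existing lameness detection methods, the proposed method offers both good performance and practicality. [Conclusions] This study avoids the occlusion problem that arises when detecting lame cows from the side view. It is of significance for reducing the incidence of lameness in dairy cows and improving the economic benefits of farms, and it meets the needs of large-scale farm construction.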

Keywords: dairy cow lameness detection, spatiotemporal fusion, video action classification, depth image, attention mechanism, TSM
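
As a rough illustration of how a spatial stream and a temporal stream could be combined into one image sequence, the Python sketch below stacks a normalized height map with the two optical-flow components as three image channels. The channel layout, the `depth_range` and `max_speed` normalization bounds, the assignment of flow axes to forward and lateral motion, and the function names are all assumptions made for this example; the source does not specify them. The optical flow is taken as a precomputed input rather than estimated here.

```python
import numpy as np


def to_uint8(values, lo, hi):
    """Linearly map values from [lo, hi] to [0, 255], clipping outliers."""
    scaled = (np.clip(values, lo, hi) - lo) / (hi - lo)
    return np.round(scaled * 255.0).astype(np.uint8)


def fuse_streams(depth_frames, flows, depth_range=(0.0, 2.0), max_speed=10.0):
    """Build a fused spatiotemporal image sequence (hypothetical layout).

    depth_frames: (T, H, W) height values from a top-view depth camera.
    flows:        (T-1, H, W, 2) per-pixel displacement between consecutive
                  frames; component 0 is assumed to be the walking direction
                  and component 1 the left-right direction.
    Returns a (T-1, H, W, 3) uint8 array:
      channel 0 = height (spatial stream),
      channel 1 = forward speed, channel 2 = lateral speed (temporal stream).
    """
    depth_frames = np.asarray(depth_frames, dtype=np.float32)
    flows = np.asarray(flows, dtype=np.float32)
    expected = (depth_frames.shape[0] - 1,) + depth_frames.shape[1:] + (2,)
    if flows.shape != expected:
        raise ValueError(f"flows must have shape {expected}, got {flows.shape}")

    # Flow i describes motion from frame i to frame i+1, so pair it with frame i+1.
    spatial = to_uint8(depth_frames[1:], *depth_range)
    forward = to_uint8(flows[..., 0], -max_speed, max_speed)
    lateral = to_uint8(flows[..., 1], -max_speed, max_speed)
    return np.stack([spatial, forward, lateral], axis=-1)
```

Packing the streams into three 8-bit channels is one convenient choice because the result can be fed to a network that expects ordinary image frames; other fusion layouts are equally plausible.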

Abstract:

[Objective] Detecting lameness in dairy cows is an urgent problem in large-scale dairy farming. Timely detection and effective intervention can reduce the culling rate of young dairy cows, which is of practical significance for increasing milk production and improving the economic benefits of farms. Because traditional manual inspection and contact-sensor detection are inefficient and poorly automated, mainstream cow lameness detection is now based mainly on computer vision. Existing vision-based methods mostly work from a side view, which has limitations that are difficult to eliminate: in practice, cows occlude one another and the equipment is hard to deploy. A top-view method does not suffer from occlusion and is therefore easier to use on the farm. This study aims to solve the occlusion problem of the side view.
[Methods] To fully exploit the undulation of the cow's trunk and the motion information along the time dimension during walking, a top-view cow lameness detection method based on fused spatiotemporal stream features was proposed. A spatial stream feature image sequence was constructed by analyzing the height changes of the lame cow during movement in the depth video stream. A temporal stream feature image sequence was constructed by analyzing the instantaneous speed of the cow's body as it moves forward and sways left and right while walking, with optical flow used to capture this instantaneous speed. The spatial stream and temporal stream features were combined into a fused spatiotemporal stream feature image sequence. Unlike traditional image classification tasks, an image sequence of a walking cow contains features in both the time and space dimensions. Lame and non-lame cows differ in posture and walking speed, so it is feasible to characterize lameness as a behavior through video analysis. A video action classification network can model the spatiotemporal information in the input image sequence and output the predicted category. The Convolutional Block Attention Module (CBAM) was used to improve the PP-TSMv2 video action classification network and build the Cow-TSM cow lameness detection model. CBAM weights the channels that carry the different modalities and also attends to the weights between pixels, which improves the model's feature extraction capability. Finally, lameness detection experiments compared different input modalities, different attention mechanisms, different video action classification networks, and existing methods. The dataset comprised 180 video streams of walking cows, each decomposed into 100 to 400 frames, with a 1:1 ratio of lame to normal cow video segments. Because RGB images offer little extractable information for top-view lameness features, this work mainly used depth video streams.
[Results and Discussions] A total of 180 cow image sequences were acquired and processed, 90 of lame cows and 90 of non-lame cows. The proposed method reached a prediction accuracy of 88.7%, with a model size of 22 M and an offline inference time of 0.046 s. On the same dataset, the mainstream video action classification models TSM, PP-TSM, SlowFast and TimesFormer reached prediction accuracies of 66.7%, 84.8%, 87.1% and 85.7%, respectively, and the improved Cow-TSM model had the best overall performance. Using the fused spatiotemporal stream feature images as input improved recognition accuracy by 12% and 4.1% compared with the temporal-only and spatial-only modalities, respectively, which demonstrates the effectiveness of spatiotemporal stream fusion in this method. Ablation experiments on the SE, SK, CA and CBAM attention mechanisms showed that CBAM performed best on this dataset: its channel attention worked well on the fused spatiotemporal stream data, and its spatial attention focused on the key spatial information in the cow images. Finally, the method was compared with existing side-view and top-view lameness detection methods. Its prediction accuracy was slightly lower than that of existing side-view methods, because the side view exposes more effective lameness characteristics. Compared with existing top-view methods, the proposed fused spatiotemporal stream method offered better performance and practicality.
[Conclusions] This method avoids the occlusion problem of detecting lame cows from the side view while improving the prediction accuracy of top-view detection. It is of great significance for reducing the incidence of lameness in cows and improving the economic benefits of farms, and it meets the needs of large-scale farm construction.
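
To make the two attention steps attributed to CBAM above more concrete (channel weighting followed by per-pixel weighting), here is a minimal NumPy sketch of a CBAM-style block applied to a single feature map. It follows the commonly described CBAM structure; the weights `w1`, `w2` and `kernel` are hypothetical placeholders that a real network would learn, and nothing here is taken from the Cow-TSM implementation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(x, w1, w2):
    """x: (C, H, W); w1: (C // r, C); w2: (C, C // r). Returns (C, 1, 1) weights."""
    avg = x.mean(axis=(1, 2))  # global average pooling per channel
    mx = x.max(axis=(1, 2))    # global max pooling per channel

    def shared_mlp(v):
        # The same two-layer MLP (with ReLU) is applied to both descriptors.
        return w2 @ np.maximum(w1 @ v, 0.0)

    return sigmoid(shared_mlp(avg) + shared_mlp(mx))[:, None, None]


def spatial_attention(x, kernel):
    """x: (C, H, W); kernel: (2, k, k) with odd k. Returns (1, H, W) weights."""
    pooled = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    h, w = x.shape[1:]
    out = np.zeros((h, w))
    # "Same" 2-D convolution (cross-correlation, as in deep learning libraries).
    for i in range(k):
        for j in range(k):
            window = padded[:, i:i + h, j:j + w]
            out += (kernel[:, i, j, None, None] * window).sum(axis=0)
    return sigmoid(out)[None]


def cbam(x, w1, w2, kernel):
    """Apply channel attention, then spatial attention, to x of shape (C, H, W)."""
    x = x * channel_attention(x, w1, w2)
    return x * spatial_attention(x, kernel)
```

For a fused input, the channel step can learn to emphasize whichever channels carry the more informative modality, while the spatial step highlights informative pixel regions; the output has the same shape as the input, so the block can be inserted between existing layers.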

Key words: dairy cow lameness detection, spatiotemporal fusion, video action classification, depth image, attention mechanism, TSM
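
The TSM family of video action classification networks named in the keywords is built around a temporal shift operation: part of the channels is moved one frame forward or backward in time so that an ordinary 2-D convolution sees information from neighbouring frames. The sketch below shows that operation in NumPy under common defaults (one eighth of the channels shifted in each direction, zero padding at the sequence ends). The `fold_div` parameter and tensor layout are assumptions for this example, and the PP-TSMv2 and Cow-TSM implementations may differ in detail.

```python
import numpy as np


def temporal_shift(x, fold_div=8):
    """Shift a fraction of channels along the time axis.

    x: (T, C, H, W) features for T frames of one clip.
    The first C // fold_div channels take their values from the next frame,
    the second C // fold_div channels from the previous frame, and the
    remaining channels are left unchanged. Vacated positions are zero.
    """
    fold = x.shape[1] // fold_div
    out = np.zeros_like(x)
    out[:-1, :fold] = x[1:, :fold]                   # frame t receives from t + 1
    out[1:, fold:2 * fold] = x[:-1, fold:2 * fold]   # frame t receives from t - 1
    out[:, 2 * fold:] = x[:, 2 * fold:]              # unshifted channels
    return out
```

Because the shift only moves data, it adds temporal modelling without extra parameters, which is consistent with the small model size and short inference time reported in the abstract.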