Smart Agriculture

Temporal Action Localization of Mounting Behavior in Dairy Goats Based on an Improved AdaTAD

WANG Jiayuan1, LI Qitong1, LUO Yuantao1, YANG Shuqin2, WANG Zhenhua1, NING Jifeng1, WANG Meili1

  1. College of Information Engineering, Northwest A & F University, Yangling 712100, China
  2. College of Mechanical and Electronic Engineering, Northwest A & F University, Yangling 712100, China
  • Received: 2026-01-09 Online: 2026-04-22
  • Foundation items:
    National Key Research and Development Program of China (2022YFD1300200); Shaanxi Qinchuangyuan High-level Innovation and Entrepreneurship Talent Program (QCYRCXM-2022-359)
  • About author:
    WANG Jiayuan, master's student, research interest: computer-vision-based temporal action localization of mounting behavior in dairy goats. E-mail:
  • Corresponding author:
    NING Jifeng, PhD, professor, research interests: computer vision and machine learning. E-mail:

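Abstract (translated from the Chinese):

[Objective/Significance] Temporal localization of mounting behavior in dairy goats is an important basis for reproductive management. Existing methods mostly stop at behavior classification, delineate the start and end boundaries of short, sudden behaviors in untrimmed videos poorly, and are easily affected by occlusion, viewpoint changes, and background interference. To address these problems, an end-to-end temporal action localization method improved from Adapter Tuning for Temporal Action Detection (AdaTAD) is proposed to achieve accurate recognition of mounting behavior and precise localization of its start and end times.
[Methods] With the AdaTAD framework as the baseline, visual prompt tuning is introduced: a small number of learnable prompt tokens provide task guidance to the attention distribution of the backbone, strengthening feature responses at key frames and in boundary neighborhoods. A multi-scale motion-aware adapter is designed, which uses parallel multi-scale temporal depthwise separable convolution branches to model motion patterns at different temporal scales and combines residual connections with nonlinear mappings to inject them stably into the backbone features, improving the joint modeling of short micro-actions and relatively complete action processes.
[Results and Discussion] The proposed method achieves a mean average precision of 81.72%, 5.00 percentage points higher than the baseline AdaTAD. Under the stricter condition of a temporal intersection over union of 0.7 it reaches 68.85%, 4.06 percentage points higher than AdaTAD, indicating that the method retains its advantage when high boundary accuracy is required. The model runs inference at 65.78 frames per second with 27.941 M trainable parameters, keeping overhead low while improving accuracy.
[Conclusions] The method improves the accuracy and stability of temporal localization of dairy goat mounting behavior in complex farming scenes and provides key temporal information to support reproductive behavior monitoring and management decisions.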

Abstract:

[Objective] Accurate temporal localization of mounting behaviour in dairy goats is important for intelligent reproductive management, as event frequency, onset time, and duration provide useful evidence for heat monitoring and mating decisions. Unlike simple behaviour recognition, temporal localization in untrimmed videos enables fine-grained, time-resolved records for practical farm use. However, real-world mounting behaviour is usually brief and sporadic, with few informative frames in long video streams. Moreover, weak discrimination from similar non-target interactions, together with occlusion, viewpoint variation, and background motion, often degrades boundary-aware representation learning and leads to unstable start–end localization. To address these challenges, an improved AdaTAD-based end-to-end temporal action localization approach is proposed for mounting behaviour in dairy goats, aiming to enhance localization accuracy and stability while maintaining practical efficiency for deployment.
[Methods] The proposed approach adopted AdaTAD as the baseline end-to-end temporal action localization framework and introduced two complementary improvements: explicit key-frame guidance and multi-scale motion modelling. The original detection head and post-processing pipeline were retained for generating temporal action instances. First, visual prompt tuning (VPT) was incorporated to provide task-conditioned guidance to backbone feature extraction in a parameter-efficient manner. Specifically, a small number of learnable prompt tokens were inserted into the Transformer backbone with the backbone parameters frozen. Through multi-head attention interactions between prompt tokens and patch tokens, the prompts steered attention towards mounting-relevant temporal regions, strengthened feature responses at critical frames and in boundary neighbourhoods, and improved the separability between brief target segments and abundant background frames.
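The prompt-token mechanism described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single attention head with identity query, key, and value projections, and the names `self_attention` and `vpt_forward`, as well as the choice to drop the prompt positions at the output, are hypothetical. The point is only that prompt tokens join the same attention computation as the patch tokens and thereby shift the features the patches receive, while the patch-side (backbone) inputs stay unchanged.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections (a simplification)."""
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    attended = []
    for query in tokens:
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in tokens]
        weights = softmax(scores)
        attended.append([
            sum(w * value[d] for w, value in zip(weights, tokens))
            for d in range(dim)
        ])
    return attended


def vpt_forward(patch_tokens, prompt_tokens):
    """Prepend prompt tokens, attend jointly, and return the updated patch tokens.

    In a real model only `prompt_tokens` would receive gradient updates;
    the backbone weights would stay frozen. Dropping the prompt positions
    at the output is an assumption of this sketch.
    """
    sequence = prompt_tokens + patch_tokens
    attended = self_attention(sequence)
    return attended[len(prompt_tokens):]


# Hypothetical toy inputs: two 2-dimensional patch tokens and one prompt token.
patches = [[1.0, 0.0], [0.0, 1.0]]
prompts = [[5.0, 5.0]]
without_prompts = vpt_forward(patches, [])
with_prompts = vpt_forward(patches, prompts)
```

Comparing `without_prompts` and `with_prompts` shows that the same patch tokens yield different features once a prompt is present, which is the lever that prompt tuning trains.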
Second, a multi-scale motion adapter (MSMA) was introduced to model motion patterns at different temporal scales and improve robustness to diverse scene dynamics. MSMA employed parallel multi-scale temporal depthwise separable convolution branches to capture short-, mid-, and longer-range temporal variations, enhancing representations of subtle short-duration micro-actions as well as relatively complete action processes. Residual connections and nonlinear mappings further stabilised feature injection and gradient propagation, enabling multi-scale dynamics to be integrated into backbone features with limited additional optimisation burden. Overall, VPT focused on boundary-relevant attention guidance, whereas MSMA emphasised multi-scale temporal dynamics modelling; together, they formed a complementary design within the end-to-end localization pipeline.
[Results and Discussions] Comparative experiments showed that the proposed method achieved an average mAP (mean average precision averaged over temporal Intersection over Union (tIoU) thresholds from 0.3 to 0.7 in steps of 0.1) of 81.72%, improving upon the baseline AdaTAD by 5.00 percentage points, indicating that incorporating VPT and MSMA enhanced overall localization performance. At a tIoU threshold of 0.7, the proposed method attained 68.85%, exceeding AdaTAD by 4.06 percentage points, demonstrating that the performance gain was preserved under stricter temporal boundary-consistency criteria. Further comparisons with representative approaches, including TadTR, VSGN, AFSD, ActionFormer, TriDet, DyFADet, and Re2TAL, showed average mAP improvements of 38.82, 33.83, 25.29, 4.09, 2.83, 1.20, and 6.06 percentage points, respectively, demonstrating stronger overall competitiveness. In terms of efficiency, the model ran at 65.78 frames per second (f/s) with 27.941 million trainable parameters, indicating that the accuracy gains were achieved while maintaining a relatively low parameter overhead and practical runtime efficiency.
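The tIoU criterion behind the reported scores can be made concrete with a short sketch. The segment values below are hypothetical and the helper names `temporal_iou` and `thresholds_passed` are illustrative, not taken from the paper. A predicted segment counts as a match at a given threshold only if its temporal overlap with the annotated segment, divided by their union, reaches that threshold.

```python
# tIoU thresholds 0.3 to 0.7 in steps of 0.1, as used for the average mAP.
TIOU_THRESHOLDS = [0.3, 0.4, 0.5, 0.6, 0.7]


def temporal_iou(pred, truth):
    """tIoU between two (start, end) segments given in seconds."""
    inter = max(0.0, min(pred[1], truth[1]) - max(pred[0], truth[0]))
    union = (pred[1] - pred[0]) + (truth[1] - truth[0]) - inter
    return inter / union if union > 0 else 0.0


def thresholds_passed(pred, truth):
    """Return the tIoU thresholds at which `pred` would count as a match."""
    score = temporal_iou(pred, truth)
    return [t for t in TIOU_THRESHOLDS if score >= t]


# Hypothetical example: a 3-second mounting event and two candidate detections.
annotated = (12.0, 15.0)
loose_detection = (13.0, 16.0)   # shifted by one second
tight_detection = (12.5, 15.5)   # shifted by half a second

loose_score = temporal_iou(loose_detection, annotated)
tight_score = temporal_iou(tight_detection, annotated)
```

For a 3-second event, a one-second shift already drops the detection below the 0.6 and 0.7 thresholds, which suggests why gains at tIoU 0.7 are a demanding test of boundary precision for brief behaviours.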
Overall, task-guided prompting and multi-scale temporal modelling improved key temporal feature representations with limited parameter increments, thereby benefiting localization of short, sporadic behaviours.
[Conclusions] This study presents an improved AdaTAD-based end-to-end temporal action localization method for mounting behaviour in dairy goats. By combining VPT for boundary-relevant attention guidance with an MSMA for multi-scale temporal dynamics modelling, the proposed approach improves localization accuracy and maintains stable advantages under stricter boundary-consistency requirements, while preserving practical inference efficiency. The method provides critical temporal information for reproductive behaviour monitoring and decision support, and offers a feasible basis for building individual-level, time-resolved management systems in real farming environments.
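The multi-scale motion adapter summarized above can be sketched in a few lines. This is a sketch under stated assumptions, not the published architecture: the kernel sizes 3, 5, and 7, the averaging of branches, the ReLU nonlinearity, and the function names are all hypothetical choices. It shows the three ingredients named in the abstract: parallel depthwise temporal convolutions at several scales, a pointwise channel mix that completes each depthwise separable branch, and a residual connection that adds the result back to the backbone features.

```python
def depthwise_temporal_conv(x, kernels):
    """Per-channel 1D convolution along time with zero padding (same length).

    x: list of T frames, each a list of C channel values.
    kernels: C kernels, each a list of odd length k.
    """
    num_frames, num_channels = len(x), len(x[0])
    k = len(kernels[0])
    half = k // 2
    out = [[0.0] * num_channels for _ in range(num_frames)]
    for t in range(num_frames):
        for c in range(num_channels):
            acc = 0.0
            for j in range(k):
                s = t + j - half
                if 0 <= s < num_frames:
                    acc += kernels[c][j] * x[s][c]
            out[t][c] = acc
    return out


def pointwise(x, weight):
    """1x1 convolution: mixes channels at each time step (weight is C_out x C_in)."""
    return [[sum(w * v for w, v in zip(row, frame)) for row in weight] for frame in x]


def msma(x, branches, scale=1.0):
    """Average the branch outputs, apply ReLU, and add them to the input features.

    branches: list of (depthwise_kernels, pointwise_weight) pairs, one per scale.
    """
    num_frames, num_channels = len(x), len(x[0])
    fused = [[0.0] * num_channels for _ in range(num_frames)]
    for depthwise_kernels, pointwise_weight in branches:
        y = pointwise(depthwise_temporal_conv(x, depthwise_kernels), pointwise_weight)
        for t in range(num_frames):
            for c in range(num_channels):
                fused[t][c] += y[t][c]
    return [
        [x[t][c] + scale * max(0.0, fused[t][c] / len(branches))
         for c in range(num_channels)]
        for t in range(num_frames)
    ]


# Hypothetical usage: 6 frames, 2 channels, moving-average kernels at 3 scales.
features = [[float(t), float(t % 2)] for t in range(6)]
identity = [[1.0, 0.0], [0.0, 1.0]]
branches = [([[1.0 / k] * k for _ in range(2)], identity) for k in (3, 5, 7)]
adapted = msma(features, branches)
```

Because the adapter output is added to the input rather than replacing it, a branch set with all-zero kernels leaves the backbone features untouched, which is one way a residual design can keep early training stable.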

Key words: mounting behavior, temporal action localization, dairy goats, precision livestock farming, multi-scale motion modeling, end-to-end model

CLC number: