Abstract
Fine-grained spatio-temporal action detection in continuous, unconstrained field videos remains a formidable challenge due to severe background clutter, high inter-class similarity, and the scarcity of domain-specific benchmarks. To address these limitations, we first introduce a large-scale Wintering-Crane Benchmark, providing dense, individual-level bounding box annotations for six complex behaviors across diverse habitat scenes. Leveraging this data, we propose AviaTAD-LGH, a real-time multi-task framework that incorporates auxiliary motion supervision into a dual-pathway 3D backbone to enhance feature discriminability. A critical bottleneck in such multi-task settings is the negative transfer caused by conflicting optimization objectives. To resolve this, we present Lightweight Gradient Harmonization (LGH), a plug-and-play optimization strategy that dynamically modulates task weights based on the cosine similarity of gradient directions. This mechanism effectively aligns optimization trajectories without introducing inference latency. Extensive experiments demonstrate that AviaTAD-LGH achieves a state-of-the-art mAP of 68.60%, surpassing strong public baselines by 7.44% and improving upon the single-task baseline by 2.80%, with significant gains observed on ambiguous dynamic classes. The proposed pipeline enables efficient, scalable ecological monitoring suitable for edge deployment.
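The idea of modulating task weights by the cosine similarity of gradient directions can be illustrated with a minimal sketch. The abstract does not give the LGH update rule, so everything below is an assumption made for illustration: the function names, the `base_weight` parameter, and in particular the rule of scaling the auxiliary weight by `max(0, cos)` are hypothetical stand-ins, not the authors' formulation.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float], eps: float = 1e-12) -> float:
    """Cosine of the angle between two flattened gradient vectors."""
    if len(a) != len(b):
        raise ValueError("gradient vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a < eps or norm_b < eps:
        return 0.0  # a zero gradient carries no directional information
    return dot / (norm_a * norm_b)


def harmonized_aux_weight(g_main: Sequence[float], g_aux: Sequence[float],
                          base_weight: float = 1.0) -> float:
    """Hypothetical rule (an assumption, not the paper's): keep the auxiliary
    task's weight in proportion to how well its gradient agrees with the main
    task, and drop it to zero when the two gradients conflict."""
    return base_weight * max(0.0, cosine_similarity(g_main, g_aux))


def combined_gradient(g_main: Sequence[float], g_aux: Sequence[float],
                      base_weight: float = 1.0) -> List[float]:
    """Main-task gradient plus the re-weighted auxiliary gradient."""
    w = harmonized_aux_weight(g_main, g_aux, base_weight)
    return [gm + w * ga for gm, ga in zip(g_main, g_aux)]


if __name__ == "__main__":
    detection_grad = [1.0, 0.0]   # toy main-task (action detection) gradient
    motion_grad = [-1.0, 0.0]     # toy auxiliary (motion) gradient, opposed
    print(harmonized_aux_weight(detection_grad, motion_grad))
    print(combined_gradient(detection_grad, motion_grad))
```

Because a rule of this kind only rescales loss terms during training, it adds no computation at inference time, which is consistent with the abstract's statement that LGH introduces no inference latency.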
| Original language | English |
|---|---|
| Journal | Sensors |
| Publication status | Published - 27 Mar 2026 |
| Title | AviaTAD-LGH: A Multi-Task Spatio-Temporal Action Detector with Lightweight Gradient Harmonization for Real-Time Avian Behavior Monitoring |