TL;DR
This paper introduces an orientation-aware zero-shot human action recognition method that combines multi-view motion cues and textual descriptions to improve cross-domain generalization and outperform existing approaches.
Contribution
The paper proposes a novel orientation-aware motion encoding network, used together with text prompts, to improve zero-shot recognition across diverse datasets and unseen action categories.
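To make the zero-shot setting concrete, here is a minimal sketch of the generic matching step such methods rely on: a motion embedding is compared against text embeddings of candidate action labels, and the closest label wins. This is not the paper's network. The function names (`cosine`, `zero_shot_classify`) and the hand-written toy vectors are assumptions made purely for illustration; in a real system the embeddings would come from trained motion and text encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(motion_embedding, text_embeddings):
    """Return the label whose text embedding is most similar to the motion embedding.

    text_embeddings maps label -> vector. Labels need not have been seen
    while training the (hypothetical) motion encoder.
    """
    scores = {
        label: cosine(motion_embedding, emb)
        for label, emb in text_embeddings.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy, hand-made embeddings (assumption: a shared 3-D embedding space).
text_embeddings = {
    "wave": [1.0, 0.0, 0.0],
    "jump": [0.0, 1.0, 0.0],
    "sit down": [0.0, 0.0, 1.0],
}
motion_embedding = [0.1, 0.9, 0.2]

label, scores = zero_shot_classify(motion_embedding, text_embeddings)
print(label)  # jump
```

Because classification reduces to a similarity lookup, adding an unseen action only requires a text embedding for its label, with no retraining of the matching step.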
Findings
Outperforms recent state-of-the-art zero-shot approaches on multiple benchmarks.
Improves cross-domain recognition accuracy significantly.
Learned representations show strong transfer learning capabilities.
Abstract
Robustness to domain changes is a key capability for deploying human action recognition systems in real-world scenarios, where inputs at inference can exhibit substantial domain shifts or even contain actions unseen during training. In this context, improving the recognition capabilities of Zero-Shot Action Recognition (ZSAR) models without heavy annotation effort remains a central challenge. Most ZSAR approaches assume that actions are observed under geometric conditions similar to those seen during training. In practice, variations in human body orientation and camera viewpoint introduce a significant domain gap, substantially limiting generalization to novel action-motion combinations. To address this, this paper presents a novel orientation-aware action recognition approach with improved cross-domain capabilities. Our approach combines motion cues of…
