Human-Centric Transformer for Domain Adaptive Action Recognition
Kun-Yu Lin, Jiaming Zhou, Wei-Shi Zheng

TL;DR
This paper introduces HCTransformer, a human-centric transformer model that improves domain adaptive action recognition by explicitly focusing on human cues and human-context interactions, achieving state-of-the-art results on three benchmark datasets.
Contribution
The paper proposes a novel human-centric learning paradigm with a decoupled transformer architecture to better exploit human cues for domain adaptation in action recognition.
Findings
Achieves state-of-the-art performance on three benchmark datasets.
Effectively preserves human cues during domain-invariant feature learning.
Demonstrates the importance of human-centric cues in cross-domain action recognition.
Abstract
We study domain adaptation for action recognition, namely domain adaptive action recognition, which aims to transfer action recognition capability from a label-sufficient source domain to a label-free target domain. Since actions are performed by humans, it is crucial to exploit human cues in videos when recognizing actions across domains. However, existing methods tend to lose human cues and instead exploit correlations between non-human contexts and the associated actions; contexts that are agnostic to the action reduce recognition performance in the target domain. To overcome this problem, we focus on uncovering human-centric action cues for domain adaptive action recognition, investigating two aspects of these cues: human cues and human-context interaction cues. Accordingly, our…
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Gait Recognition and Analysis
Methods: Attention Is All You Need · Byte Pair Encoding · Layer Normalization · Focus · Linear Layer · Label Smoothing · Adam · Dropout · Multi-Head Attention · Dense Connections
