Multi-Source Video Domain Adaptation with Temporal Attentive Moment Alignment
Yuecong Xu, Jianfei Yang, Haozhi Cao, Keyu Wu, Min Wu, Rui Zhao, Zhenghua Chen

TL;DR
This paper introduces TAMAN, a novel network for multi-source video domain adaptation that dynamically aligns spatial-temporal features, improving transfer effectiveness and robustness across diverse benchmarks.
Contribution
Proposes TAMAN, a new method for multi-source video domain adaptation (MSVDA) that aligns spatial-temporal features and constructs robust global temporal features, advancing the state of the art in video domain adaptation.
Findings
TAMAN outperforms existing methods on multiple MSVDA benchmarks.
Effective alignment of spatial-temporal features improves transfer performance.
Constructing robust global temporal features enhances domain invariance.
Abstract
Multi-Source Domain Adaptation (MSDA) is a more practical domain adaptation scenario for real-world applications. It relaxes the assumption in conventional Unsupervised Domain Adaptation (UDA) that source data are sampled from a single domain and match a uniform data distribution. MSDA is more difficult because different domain shifts exist between distinct domain pairs. For videos, negative transfer can additionally be provoked by spatial-temporal features, which gives rise to the more challenging Multi-Source Video Domain Adaptation (MSVDA) problem. In this paper, we address the MSVDA problem by proposing a novel Temporal Attentive Moment Alignment Network (TAMAN), which aims for effective feature transfer by dynamically aligning both spatial and temporal feature moments. TAMAN further constructs robust global temporal features by attending to dominant domain-invariant…
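To make the phrase "aligning feature moments" concrete, the sketch below shows a generic moment-matching distance across several source domains and one target domain. It is an illustration only, not the TAMAN implementation: the function names (moment, moment_distance), the choice of first- and second-order moments, the L2 norm, and the averaging over domain pairs are all assumptions made for this example, and the dynamic weighting and temporal attention described in the abstract are not modeled.

```python
import numpy as np


def moment(features, order):
    """Element-wise raw moment E[x**order], taken over the batch axis."""
    return np.mean(features ** order, axis=0)


def moment_distance(sources, target, max_order=2):
    """Hypothetical moment-matching loss for multi-source adaptation.

    sources: list of (batch, dim) feature arrays, one per source domain.
    target:  (batch, dim) feature array from the target domain.
    Sums the L2 gaps between moments for source-target pairs and
    source-source pairs, for orders 1..max_order.
    """
    n = len(sources)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    total = 0.0
    for k in range(1, max_order + 1):
        src_m = [moment(s, k) for s in sources]
        tgt_m = moment(target, k)
        # Gap between each source domain and the target domain.
        total += sum(np.linalg.norm(m - tgt_m) for m in src_m) / n
        # Gap between every pair of source domains.
        if pairs:
            total += sum(np.linalg.norm(src_m[i] - src_m[j]) for i, j in pairs) / len(pairs)
    return float(total)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-ins for pooled video features from two sources and one target.
    src_a = rng.normal(0.0, 1.0, size=(32, 8))
    src_b = rng.normal(0.5, 1.0, size=(32, 8))
    tgt = rng.normal(0.2, 1.0, size=(32, 8))
    print("moment distance:", moment_distance([src_a, src_b], tgt))
```

In a video setting, one plausible use is to evaluate such a distance separately on frame-level (spatial) and clip-level (temporal) features and add both terms to the training loss; that pairing is likewise an assumption of this sketch.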
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition · Advanced Vision and Imaging
