Return of Frustratingly Easy Unsupervised Video Domain Adaptation
Pengfei Wei, Yiqun Sun, Zhiqiang Xu, Yiping Ke, Lawrence B. Hsieh

TL;DR
MetaTrans is a simple yet effective unsupervised video domain adaptation method that separately handles spatial and temporal divergences, leading to significant performance improvements in cross-domain action recognition.
Contribution
The paper introduces a straightforward learning objective and a novel temporal-static subtraction module that address spatial and temporal divergence in unsupervised video domain adaptation (UVDA).
Findings
Substantial performance improvement over state-of-the-art UVDA methods.
Effective removal of spatial and temporal divergence through the proposed temporal-static subtraction module.
Significant gains on cross-domain action recognition tasks.
Abstract
Unsupervised video domain adaptation (UVDA) is a practical but under-explored problem. In this paper, we propose a frustratingly easy UVDA method, called MetaTrans. Specifically, MetaTrans adopts a concise learning objective that contains only two fundamental loss terms. Despite the simplicity of the learning objective, MetaTrans embodies an advanced UVDA idea, that is, handling the spatial and temporal divergence of cross-domain videos separately, through a subtle model architecture design. By implementing a temporal-static subtraction module, MetaTrans effectively removes spatial and temporal divergence. Extensive empirical evaluations, particularly on various cross-domain action recognition tasks, show substantial absolute adaptation performance enhancement and significantly superior relative performance gain compared with state-of-the-art UVDA baselines.
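The abstract describes the method only at a high level. The sketch below shows, under explicit assumptions, what "handling spatial and temporal divergence separately" within a two-term objective could look like. Clip features are split into a static part (the per-clip mean over frames) and a temporal residual (each frame minus that mean). A domain divergence is then measured on each part separately and added to a supervised source loss. The mean-subtraction split, the linear-kernel MMD used as the divergence, the function names (`split_static_temporal`, `linear_mmd`, `cross_entropy`, `two_term_objective`) and the weight `lam` are all hypothetical stand-ins. They are not MetaTrans's confirmed module or losses.

```python
import numpy as np


def split_static_temporal(feats):
    """Hypothetical temporal-static split for a batch of clips.

    feats: array of shape (N, T, D) = (clips, frames, feature dim).
    ASSUMPTION: "static" = per-clip mean over frames; "temporal" = what is
    left after subtracting that mean. Illustration only, not the paper's module.
    """
    static = feats.mean(axis=1)                  # (N, D) appearance summary
    temporal = feats - static[:, None, :]        # (N, T, D) frame-wise residual
    return static, temporal


def linear_mmd(xs, xt):
    """Squared MMD with a linear kernel: squared distance between domain means."""
    diff = xs.mean(axis=0) - xt.mean(axis=0)
    return float(diff @ diff)


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy, computed stably via log-sum-exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def two_term_objective(src_logits, src_labels, src_feats, tgt_feats, lam=1.0):
    """Term 1: supervised loss on labeled source clips.
    Term 2: cross-domain divergence, measured separately on the static and
    temporal parts. ASSUMPTION: MMD is a stand-in divergence and `lam` is a
    hypothetical trade-off weight. Source and target must share T and D.
    """
    s_static, s_temporal = split_static_temporal(src_feats)
    t_static, t_temporal = split_static_temporal(tgt_feats)
    n_s, n_t = len(src_feats), len(tgt_feats)
    divergence = (linear_mmd(s_static, t_static)
                  + linear_mmd(s_temporal.reshape(n_s, -1),
                               t_temporal.reshape(n_t, -1)))
    return cross_entropy(src_logits, src_labels) + lam * divergence


# Usage with random stand-in features: 8 clips, 4 frames, 16-dim features, 5 classes.
rng = np.random.default_rng(0)
src = rng.normal(size=(8, 4, 16))
tgt = rng.normal(loc=0.5, size=(8, 4, 16))   # shifted target domain
logits = rng.normal(size=(8, 5))
labels = rng.integers(0, 5, size=8)
print(two_term_objective(logits, labels, src, tgt))
```

Because both divergence terms are non-negative, this objective reduces to the source cross-entropy when the two domains yield identical features. MetaTrans itself may define or weight its two loss terms differently.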
