Adversarially Masked Video Consistency for Unsupervised Domain Adaptation

Xiaoyu Zhu; Junwei Liang; Po-Yao Huang; Alex Hauptmann

arXiv:2403.16242 · cs.CV · March 26, 2024 · 1 citation

TL;DR

This paper introduces a transformer-based approach to unsupervised domain adaptation for egocentric videos. It combines adversarial domain alignment with masked consistency learning to learn class-discriminative, domain-invariant features, and is evaluated on Epic-Kitchen and a new, challenging benchmark, U-Ego4D.

Contribution

The paper proposes an adversarial domain alignment network that trains a mask generator against a domain-invariant encoder, a masked consistency learning module for egocentric video adaptation, and a new benchmark dataset, U-Ego4D.
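
Read schematically, the adversarial alignment is a min-max game over a domain distance. The notation below is an assumption for illustration, not taken from the paper: E is the encoder, G the mask generator, d a distance between source and target feature distributions, x^s and x^t source and target videos, and G(x) ⊙ x the video after masking. Whether the mask is applied to both domains, and how d is defined, is not stated in this summary.

```latex
\min_{E} \; \max_{G} \;\; d\Big( E\big(G(x^{s}) \odot x^{s}\big),\; E\big(G(x^{t}) \odot x^{t}\big) \Big)
```

The encoder lowers the distance while the generator searches for masks that raise it, so the encoder has to stay domain-invariant even under the hardest masks the generator can find.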

Findings

01 Achieves state-of-the-art results on Epic-Kitchen.

02 Introduces U-Ego4D, a new, challenging egocentric video benchmark.

03 Demonstrates the effectiveness of combining adversarial alignment with consistency learning.

Abstract

We study the problem of unsupervised domain adaptation for egocentric videos. We propose a transformer-based model to learn class-discriminative and domain-invariant feature representations. It consists of two novel designs. The first is the Generative Adversarial Domain Alignment Network, which aims to learn domain-invariant representations. It simultaneously learns a mask generator and a domain-invariant encoder in an adversarial way. The domain-invariant encoder is trained to minimize the distance between the source and target domains. The mask generator, conversely, aims to produce challenging masks by maximizing the domain distance. The second is a Masked Consistency Learning module, which learns class-discriminative representations. It enforces prediction consistency between the masked target videos and their full forms. To better evaluate the effectiveness of…
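
The consistency idea in the abstract (predictions on a masked target video should match predictions on the full video) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: the mean-pooling linear classifier, the zero-out masking, the KL divergence as the consistency measure, and every function name are hypothetical stand-ins chosen so the example runs with the standard library only.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def apply_mask(tokens, mask):
    """Zero out token i when mask[i] == 0 (assumed masking scheme)."""
    return [[v * m for v in tok] for tok, m in zip(tokens, mask)]

def toy_classifier(tokens, weights):
    """Mean-pool the tokens, then apply a linear layer.
    A stand-in for the transformer encoder plus classification head."""
    dim = len(tokens[0])
    pooled = [sum(tok[d] for tok in tokens) / len(tokens) for d in range(dim)]
    return [sum(w * x for w, x in zip(row, pooled)) for row in weights]

def masked_consistency_loss(tokens, mask, weights):
    """Divergence between predictions on the full and the masked video."""
    p_full = softmax(toy_classifier(tokens, weights))
    p_masked = softmax(toy_classifier(apply_mask(tokens, mask), weights))
    return kl_divergence(p_full, p_masked)

# Three video tokens with two features each; two action classes.
tokens = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
weights = [[1.0, -1.0], [-0.5, 0.5]]

print(masked_consistency_loss(tokens, [1, 1, 1], weights))  # nothing masked
print(masked_consistency_loss(tokens, [1, 0, 1], weights))  # middle token masked
```

With nothing masked the two predictions coincide and the loss vanishes; masking a token shifts the prediction and the loss becomes positive. Training against such a loss on unlabeled target videos pushes the model toward predictions that are stable under masking, which is the role the abstract assigns to the Masked Consistency Learning module.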

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition · Multimodal Machine Learning Applications