Action Segmentation with Mixed Temporal Domain Adaptation

Min-Hung Chen; Baopu Li; Yingze Bao; Ghassan AlRegib

arXiv:2104.07461·cs.CV·April 19, 2021

Open Access

TL;DR

This paper introduces Mixed Temporal Domain Adaptation (MTDA), a method for action segmentation that aligns features across domains at both the frame and video levels, improving performance on the GTEA, 50Salads, and Breakfast datasets.

Contribution

The paper proposes MTDA, the first approach to jointly adapt frame- and video-level features in action segmentation. It adds a domain attention mechanism that focuses alignment on the frame-level features with higher domain discrepancy.

Findings

01. MTDA outperforms state-of-the-art methods on the GTEA, 50Salads, and Breakfast datasets.

02. On GTEA, MTDA achieves a 6.4% gain on F1@50 and a 6.8% gain on edit score.

03. The improvements from domain adaptation hold by large margins across multiple datasets.

Abstract

The main progress for action segmentation comes from densely-annotated data for fully-supervised learning. Since manual annotation for frame-level actions is time-consuming and challenging, we propose to exploit auxiliary unlabeled videos, which are much easier to obtain, by shaping this problem as a domain adaptation (DA) problem. Although various DA techniques have been proposed in recent years, most of them have been developed only for the spatial direction. Therefore, we propose Mixed Temporal Domain Adaptation (MTDA) to jointly align frame- and video-level embedded feature spaces across domains, and further integrate with the domain attention mechanism to focus on aligning the frame-level features with higher domain discrepancy, leading to more effective domain adaptation. Finally, we evaluate our proposed methods on three challenging datasets (GTEA, 50Salads, and Breakfast), and…
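The domain attention idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how domain discrepancy is measured, so this example assumes a common choice: the entropy of a per-frame domain classifier's prediction, with weight = 1 - entropy. The function names, the residual `(1 + weight)` scaling, and the toy inputs are all hypothetical. The sketch covers only the weighting and pooling step, not the alignment losses.

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) domain prediction; lies in [0, 1]."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def domain_attention_weights(p_source):
    """Assumed weighting: 1 - entropy.

    A frame whose domain is easy to identify (low entropy) is treated as
    having high domain discrepancy and receives a weight near 1.
    """
    return [1.0 - binary_entropy(p) for p in p_source]

def attended_video_feature(frame_features, p_source):
    """Average the frame features after scaling each by (1 + weight).

    The added 1 is a residual term (an assumption) so that no frame is
    discarded entirely when its weight is 0.
    """
    weights = domain_attention_weights(p_source)
    dim = len(frame_features[0])
    pooled = [0.0] * dim
    for feat, w in zip(frame_features, weights):
        for i, x in enumerate(feat):
            pooled[i] += (1.0 + w) * x
    return [v / len(frame_features) for v in pooled]

# Toy input: two frames with 2-D features. p_source is the hypothetical
# probability that a domain classifier assigns each frame to the source domain.
frames = [[1.0, 0.0], [0.0, 1.0]]
p_source = [0.5, 1.0]

weights = domain_attention_weights(p_source)
video_feature = attended_video_feature(frames, p_source)
```

In this toy case the first frame is domain-ambiguous (probability 0.5), so it gets weight 0, while the second frame is confidently classified and gets weight 1. The second frame therefore contributes twice as much to the pooled video-level feature.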

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning