Permutation-Aware Action Segmentation via Unsupervised Frame-to-Segment Alignment
Quoc-Huy Tran, Ahmed Mehmood, Muhammad Ahmed, Muhammad Naufil, Anas Zafar, Andrey Konin, M. Zeeshan Zia

TL;DR
This paper introduces an unsupervised transformer-based framework for activity segmentation that combines frame-level and segment-level cues, using temporal optimal transport for training and frame-to-segment alignment to produce permutation-aware segmentation results.
Contribution
It proposes a novel permutation-aware, unsupervised approach that integrates frame-to-segment alignment with transformer models for activity segmentation.
Findings
Achieves comparable or better performance than previous methods on four public datasets.
Utilizes temporal optimal transport for unsupervised training.
Introduces pseudo labels for effective unsupervised learning.
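As a rough illustration of how optimal transport with a temporal prior can turn frame-to-action scores into pseudo labels, here is a minimal Python sketch. It is not the authors' implementation: the function names, the diagonal-shaped prior, and the parameters `rho`, `eps`, and `iters` are assumptions made for this example, and the solver is a generic Sinkhorn iteration with uniform marginals.

```python
import math


def temporal_prior(num_frames, num_actions):
    """Hypothetical prior: highest where the normalized frame position
    is close to the normalized action position (a soft diagonal)."""
    prior = []
    for t in range(num_frames):
        row = []
        for k in range(num_actions):
            d = abs((t + 0.5) / num_frames - (k + 0.5) / num_actions)
            row.append(-d)  # closer to the diagonal gives a larger value
        prior.append(row)
    return prior


def sinkhorn_pseudo_labels(scores, rho=1.0, eps=0.1, iters=50):
    """Soft-assign T frames to K actions with entropy-regularized
    optimal transport, then take the per-frame argmax as a pseudo label.

    scores: T x K list of frame-to-action similarities (assumed input).
    rho:    weight of the temporal prior (assumption).
    eps:    entropy regularization strength (assumption).
    """
    T, K = len(scores), len(scores[0])
    prior = temporal_prior(T, K)
    logits = [[scores[t][k] + rho * prior[t][k] for k in range(K)]
              for t in range(T)]
    # Subtract the global maximum before exponentiating for stability.
    m = max(max(row) for row in logits)
    Q = [[math.exp((logits[t][k] - m) / eps) for k in range(K)]
         for t in range(T)]
    for _ in range(iters):
        # Scale columns so each action receives total mass 1/K.
        for k in range(K):
            col = sum(Q[t][k] for t in range(T))
            for t in range(T):
                Q[t][k] /= col * K
        # Scale rows so each frame carries total mass 1/T.
        for t in range(T):
            row = sum(Q[t])
            for k in range(K):
                Q[t][k] /= row * T
    labels = [max(range(K), key=lambda k: Q[t][k]) for t in range(T)]
    return Q, labels
```

With uninformative scores the prior alone decides the assignment, so frames are split into contiguous blocks in action-index order. Informative scores can override the prior when `rho` is small relative to the score differences.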
Abstract
This paper presents an unsupervised transformer-based framework for temporal activity segmentation which leverages not only frame-level cues but also segment-level cues. This is in contrast with previous methods which often rely on frame-level information only. Our approach begins with a frame-level prediction module which estimates framewise action classes via a transformer encoder. The frame-level prediction module is trained in an unsupervised manner via temporal optimal transport. To exploit segment-level information, we utilize a segment-level prediction module and a frame-to-segment alignment module. The former includes a transformer decoder for estimating video transcripts, while the latter matches frame-level features with segment-level features, yielding permutation-aware segmentation results. Moreover, inspired by temporal optimal transport, we introduce simple-yet-effective…
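To make the idea of matching frame-level features to segment-level features concrete, the following Python sketch aligns frames to an ordered list of segments and reads the frame labels off a transcript. The abstract does not specify the matching procedure, so this example swaps in a plain monotonic dynamic-programming alignment over dot-product similarities. The function name `align_frames_to_segments` and all inputs are hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def align_frames_to_segments(frame_feats, segment_feats, transcript):
    """Assign each frame to one segment so that segment indices never
    decrease over time, every segment is used, and the total similarity
    is maximal. Returns one action label per frame.

    frame_feats:   T feature vectors, one per frame (assumed input).
    segment_feats: N feature vectors, one per transcript entry (assumed).
    transcript:    N action ids in predicted order; any permutation of
                   actions is allowed, which is what makes the output
                   permutation-aware.
    """
    T, N = len(frame_feats), len(segment_feats)
    if T < N:
        raise ValueError("need at least one frame per segment")
    sim = [[dot(f, s) for s in segment_feats] for f in frame_feats]

    neg_inf = float("-inf")
    best = [[neg_inf] * N for _ in range(T)]  # best score ending at (t, n)
    back = [[0] * N for _ in range(T)]        # previous segment index
    best[0][0] = sim[0][0]                    # first frame starts segment 0
    for t in range(1, T):
        for n in range(N):
            stay = best[t - 1][n]
            advance = best[t - 1][n - 1] if n > 0 else neg_inf
            if advance > stay:
                best[t][n] = advance + sim[t][n]
                back[t][n] = n - 1
            else:
                best[t][n] = stay + sim[t][n]
                back[t][n] = n

    # Backtrack from the last frame, which must lie in the last segment.
    path = [0] * T
    n = N - 1
    for t in range(T - 1, -1, -1):
        path[t] = n
        n = back[t][n]
    return [transcript[n] for n in path]
```

Because the labels come from the transcript rather than from a fixed action order, two videos of the same activity can receive different action orderings from the same procedure.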
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Analysis and Summarization
