Iterative Contrast-Classify For Semi-supervised Temporal Action Segmentation
Dipika Singhania, Rahul Rahaman, Angela Yao

TL;DR
This paper introduces a semi-supervised approach for temporal action segmentation in videos, leveraging unsupervised representation learning and an iterative contrast-classify scheme to reduce labeling costs while maintaining high accuracy.
Contribution
It presents the first semi-supervised method for temporal action segmentation that combines unsupervised feature learning with iterative contrastive and classification training.
Findings
ICC with 40% labeled data matches fully-supervised performance
ICC improves frame-wise accuracy (MoF, mean over frames) by up to 5.6% on benchmark datasets
Unsupervised representation learning effectively captures temporal action features
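MoF (mean over frames), the metric named in the findings, is the fraction of frames whose predicted action label equals the ground-truth label. The sketch below is illustrative only: the function name and the toy labels are assumptions, and it is not the authors' evaluation code.

```python
def mean_over_frames(predicted, ground_truth):
    """Frame-wise accuracy: share of frames with the correct action label."""
    if len(predicted) != len(ground_truth):
        raise ValueError("prediction and ground truth must have the same length")
    if not ground_truth:
        raise ValueError("cannot score an empty sequence")
    # Count frames where the predicted label matches the annotation.
    correct = sum(p == g for p, g in zip(predicted, ground_truth))
    return correct / len(ground_truth)


# Toy example with made-up labels for a four-frame video.
score = mean_over_frames(["pour", "pour", "stir", "stir"],
                         ["pour", "stir", "stir", "stir"])
```

Because every frame counts equally, long actions dominate this score, which is why segmentation work usually reports it alongside segment-level metrics.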
Abstract
Temporal action segmentation classifies the action of each frame in (long) video sequences. Due to the high cost of frame-wise labeling, we propose the first semi-supervised method for temporal action segmentation. Our method hinges on unsupervised representation learning, which, for temporal action segmentation, poses unique challenges. Actions in untrimmed videos vary in length and have unknown labels and start/end times. Ordering of actions across videos may also vary. We propose a novel way to learn frame-wise representations from temporal convolutional networks (TCNs) by clustering input features with added time-proximity condition and multi-resolution similarity. By merging representation learning with conventional supervised learning, we develop an "Iterative-Contrast-Classify (ICC)" semi-supervised learning scheme. With more labelled data, ICC progressively improves in…
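The representation-learning idea in the abstract (cluster the input features, then require time proximity) can be sketched roughly as below. This is a simplified, single-video illustration under stated assumptions, not the authors' implementation: cluster ids are assumed to come from some clustering of the input features, the names `positive_mask`, `contrastive_loss`, `max_gap` and `temperature` are hypothetical, the loss is a generic softmax-based contrastive loss, and the multi-resolution similarity is left out.

```python
import numpy as np


def positive_mask(cluster_ids, times, max_gap):
    """Mark frame pairs as positives if they share a cluster and are close in time."""
    same_cluster = cluster_ids[:, None] == cluster_ids[None, :]
    close_in_time = np.abs(times[:, None] - times[None, :]) <= max_gap
    mask = same_cluster & close_in_time
    np.fill_diagonal(mask, False)  # a frame is not its own positive
    return mask


def contrastive_loss(embeddings, positives, temperature=0.1):
    """Generic softmax contrastive loss averaged over anchors that have positives."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # exclude self-similarity from the softmax
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    n_pos = positives.sum(axis=1)
    valid = n_pos > 0
    # Mean log-probability of the positives for each valid anchor.
    per_anchor = np.where(positives, log_prob, 0.0).sum(axis=1)[valid] / n_pos[valid]
    return -per_anchor.mean()


# Toy usage: six frames, two clusters, frame index used as the time stamp.
cluster_ids = np.array([0, 0, 0, 1, 1, 1])
times = np.arange(6, dtype=float)
embeddings = np.random.default_rng(0).normal(size=(6, 8))
pos = positive_mask(cluster_ids, times, max_gap=1.0)
loss = contrastive_loss(embeddings, pos)
```

The time-proximity term is what separates this from plain cluster-based contrastive learning: two frames with similar features but far apart in time are not pulled together.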
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Video Analysis and Summarization
