Distill and Collect for Semi-Supervised Temporal Action Segmentation

Sovan Biswas; Anthony Rhodes; Ramesh Manuvinakurike; Giuseppe Raffa; Richard Beckwith

arXiv:2211.01311 · cs.CV · November 4, 2022

TL;DR

This paper introduces a semi-supervised method for temporal action segmentation that leverages both annotated and unannotated videos, using multi-stream distillation and action order prediction to improve performance with limited labels.

Contribution

It proposes a novel semi-supervised approach combining multi-stream distillation and action order prediction for temporal action segmentation.

Findings

01 Achieves performance comparable to fully supervised methods while using only limited annotations.

02 Effectively leverages unannotated videos to improve segmentation accuracy.

03 Demonstrates robustness across multiple datasets.

Abstract

Recent temporal action segmentation approaches need frame annotations during training to be effective. These annotations are very expensive and time-consuming to obtain, which limits performance when only a small amount of annotated data is available. In contrast, a large corpus of in-domain unannotated videos can easily be collected from the internet. Thus, this paper proposes an approach for the temporal action segmentation task that can simultaneously leverage knowledge from annotated and unannotated video sequences. Our approach uses multi-stream distillation that repeatedly refines and finally combines the streams' frame predictions. Our model also predicts the action order, which is later used as a temporal constraint while estimating frame labels to counter the lack of supervision for unannotated videos. In the end, our evaluation of the proposed approach on two different…
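The two ideas named in the abstract, combining frame predictions from several streams and using a predicted action order as a temporal constraint, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `combine_streams` and `decode_with_order` are hypothetical, plain averaging stands in for the paper's distillation and combination step, and a simple dynamic program stands in for however the paper applies the order constraint.

```python
import numpy as np


def combine_streams(stream_probs):
    """Average per-frame class probabilities from several streams.

    Assumption: each stream yields a (T, C) array; plain averaging is a
    stand-in for the combination step described in the abstract.
    """
    return np.mean(np.stack(stream_probs, axis=0), axis=0)


def decode_with_order(probs, order):
    """Label every frame so that actions appear exactly in `order`.

    Hypothetical helper: finds the monotonic frame-to-action alignment that
    maximizes the summed log-probability, using dynamic programming.
    """
    T = probs.shape[0]
    N = len(order)
    if N > T:
        raise ValueError("more actions in the order than frames in the video")

    logp = np.log(probs[:, order] + 1e-8)   # (T, N): frame t scored at order step n
    score = np.full((T, N), -np.inf)        # best score ending at (t, n)
    back = np.zeros((T, N), dtype=int)      # 1 if (t, n) was reached from step n - 1
    score[0, 0] = logp[0, 0]                # the first frame must start the first action

    for t in range(1, T):
        for n in range(N):
            stay = score[t - 1, n]
            move = score[t - 1, n - 1] if n > 0 else -np.inf
            if move > stay:
                score[t, n] = move + logp[t, n]
                back[t, n] = 1
            else:
                score[t, n] = stay + logp[t, n]

    # Backtrack from the last frame, which must belong to the last action.
    labels = np.empty(T, dtype=int)
    n = N - 1
    for t in range(T - 1, -1, -1):
        labels[t] = order[n]
        n -= int(back[t, n])
    return labels
```

With such a constraint, a frame whose per-frame argmax contradicts the predicted order is reassigned to the most likely action that keeps the sequence consistent, which is one way an order prediction can compensate for missing frame labels on unannotated videos.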

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Advanced Vision and Imaging