Distill and Collect for Semi-Supervised Temporal Action Segmentation
Sovan Biswas, Anthony Rhodes, Ramesh Manuvinakurike, Giuseppe Raffa, Richard Beckwith

TL;DR
This paper introduces a semi-supervised method for temporal action segmentation that leverages both annotated and unannotated videos, using multi-stream distillation and action order prediction to improve performance with limited labels.
Contribution
It proposes a novel semi-supervised approach combining multi-stream distillation and action order prediction for temporal action segmentation.
Findings
Achieves comparable performance to fully supervised methods with limited annotations.
Effectively leverages unannotated videos to improve segmentation accuracy.
Demonstrates robustness across multiple datasets.
Abstract
Recent temporal action segmentation approaches need frame annotations during training to be effective. These annotations are very expensive and time-consuming to obtain, which limits performance when only a small amount of annotated data is available. In contrast, we can easily collect a large corpus of in-domain unannotated videos by scavenging the internet. This paper therefore proposes an approach for the temporal action segmentation task that can simultaneously leverage knowledge from annotated and unannotated video sequences. Our approach uses multi-stream distillation that repeatedly refines the streams' frame predictions and finally combines them. Our model also predicts the action order, which is later used as a temporal constraint while estimating frame labels to counter the lack of supervision for unannotated videos. In the end, our evaluation of the proposed approach on two different…
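The abstract does not say how the stream predictions are combined or exactly how the predicted action order constrains frame labels. The sketch below illustrates one plausible reading under two assumptions that are not taken from the paper: streams are fused by averaging their per-frame class probabilities, and the action order is enforced with a dynamic-programming (Viterbi-style) decoder that splits the video into consecutive segments following that order. The names `fuse_streams` and `decode_with_order` are hypothetical.

```python
import math
from typing import List, Sequence


def fuse_streams(stream_probs: Sequence[Sequence[Sequence[float]]]) -> List[List[float]]:
    """Average per-frame class probabilities over streams (assumed fusion rule)."""
    n = len(stream_probs)
    num_frames = len(stream_probs[0])
    num_classes = len(stream_probs[0][0])
    return [
        [sum(s[t][c] for s in stream_probs) / n for c in range(num_classes)]
        for t in range(num_frames)
    ]


def decode_with_order(probs: Sequence[Sequence[float]], order: Sequence[int]) -> List[int]:
    """Assign frame labels whose segments follow `order` exactly (one segment per
    entry, each at least one frame long), maximizing the total log-probability.
    This is an assumed constraint mechanism, not the paper's confirmed method."""
    T, K = len(probs), len(order)
    if K == 0 or K > T:
        raise ValueError("need 1 <= len(order) <= number of frames")
    eps = 1e-12
    # logp[t][k]: log-probability that frame t belongs to the k-th ordered action
    logp = [[math.log(max(probs[t][a], eps)) for a in order] for t in range(T)]
    neg_inf = float("-inf")
    score = [[neg_inf] * K for _ in range(T)]  # best score with frame t in segment k
    back = [[0] * K for _ in range(T)]         # segment index of frame t-1 on that path
    score[0][0] = logp[0][0]                   # the first frame must start segment 0
    for t in range(1, T):
        for k in range(K):
            stay = score[t - 1][k]
            advance = score[t - 1][k - 1] if k > 0 else neg_inf
            if advance > stay:
                score[t][k], back[t][k] = advance + logp[t][k], k - 1
            else:
                score[t][k], back[t][k] = stay + logp[t][k], k
    # Backtrack from the last frame, which must lie in the last segment.
    labels = [0] * T
    k = K - 1
    for t in range(T - 1, -1, -1):
        labels[t] = order[k]
        k = back[t][k]
    return labels


if __name__ == "__main__":
    probs = [[0.9, 0.05, 0.05], [0.6, 0.3, 0.1], [0.1, 0.8, 0.1], [0.2, 0.1, 0.7]]
    print(decode_with_order(probs, [0, 1, 2]))  # [0, 0, 1, 2]
```

Because every frame is forced into one of the ordered segments, the decoded labels can never contradict the predicted order, even where a per-frame argmax would (with order `[2, 0]`, the same probabilities decode to `[2, 0, 0, 0]`).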
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Advanced Vision and Imaging
