Weakly-Supervised Online Action Segmentation in Multi-View Instructional Videos
Reza Ghoddoosian, Isht Dwivedi, Nakul Agarwal, Chiho Choi, Behzad Dariush

TL;DR
This paper introduces a weakly-supervised online action segmentation framework for multi-view instructional videos, utilizing Dynamic Programming, a novel loss function, and multi-view inference to improve segmentation accuracy and temporal consistency.
Contribution
It proposes a new weakly-supervised online segmentation method with multi-view pseudo-labeling and a discrepancy loss, advancing real-time action segmentation in instructional videos.
Findings
Outperforms greedy sliding window methods
Achieves higher temporal consistency in segmentation
Effective in cooking and assembly domains
Abstract
This paper addresses a new problem of weakly-supervised online action segmentation in instructional videos. We present a framework to segment streaming videos online at test time using Dynamic Programming and show its advantages over a greedy sliding window approach. We improve our framework by introducing the Online-Offline Discrepancy Loss (OODL) to encourage the segmentation results to have a higher temporal consistency. Furthermore, only during training, we exploit frame-wise correspondence between multiple views as supervision for training weakly-labeled instructional videos. In particular, we investigate three different multi-view inference techniques to generate more accurate frame-wise pseudo ground-truth with no additional annotation cost. We present results and ablation studies on two benchmark multi-view datasets, Breakfast and IKEA ASM. Experimental results show efficacy of…
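To illustrate why a Dynamic Programming decoder can give more temporally consistent online labels than per-frame greedy decisions, here is a minimal sketch. It is not the paper's implementation. It assumes the ordered list of actions (the transcript) is known at test time, which the abstract does not state, and it uses hand-made frame scores. The names greedy_online, dp_online, log_probs, and transcript are hypothetical.

```python
import math


def greedy_online(log_probs):
    """Label each incoming frame with its highest-scoring class, independently."""
    return [max(range(len(frame)), key=frame.__getitem__) for frame in log_probs]


def dp_online(log_probs, transcript):
    """Online monotonic alignment of a frame stream to an ordered transcript.

    Assumption (not from the paper): the transcript is given. scores[j] holds
    the best log-score of any alignment of the frames seen so far that ends
    with the current frame assigned to transcript position j.
    """
    scores = [-math.inf] * len(transcript)
    labels = []
    for t, frame in enumerate(log_probs):
        new_scores = [-math.inf] * len(transcript)
        for j, action in enumerate(transcript):
            if t == 0:
                # The first frame must start at the first transcript entry.
                best_prev = 0.0 if j == 0 else -math.inf
            else:
                stay = scores[j]                               # remain in action j
                advance = scores[j - 1] if j > 0 else -math.inf  # move on from j-1
                best_prev = max(stay, advance)
            new_scores[j] = best_prev + frame[action]
        scores = new_scores
        # Emit the label of the best partial alignment using only past frames.
        best_j = max(range(len(transcript)), key=scores.__getitem__)
        labels.append(transcript[best_j])
    return labels


# Toy stream: 5 frames, 3 classes; frame 1 is a noisy frame favouring class 2.
probs = [
    [0.8, 0.1, 0.1],
    [0.3, 0.1, 0.6],
    [0.7, 0.2, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
]
log_probs = [[math.log(p) for p in frame] for frame in probs]

print(greedy_online(log_probs))      # [0, 2, 0, 1, 1]
print(dp_online(log_probs, [0, 1]))  # [0, 0, 0, 1, 1]
```

In this toy case the greedy labels flip to class 2 on the noisy frame, while the DP labels stay within the ordered transcript and produce a single boundary.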
Taxonomy
Topics: Human Pose and Action Recognition · Advanced Vision and Imaging · Multimodal Machine Learning Applications
