Transformers in Action: Weakly Supervised Action Segmentation
John Ridley, Huseyin Coskun, David Joseph Tan, Nassir Navab, Federico Tombari

TL;DR
This paper applies transformers to weakly supervised video action segmentation, demonstrating improved action alignment accuracy over RNN-based models through a novel architecture, and faster inference-time transcript selection through a transcript embedding technique.
Contribution
It introduces a transformer-based architecture tailored to weakly supervised action segmentation, along with a transcript embedding method for faster inference and better segmentation performance.
Findings
Transformers outperform RNNs in action alignment accuracy.
The proposed transcript embedding accelerates inference.
The approach improves segmentation results on benchmark datasets.
Abstract
The video action segmentation task is regularly explored under weaker forms of supervision, such as transcript supervision, where a list of actions is easier to obtain than dense frame-wise labels. In this formulation, the task presents various challenges for sequence modeling approaches due to the emphasis on action transition points, long sequence lengths, and frame contextualization, making the task well-posed for transformers. Given developments enabling transformers to scale linearly, we demonstrate through our architecture how they can be applied to improve action alignment accuracy over the equivalent RNN-based models with the attention mechanism focusing around salient action transition regions. Additionally, given the recent focus on inference-time transcript selection, we propose a supplemental transcript embedding approach to select transcripts more quickly at inference-time.…
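To make the supervision setting concrete, the following is a minimal sketch of how a transcript relates to dense frame-wise labels. It is an illustration only, not the paper's method: the function names `frames_to_transcript` and `expand_transcript` and the action names are hypothetical. The transcript keeps only the order of actions, so the alignment task amounts to recovering one segment length per transcript entry.

```python
from itertools import groupby


def frames_to_transcript(frame_labels):
    """Collapse dense frame-wise labels into an ordered transcript.

    Consecutive repeats are merged, so only the order of actions survives;
    the timing information is discarded.
    """
    return [label for label, _ in groupby(frame_labels)]


def expand_transcript(transcript, lengths):
    """Rebuild frame-wise labels from a transcript and one length per action.

    Under transcript supervision the lengths are unknown; estimating them
    is what action alignment has to do.
    """
    if len(transcript) != len(lengths):
        raise ValueError("need exactly one segment length per transcript entry")
    frames = []
    for action, n_frames in zip(transcript, lengths):
        frames.extend([action] * n_frames)
    return frames


# Hypothetical six-frame video with dense labels (illustrative action names).
frame_labels = ["pour_milk"] * 3 + ["stir"] * 2 + ["pour_milk"] * 1

transcript = frames_to_transcript(frame_labels)
print(transcript)  # ['pour_milk', 'stir', 'pour_milk']

# With the true segment lengths, the dense labels are recovered exactly.
assert expand_transcript(transcript, [3, 2, 1]) == frame_labels
```

Note that a repeated action such as `pour_milk` appears twice in the transcript, because the transcript is an ordered list rather than a set of actions.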
Taxonomy
Topics: Human Pose and Action Recognition · Stroke Rehabilitation and Recovery · Anomaly Detection Techniques and Applications
