Activity Grammars for Temporal Action Segmentation
Dayoung Gong, Joonseok Lee, Deunsol Jung, Suha Kwak, Minsu Cho

TL;DR
This paper introduces an activity grammar and a grammar induction algorithm to improve temporal action segmentation, enhancing both accuracy and interpretability of sequence predictions in untrimmed videos.
Contribution
It presents a novel grammar induction method and a generalized parser that integrate with neural networks to better capture compositional structures in action sequences.
Findings
Significant performance improvements on Breakfast and 50 Salads benchmarks.
Enhanced interpretability of action segmentation results.
Effective integration with existing neural network models.
Abstract
Sequence prediction on temporal data requires the ability to understand compositional structures of multi-level semantics beyond individual and contextual properties. The task of temporal action segmentation, which aims at translating an untrimmed activity video into a sequence of action segments, remains challenging for this reason. This paper addresses the problem by introducing an effective activity grammar to guide neural predictions for temporal action segmentation. We propose a novel grammar induction algorithm that extracts a powerful context-free grammar from action sequence data. We also develop an efficient generalized parser that transforms frame-level probability distributions into a reliable sequence of actions according to the induced grammar with recursive rules. Our approach can be combined with any neural network for temporal action segmentation to enhance the sequence…
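To make the idea of grammar-guided decoding concrete, here is a minimal sketch of how frame-level probabilities might be turned into an action sequence that a grammar permits. It is not the paper's induction algorithm or its generalized parser: the toy grammar, the action names, and the helpers `expand`, `align`, and `decode` are all hypothetical. The sketch enumerates every sequence the toy grammar derives up to a length bound, aligns each one to the frames by dynamic programming, and keeps the best-scoring one.

```python
import math

# Hypothetical toy grammar (not induced from data). "R" is recursive,
# so "add, stir" may repeat any number of times.
GRAMMAR = {
    "S": [["pour", "R", "serve"]],
    "R": [["add", "stir"], ["add", "stir", "R"]],
}
VOCAB = {"pour": 0, "add": 1, "stir": 2, "serve": 3}

def expand(symbols, grammar, max_len):
    """Yield terminal sequences derivable from `symbols` with at most
    `max_len` terminals. Assumes the grammar is not left-recursive."""
    if not symbols:
        yield []
        return
    head, rest = symbols[0], symbols[1:]
    if head not in grammar:  # terminal symbol
        if max_len >= 1:
            for tail in expand(rest, grammar, max_len - 1):
                yield [head] + tail
        return
    for rule in grammar[head]:
        yield from expand(rule + rest, grammar, max_len)

def align(log_probs, labels):
    """Best monotonic assignment of frames to an ordered label list,
    with every label covering at least one frame."""
    T, N = len(log_probs), len(labels)
    NEG = float("-inf")
    if N == 0 or N > T:
        return NEG, []
    dp = [[NEG] * N for _ in range(T)]
    back = [[0] * N for _ in range(T)]
    dp[0][0] = log_probs[0][labels[0]]
    for t in range(1, T):
        for n in range(min(N, t + 1)):
            stay = dp[t - 1][n]
            move = dp[t - 1][n - 1] if n > 0 else NEG
            best, back[t][n] = (move, n - 1) if move > stay else (stay, n)
            if best > NEG:
                dp[t][n] = best + log_probs[t][labels[n]]
    # Backtrack from the last frame, which must carry the last label.
    path, n = [0] * T, N - 1
    for t in range(T - 1, -1, -1):
        path[t] = labels[n]
        n = back[t][n]
    return dp[T - 1][N - 1], path

def decode(log_probs, grammar, vocab, start="S"):
    """Return the grammar-valid sequence and frame labels with the best score."""
    best = (float("-inf"), None, None)
    for seq in expand([start], grammar, max_len=len(log_probs)):
        score, path = align(log_probs, [vocab[a] for a in seq])
        if score > best[0]:
            best = (score, seq, path)
    return best[1], best[2]

# Synthetic frame-level predictions: 0.85 on the "true" action, 0.05 elsewhere.
true_frames = [0, 0, 1, 2, 1, 2, 2, 3]
log_probs = [[math.log(0.85 if a == y else 0.05) for a in range(4)]
             for y in true_frames]
sequence, frame_labels = decode(log_probs, GRAMMAR, VOCAB)
print(sequence)
print(frame_labels)
```

Exhaustive enumeration is only workable for a tiny grammar and short videos; it stands in here for a real parser purely to show how a grammar can restrict which sequences the decoder may output.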
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Analysis and Summarization
