Semantic Decomposition and Recognition of Long and Complex Manipulation Action Sequences
Eren Erdal Aksoy, Adil Orhan, Florentin Woergoetter

TL;DR
This paper presents a method for semantic segmentation and recognition of long, complex human manipulation actions. It represents each action as a Semantic Event Chain (SEC), which captures the action's spatiotemporal structure invariant to motion, velocity, and scene context.
Contribution
Introduces a new SEC-based framework for automatic parsing and recognition of complex manipulation sequences without prior object knowledge.
Findings
Effective segmentation of complex actions
Robust recognition across diverse datasets
Invariant to motion and scene variations
Abstract
Understanding continuous human actions is a non-trivial but important problem in computer vision. Although there exists a large corpus of work in the recognition of action sequences, most approaches suffer from problems relating to vast variations in motions, action combinations, and scene contexts. In this paper, we introduce a novel method for semantic segmentation and recognition of long and complex manipulation action tasks, such as "preparing a breakfast" or "making a sandwich". We represent manipulations with our recently introduced "Semantic Event Chain" (SEC) concept, which captures the underlying spatiotemporal structure of an action invariant to motion, velocity, and scene context. Solely based on the spatiotemporal interactions between manipulated objects and hands in the extracted SEC, the framework automatically parses individual manipulation streams performed either…
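To make the idea of a representation that ignores motion and velocity more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that each video frame has already been reduced to binary touching (1) or not touching (0) relations between pairs of tracked entities, and it keeps only the frames at which some relation changes. The function name `compress_to_event_chain`, the pair labels, and the toy frames are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def compress_to_event_chain(frames: List[Dict[Pair, int]]) -> Dict[Pair, List[int]]:
    """Keep only the frames at which at least one pairwise relation changes.

    Assumption: 1 means "touching", 0 means "not touching". A pair that is
    missing from a frame is treated as not touching.
    """
    pairs = sorted({pair for frame in frames for pair in frame})
    chain: Dict[Pair, List[int]] = {pair: [] for pair in pairs}
    previous = None
    for frame in frames:
        column = tuple(frame.get(pair, 0) for pair in pairs)
        if column != previous:  # a relation changed, so record a new event column
            for pair, value in zip(pairs, column):
                chain[pair].append(value)
            previous = column
    return chain


# Hypothetical toy sequence: a hand picks up a cup and puts it back down.
HAND_CUP = ("hand", "cup")
CUP_TABLE = ("cup", "table")
frames = [
    {HAND_CUP: 0, CUP_TABLE: 1},
    {HAND_CUP: 0, CUP_TABLE: 1},  # nothing changes, so this frame is dropped
    {HAND_CUP: 1, CUP_TABLE: 1},  # hand grasps the cup
    {HAND_CUP: 1, CUP_TABLE: 0},  # cup leaves the table
    {HAND_CUP: 1, CUP_TABLE: 0},  # nothing changes, so this frame is dropped
    {HAND_CUP: 1, CUP_TABLE: 1},  # cup is placed back on the table
    {HAND_CUP: 0, CUP_TABLE: 1},  # hand releases the cup
]
chain = compress_to_event_chain(frames)
for pair, events in chain.items():
    print(pair, events)
```

Because repeated frames collapse into a single column, performing the same pick-and-place faster or slower would yield the same chain in this sketch, which is the sense in which such a representation does not depend on velocity.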
