Generalized Event Partonomy Inference with Structured Hierarchical Predictive Learning
Zhou Chen, Joe Lin, Sathyanarayanan N. Aakur

TL;DR
PARSE is a hierarchical, predictive learning framework that infers nested event structures from streaming video without supervision, achieving state-of-the-art results in temporal segmentation and event understanding.
Contribution
PARSE is a unified, unsupervised hierarchical model that predicts multiscale event structure directly from streaming video, mirroring how humans perceive events.
Findings
Achieves state-of-the-art performance on multiple benchmarks.
Recovers nested event hierarchies aligned with human perception.
Performs comparably to offline methods in temporal segmentation.
Abstract
Humans naturally perceive continuous experience as a hierarchy of temporally nested events: fine-grained actions embedded within coarser routines. Replicating this structure in computer vision requires models that can segment video not just retrospectively, but predictively and hierarchically. We introduce PARSE, a unified framework that learns multiscale event structure directly from streaming video without supervision. PARSE organizes perception into a hierarchy of recurrent predictors, each operating at its own temporal granularity: lower layers model short-term dynamics while higher layers integrate longer-term context through attention-based feedback. Event boundaries emerge naturally as transient peaks in prediction error, yielding temporally coherent, nested partonomies that mirror the containment relations observed in human event perception. Evaluated across three benchmarks,…
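As a rough illustration of the boundary mechanism described in the abstract, the sketch below flags frames where a per-frame prediction error spikes above its recent history. It is not the PARSE implementation: the function name `detect_boundaries`, the `window` and `k` parameters, and the mean-plus-`k`-standard-deviations threshold are all assumptions made for this example, and the error trace is synthetic rather than produced by a recurrent predictor.

```python
import numpy as np


def detect_boundaries(errors, window=5, k=2.0):
    """Return frame indices where the prediction error is a transient peak.

    Hypothetical rule (an assumption, not taken from the paper): frame t is a
    boundary when its error is a local maximum and exceeds the mean plus
    k standard deviations of the previous `window` errors. The local-maximum
    check looks one frame ahead, so the decision lags the stream by one frame.
    """
    errors = np.asarray(errors, dtype=float)
    boundaries = []
    for t in range(window, len(errors) - 1):
        past = errors[t - window:t]               # recent history only
        threshold = past.mean() + k * past.std()  # adaptive threshold
        is_local_peak = errors[t] > errors[t - 1] and errors[t] >= errors[t + 1]
        if is_local_peak and errors[t] > threshold:
            boundaries.append(t)
    return boundaries


# Synthetic error trace: a flat baseline with two spikes standing in for
# moments where the predictor fails to anticipate the next frame.
errors = np.full(40, 0.1)
errors[10] = 1.0
errors[25] = 0.9
print(detect_boundaries(errors))  # [10, 25]
```

In a hierarchical setting, one could imagine applying the same rule to each layer's error signal, with a longer window at higher layers so that coarser boundaries contain finer ones. Whether PARSE does this is not stated in the abstract.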
Taxonomy
Topics: Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis · Action Observation and Synchronization
