TL;DR
This paper introduces a hierarchical, unsupervised approach to physical reasoning from raw images. It models objects as hierarchies of parts to better capture complex interactions and dynamics in synthetic and real-world videos.
Contribution
It presents a novel hierarchical model that learns objects, their parts, and the relations between them directly from raw images, without supervision.
Findings
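Improves over a strong baseline at modeling synthetic videos.
Also improves over that baseline on real-world videos.
Explicitly distinguishes multiple levels of abstraction when modeling objects and their parts.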
Abstract
Common-sense physical reasoning in the real world requires learning about the interactions of objects and their dynamics. The notion of an abstract object, however, encompasses a wide variety of physical objects that differ greatly in terms of the complex behaviors they support. To address this, we propose a novel approach to physical reasoning that models objects as hierarchies of parts that may locally behave separately, but also act more globally as a single whole. Unlike prior approaches, our method learns in an unsupervised fashion directly from raw visual images to discover objects, parts, and their relations. It explicitly distinguishes multiple levels of abstraction and improves over a strong baseline at modeling synthetic and real-world videos.
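The idea that parts "may locally behave separately, but also act more globally as a single whole" can be illustrated with a small sketch. This is not the paper's model, which learns its hierarchy from raw images without supervision. It is a hand-built toy under stated assumptions: the `Node` class, the `global_velocities` function, and the two-level "walker" example are all hypothetical names chosen for illustration. Each part carries a motion relative to its parent, and its motion in the scene is the sum of the motions along the path from the root.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vec = Tuple[float, float]


@dataclass
class Node:
    """One node of a hand-built part hierarchy (hypothetical, for illustration)."""
    name: str
    local_velocity: Vec = (0.0, 0.0)  # motion relative to the parent node
    children: List["Node"] = field(default_factory=list)


def global_velocities(node: Node, parent_velocity: Vec = (0.0, 0.0)) -> Dict[str, Vec]:
    """Compose local motions down the tree to get each node's scene-level motion."""
    vx = parent_velocity[0] + node.local_velocity[0]
    vy = parent_velocity[1] + node.local_velocity[1]
    result = {node.name: (vx, vy)}
    for child in node.children:
        result.update(global_velocities(child, (vx, vy)))
    return result


# A whole object moving right, with one part that also swings upward.
walker = Node("walker", (1.0, 0.0), [Node("torso"), Node("arm", (0.0, 0.5))])
velocities = global_velocities(walker)
```

Here the torso inherits the motion of the whole object unchanged, while the arm adds its own local motion on top of it. That captures the two levels of behavior the abstract describes, though only in a fixed, manually specified form.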