H-SAUR: Hypothesize, Simulate, Act, Update, and Repeat for Understanding Object Articulations from Interactions
Kei Ota, Hsiao-Yu Tung, Kevin A. Smith, Anoop Cherian, Tim K. Marks, Alan Sullivan, Asako Kanezaki, and Joshua B. Tenenbaum

TL;DR
H-SAUR is a probabilistic framework enabling autonomous agents to understand and manipulate articulated objects through hypothesis generation, simulation, and iterative updating, significantly improving efficiency and accuracy without requiring training data.
Contribution
The paper introduces H-SAUR, a novel probabilistic framework that models hypotheses about object articulations and guides exploration, outperforming existing methods without training data.
Findings
H-SAUR outperforms state-of-the-art methods on PartNet-Mobility.
H-SAUR effectively solves multi-step puzzles in the PuzzleBoxes benchmark.
Incorporating learned priors improves test-time efficiency.
Abstract
The world is filled with articulated objects that are difficult to determine how to use from vision alone, e.g., a door might open inwards or outwards. Humans handle these objects with strategic trial-and-error: first pushing a door, then pulling if that doesn't work. We enable these capabilities in autonomous agents by proposing "Hypothesize, Simulate, Act, Update, and Repeat" (H-SAUR), a probabilistic generative framework that simultaneously generates a distribution of hypotheses about how objects articulate given input observations, captures certainty over hypotheses over time, and infers plausible actions for exploration and goal-conditioned manipulation. We compare our model with existing work in manipulating objects after a handful of exploration actions, on the PartNet-Mobility dataset. We further propose a novel PuzzleBoxes benchmark that contains locked boxes that require…
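The hypothesize, simulate, act, update loop described in the abstract can be illustrated with a toy sketch built on the abstract's own door example. This is not the authors' implementation: the two discrete hypotheses, the binary "moved or not" observation, the fixed `noise` level, and every function name here (`simulate_door`, `choose_action`, `update_beliefs`) are assumptions made for illustration only.

```python
from typing import Callable, Dict, List

Beliefs = Dict[str, float]
Simulator = Callable[[str, str], bool]


def simulate_door(hypothesis: str, action: str) -> bool:
    """Toy simulator (assumed): predict whether the door moves.

    A door that opens away from the agent moves when pushed;
    a door that opens toward the agent moves when pulled.
    """
    return (hypothesis == "opens_away" and action == "push") or (
        hypothesis == "opens_toward" and action == "pull"
    )


def choose_action(beliefs: Beliefs, actions: List[str], simulate: Simulator) -> str:
    """Pick the action with the highest belief-weighted chance of moving the part."""

    def expected_success(action: str) -> float:
        return sum(p for h, p in beliefs.items() if simulate(h, action))

    return max(actions, key=expected_success)


def update_beliefs(
    beliefs: Beliefs,
    action: str,
    observed_moved: bool,
    simulate: Simulator,
    noise: float = 0.1,
) -> Beliefs:
    """Bayes update: down-weight hypotheses whose simulated outcome
    disagrees with the observation. `noise` is an assumed probability
    that the observation contradicts a correct hypothesis."""
    unnormalized = {}
    for hypothesis, prior in beliefs.items():
        predicted = simulate(hypothesis, action)
        likelihood = (1.0 - noise) if predicted == observed_moved else noise
        unnormalized[hypothesis] = prior * likelihood
    total = sum(unnormalized.values())
    return {h: w / total for h, w in unnormalized.items()}


if __name__ == "__main__":
    # Assumed prior: the agent slightly favors a door that opens away.
    beliefs = {"opens_away": 0.6, "opens_toward": 0.4}
    actions = ["push", "pull"]
    true_hypothesis = "opens_toward"  # hidden ground truth for this toy run

    for step in range(3):
        action = choose_action(beliefs, actions, simulate_door)
        moved = simulate_door(true_hypothesis, action)  # stand-in for acting in the world
        beliefs = update_beliefs(beliefs, action, moved, simulate_door)
        print(step, action, moved, beliefs)
        if moved:
            break
```

In this toy run the agent first pushes because its prior favors `opens_away`, observes no motion, shifts most of its belief to `opens_toward`, and then pulls. The framework described in the abstract works with distributions over articulation hypotheses generated from observations; the sketch only mirrors the control flow under the simplifications listed above.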
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Natural Language Processing Techniques
