H-SAUR: Hypothesize, Simulate, Act, Update, and Repeat for Understanding Object Articulations from Interactions

Kei Ota; Hsiao-Yu Tung; Kevin A. Smith; Anoop Cherian; Tim K. Marks; Alan Sullivan; Asako Kanezaki; and Joshua B. Tenenbaum

arXiv:2210.12521 · cs.RO · October 25, 2022


TL;DR

H-SAUR is a probabilistic framework enabling autonomous agents to understand and manipulate articulated objects through hypothesis generation, simulation, and iterative updating, significantly improving efficiency and accuracy without requiring training data.

Contribution

The paper introduces H-SAUR, a novel probabilistic framework that models hypotheses about object articulations and uses them to guide exploration; without requiring any training data, it outperforms existing methods.

Findings

1. H-SAUR outperforms state-of-the-art methods on PartNet-Mobility.
2. H-SAUR effectively solves multi-step puzzles in the PuzzleBoxes benchmark.
3. Incorporating learned priors improves test-time efficiency.

Abstract

The world is filled with articulated objects whose use is difficult to determine from vision alone; e.g., a door might open inwards or outwards. Humans handle these objects with strategic trial and error: first pushing a door, then pulling if that doesn't work. We enable these capabilities in autonomous agents by proposing "Hypothesize, Simulate, Act, Update, and Repeat" (H-SAUR), a probabilistic generative framework that simultaneously generates a distribution of hypotheses about how objects articulate given input observations, captures certainty over hypotheses over time, and infers plausible actions for exploration and goal-conditioned manipulation. We compare our model with existing work on manipulating objects after a handful of exploration actions on the PartNet-Mobility dataset. We further propose a novel PuzzleBoxes benchmark that contains locked boxes that require…
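The push-then-pull strategy described in the abstract can be illustrated as a discrete Bayesian filter over articulation hypotheses. The sketch below is a toy under loudly stated assumptions, not the authors' method: the three door hypotheses, the trivial stand-in `simulate` function, the fixed `noise` likelihood, the greedy "act on the most probable hypothesis" rule, and all names (`HYPOTHESES`, `simulate`, `update`, `h_saur_loop`) are hypothetical. The paper's own hypothesis representation, physics simulation, and action selection are not reproduced here. The second run shows, in this toy setting only, how a more informative prior can cut the number of exploration actions.

```python
from typing import Dict, Optional, Tuple

# Hypothetical articulation hypotheses for a door (illustrative only).
HYPOTHESES = ["push_inward", "pull_outward", "slide_left"]


def simulate(hypothesis: str, action: str) -> bool:
    """Toy stand-in for a simulator: predicts whether the door moves when
    `action` is applied, assuming `hypothesis` is the true articulation.
    Here an action succeeds only if it matches the hypothesis."""
    return hypothesis == action


def update(belief: Dict[str, float], action: str, moved: bool,
           noise: float = 0.05) -> Dict[str, float]:
    """Bayes rule: P(h | obs) is proportional to P(obs | h) * P(h).
    The likelihood is high when the simulated outcome matches the observation."""
    posterior = {}
    for h, prior in belief.items():
        predicted = simulate(h, action)
        likelihood = 1.0 - noise if predicted == moved else noise
        posterior[h] = prior * likelihood
    z = sum(posterior.values())
    return {h: p / z for h, p in posterior.items()}


def h_saur_loop(belief: Dict[str, float], true_hypothesis: str,
                max_steps: int = 10) -> Tuple[Optional[int], Dict[str, float]]:
    """Hypothesize -> simulate -> act -> update -> repeat until the object moves.
    Returns the number of actions taken (None if it never succeeded)
    and the final belief."""
    belief = dict(belief)
    for step in range(1, max_steps + 1):
        # Act greedily on the currently most probable hypothesis.
        action = max(belief, key=belief.get)
        # Stand-in for executing the action in the real world.
        moved = simulate(true_hypothesis, action)
        belief = update(belief, action, moved)
        if moved:
            return step, belief
    return None, belief


# Uniform prior: the agent tries pushing first, then pulls (2 actions).
uniform = {h: 1 / len(HYPOTHESES) for h in HYPOTHESES}
steps_uniform, _ = h_saur_loop(uniform, true_hypothesis="pull_outward")

# A hypothetical "learned" prior favoring pulling: succeeds on the first action.
learned = {"push_inward": 0.2, "pull_outward": 0.7, "slide_left": 0.1}
steps_learned, final = h_saur_loop(learned, true_hypothesis="pull_outward")

print(steps_uniform, steps_learned)  # → 2 1
```

Keeping a full posterior rather than a single guess is what lets a failed push shift probability mass toward the remaining hypotheses instead of discarding information.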

