PokeNet: Learning Kinematic Models of Articulated Objects from Human Observations
Anmol Gupta, Weiwei Gu, Omkar Patil, Jun Ki Lee, Nakul Gopalan

TL;DR
PokeNet is an end-to-end framework that learns articulation models of unknown objects from a single human demonstration, accurately predicting joint parameters, manipulation order, and joint states without prior object knowledge.
Contribution
It introduces a novel method that estimates articulation models from minimal data, handling occlusions and manipulation sequences for diverse and unseen objects.
Findings
Improves joint axis and state estimation accuracy by over 27%.
Works effectively in both simulation and real-world environments.
Handles occlusions and manipulation order without prior object knowledge.
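The quantities named above (joint parameters, joint states, and manipulation order) can be pictured as a small data structure. The following is only an illustrative sketch, not PokeNet's actual output format: the class names, fields, units, and the dishwasher values (a revolute door that must be opened before a prismatic rack can slide) are all assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class JointType(Enum):
    REVOLUTE = "revolute"    # rotates about an axis (e.g. a hinged door)
    PRISMATIC = "prismatic"  # slides along an axis (e.g. a drawer or rack)


@dataclass
class Joint:
    # All fields are hypothetical; the source does not specify a schema.
    name: str
    joint_type: JointType
    axis: Tuple[float, float, float]    # direction of the joint axis
    origin: Tuple[float, float, float]  # a point the axis passes through
    state: float                        # radians (revolute) or metres (prismatic)
    order: int                          # position in the manipulation sequence, 0 = first


@dataclass
class ArticulationModel:
    joints: List[Joint]

    def manipulation_sequence(self) -> List[str]:
        """Return joint names sorted by the order in which they must be operated."""
        return [j.name for j in sorted(self.joints, key=lambda j: j.order)]


# Assumed example: a dishwasher whose door must open before the rack slides out.
rack = Joint("rack", JointType.PRISMATIC, (1.0, 0.0, 0.0), (0.0, 0.0, 0.3), 0.0, order=1)
door = Joint("door", JointType.REVOLUTE, (0.0, 1.0, 0.0), (0.3, 0.0, 0.0), 0.0, order=0)
dishwasher = ArticulationModel(joints=[rack, door])
sequence = dishwasher.manipulation_sequence()
```

Storing the order as an explicit field keeps the sequencing constraint separate from the geometry, so a downstream planner could read `sequence` without inspecting axes or states.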
Abstract
Articulation modeling enables robots to learn joint parameters of articulated objects for effective manipulation, which can then be used downstream for skill learning or planning. Existing approaches often rely on prior knowledge about the objects, such as the number or type of joints. Some of these approaches also fail to recover occluded joints that are only revealed during interaction. Others require large numbers of multi-view images for every object, which is impractical in real-world settings. Furthermore, prior works neglect the order of manipulations, which is essential for many multi-DoF objects where one joint must be operated before another, such as a dishwasher. We introduce PokeNet, an end-to-end framework that estimates articulation models from a single human demonstration without prior object knowledge. Given a sequence of point cloud observations of a human manipulating…
Taxonomy
Topics: Robot Manipulation and Learning · Human Pose and Action Recognition · Human Motion and Animation
