KINet: Unsupervised Forward Models for Robotic Pushing Manipulation
Alireza Rezazadeh, Changhyun Choi

TL;DR
KINet is an unsupervised, object-centric model that predicts future states in robotic pushing tasks by reasoning about object interactions through keypoints, enabling generalization to new scenarios.
Contribution
Introduces KINet, an end-to-end unsupervised framework for physical reasoning in object-centric space using keypoints, without requiring ground-truth object annotations.
Findings
Accurately predicts future states in robotic pushing scenarios.
Generalizes to different numbers of objects and backgrounds.
Learns plannable representations for manipulation tasks.
Abstract
Object-centric representation is an essential abstraction for forward prediction. Most existing forward models learn this representation through extensive supervision (e.g., object class and bounding box) although such ground-truth information is not readily accessible in reality. To address this, we introduce KINet (Keypoint Interaction Network) -- an end-to-end unsupervised framework to reason about object interactions based on a keypoint representation. Using visual observations, our model learns to associate objects with keypoint coordinates and discovers a graph representation of the system as a set of keypoint embeddings and their relations. It then learns an action-conditioned forward model using contrastive estimation to predict future keypoint states. By learning to perform physical reasoning in the keypoint space, our model automatically generalizes to scenarios with a…
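The abstract names contrastive estimation as the training signal for the forward model but does not give the loss. As a rough illustration only, the sketch below assumes an InfoNCE-style objective with in-batch negatives: each predicted keypoint state should be closer to its own true next state than to the next states of other samples. The function name `info_nce_loss`, the squared-distance similarity, and the `temperature` value are hypothetical choices for this example, not the authors' implementation.

```python
import numpy as np

def info_nce_loss(predicted, target, temperature=0.1):
    """InfoNCE-style loss over a batch of keypoint states (illustrative assumption).

    predicted, target: arrays of shape (batch, num_keypoints, dim).
    Row i of `target` is the positive for row i of `predicted`;
    every other row in the batch serves as a negative.
    """
    batch = predicted.shape[0]
    p = predicted.reshape(batch, -1)
    t = target.reshape(batch, -1)

    # Similarity = negative squared Euclidean distance, scaled by temperature.
    diff = p[:, None, :] - t[None, :, :]               # (batch, batch, features)
    logits = -np.sum(diff ** 2, axis=-1) / temperature  # (batch, batch)

    # Numerically stable log-softmax over the candidates for each prediction.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Cross-entropy with the matching index (the diagonal) as the correct class.
    return -np.mean(np.diag(log_probs))

# Usage: perfect predictions score lower than mismatched ones.
rng = np.random.default_rng(0)
true_next = rng.normal(size=(4, 3, 2))   # 4 samples, 3 keypoints, 2-D coordinates
loss_matched = info_nce_loss(true_next, true_next)
loss_mismatched = info_nce_loss(true_next[::-1], true_next)
```

In this setup the loss only asks the model to rank the true next state above the alternatives, rather than to reconstruct pixels, which is one common motivation for contrastive objectives in forward modeling.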
Taxonomy
Topics: Robot Manipulation and Learning · Reinforcement Learning in Robotics
