kPAM: KeyPoint Affordances for Category-Level Robotic Manipulation
Lucas Manuelli, Wei Gao, Peter Florence, Russ Tedrake

TL;DR
kPAM introduces a category-level robotic manipulation approach using semantic 3D keypoints, enabling flexible, interpretable, and robust object placement despite large shape variations within object categories.
Contribution
The paper proposes a keypoint-based object representation for category-level manipulation that replaces explicit 6-DOF pose estimation, and integrates it into a perception-to-action pipeline.
Findings
Successfully manipulates unseen objects within categories.
Robustly handles large intra-category shape variations.
Achieves reliable placement of shoes and mugs in hardware experiments.
Abstract
We would like robots to achieve purposeful manipulation by placing any instance from a category of objects into a desired set of goal states. Existing manipulation pipelines typically specify the desired configuration as a target 6-DOF pose and rely on explicitly estimating the pose of the manipulated objects. However, representing an object with a parameterized transformation defined on a fixed template cannot capture large intra-category shape variation, and specifying a target pose at a category level can be physically infeasible or fail to accomplish the task -- e.g. knowing the pose and size of a coffee mug relative to some canonical mug is not sufficient to successfully hang it on a rack by its handle. Hence we propose a novel formulation of category-level manipulation that uses semantic 3D keypoints as the object representation. This keypoint representation enables a simple and…
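To make the keypoint idea concrete, here is a minimal sketch that is not taken from the paper. It assumes the object's semantic 3D keypoints have already been detected and that the task supplies a desired location for each one. It then recovers the rigid transform that best moves the detected keypoints onto those targets using the standard Kabsch (orthogonal Procrustes) least-squares fit, which stands in for whatever solver the authors use. The function name `fit_rigid_transform`, the keypoint names, and all coordinates are hypothetical.

```python
import numpy as np


def fit_rigid_transform(src, dst):
    """Least-squares rigid transform (R, t) such that R @ src_i + t ~= dst_i.

    Uses the Kabsch algorithm; src and dst are (N, 3) arrays of
    corresponding 3D keypoints with N >= 3 non-collinear points.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


# Hypothetical mug keypoints in meters: bottom center, top center, handle center.
detected = np.array([
    [0.50, 0.10, 0.00],
    [0.50, 0.10, 0.10],
    [0.55, 0.10, 0.05],
])

# Hypothetical goal: the same mug rotated 90 degrees about z and moved to a shelf.
Rz = np.array([[0.0, -1.0, 0.0],
               [1.0,  0.0, 0.0],
               [0.0,  0.0, 1.0]])
target = detected @ Rz.T + np.array([0.20, -0.30, 0.40])

R, t = fit_rigid_transform(detected, target)
moved = detected @ R.T + t  # keypoints after applying the fitted transform
```

Because the goal is stated on keypoints rather than on a template pose, the same target specification can be reused for a mug of a different size or shape. In that case the fit simply returns the closest rigid motion instead of an exact match.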
Taxonomy
Topics: Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques
