Visual Identification of Articulated Object Parts
Vicky Zeng, Tabitha Edith Lee, Jacky Liang, Oliver Kroemer

TL;DR
This paper introduces FormNet, a neural network that identifies articulation mechanisms between object parts from a single RGB-D image, enabling robots to recognize and manipulate articulated objects without prior manipulation or kinematic knowledge.
Contribution
FormNet is the first model to predict articulation types from a single RGB-D frame using synthetic training data with domain randomization, generalizing to real-world images and unseen categories.
Findings
Achieves 82.5% accuracy in classifying articulation types on novel objects
Generalizes well to unseen categories and real-world images without fine-tuning
Trained on 100k synthetic images with photorealistic rendering
Abstract
As autonomous robots interact and navigate around real-world environments such as homes, it is useful to reliably identify and manipulate articulated objects, such as doors and cabinets. Many prior works in object articulation identification require manipulation of the object, either by the robot or a human. While recent works have addressed predicting articulation types from visual observations alone, they often assume prior knowledge of category-level kinematic motion models or a sequence of observations in which the articulated parts move according to their kinematic constraints. In this work, we propose FormNet, a neural network that identifies the articulation mechanisms between pairs of object parts from a single frame of an RGB-D image and segmentation masks. The network is trained on 100k synthetic images of 149 articulated objects from 6 categories. Synthetic images are…
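To make the input described in the abstract concrete, here is a minimal sketch of how a single RGB-D frame and per-part segmentation masks could be turned into one query per pair of parts. The channel layout (RGB, depth, then the two masks), the treatment of pairs as unordered, and the function name `build_pair_inputs` are assumptions for illustration only; the summary above does not specify how FormNet actually encodes its input.

```python
from itertools import combinations

import numpy as np


def build_pair_inputs(rgb, depth, part_masks):
    """Hypothetical helper: one (H, W, 6) array per unordered pair of parts.

    rgb:        (H, W, 3) array
    depth:      (H, W) array
    part_masks: dict mapping part name -> (H, W) boolean mask
    """
    # Stack color and depth into a single 4-channel RGB-D array.
    rgbd = np.concatenate(
        [rgb.astype(np.float32), depth.astype(np.float32)[..., None]], axis=-1
    )
    inputs = {}
    # Assumption: pairs are unordered, so N parts give N * (N - 1) / 2 queries.
    for a, b in combinations(sorted(part_masks), 2):
        masks = np.stack([part_masks[a], part_masks[b]], axis=-1).astype(np.float32)
        inputs[(a, b)] = np.concatenate([rgbd, masks], axis=-1)
    return inputs


# Toy 4x4 frame with three segmented parts (names are made up).
rgb = np.zeros((4, 4, 3), dtype=np.float32)
depth = np.ones((4, 4), dtype=np.float32)
part_masks = {name: np.zeros((4, 4), dtype=bool) for name in ("body", "door", "drawer")}
part_masks["body"][:, 2:] = True
part_masks["door"][:2, :2] = True
part_masks["drawer"][2:, :2] = True

pair_inputs = build_pair_inputs(rgb, depth, part_masks)
for pair, array in pair_inputs.items():
    print(pair, array.shape)
```

Each resulting array would be passed to the classifier once, so an object with three segmented parts yields three pairwise predictions under these assumptions.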
