TL;DR
This paper presents a real-time 6-DoF object pose estimation system using pose interpreter networks trained solely on synthetic data, leveraging object masks to bridge the gap between synthetic and real images.
Contribution
Pose interpreter networks, trained entirely on synthetic data and taking object masks as input, enable real-time 6-DoF pose estimation without any real pose annotations.
Findings
Achieves 20 Hz real-time performance on live RGB data
Successfully generalizes from synthetic to real data using object masks
Does not require depth information or ICP refinement
Abstract
In this work, we introduce pose interpreter networks for 6-DoF object pose estimation. In contrast to other CNN-based approaches to pose estimation that require expensively annotated object pose data, our pose interpreter network is trained entirely on synthetic pose data. We use object masks as an intermediate representation to bridge real and synthetic data. We show that when combined with a segmentation model trained on RGB images, our synthetically trained pose interpreter network is able to generalize to real data. Our end-to-end system for object pose estimation runs in real time (20 Hz) on live RGB data, without using depth information or ICP refinement.
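The two-stage design described in the abstract (an RGB segmentation model producing an object mask, followed by a pose interpreter that maps the mask to a 6-DoF pose) can be illustrated with a minimal sketch. Everything named here is hypothetical: `estimate_pose`, `dummy_segmenter`, and `dummy_interpreter` are stand-ins rather than the authors' code, and the pose parameterization (a 3D position plus a unit quaternion in `(w, x, y, z)` order) is an assumption, since the summary does not state how the pose is represented.

```python
import numpy as np

def pose_to_matrix(position, quaternion):
    """Build a 4x4 homogeneous transform from a position (x, y, z)
    and a quaternion (w, x, y, z). The quaternion is normalized first."""
    q = np.asarray(quaternion, dtype=float)
    w, x, y, z = q / np.linalg.norm(q)
    rotation = np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ])
    transform = np.eye(4)
    transform[:3, :3] = rotation
    transform[:3, 3] = position
    return transform

def estimate_pose(rgb, segmenter, interpreter):
    """Two-stage pipeline: RGB image -> object mask -> 6-DoF pose."""
    mask = segmenter(rgb)                     # stage 1: would be trained on real RGB images
    position, quaternion = interpreter(mask)  # stage 2: would be trained on synthetic masks
    return pose_to_matrix(position, quaternion)

# Hypothetical stand-ins so the sketch runs; real models would be trained CNNs.
def dummy_segmenter(rgb):
    # Brightness threshold in place of a learned segmentation model.
    return (rgb.mean(axis=2) > 0.5).astype(np.float32)

def dummy_interpreter(mask):
    # Fixed pose (assumed units: meters): 0.5 along the camera z-axis,
    # rotated 90 degrees about z. A real network would depend on the mask.
    half_angle = np.pi / 4
    position = np.array([0.0, 0.0, 0.5])
    quaternion = np.array([np.cos(half_angle), 0.0, 0.0, np.sin(half_angle)])
    return position, quaternion

rgb = np.random.rand(240, 320, 3)  # placeholder for one live RGB frame
transform = estimate_pose(rgb, dummy_segmenter, dummy_interpreter)
```

Because the second stage only ever sees masks, it never needs real RGB images with pose labels, which is the point of using masks as the intermediate representation. At the reported 20 Hz, both stages together would have to finish within 50 ms per frame.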
