RoCap: A Robotic Data Collection Pipeline for the Pose Estimation of Appearance-Changing Objects
Jiahao Nick Li, Toby Chong, Zhongyi Zhou, Hironori Yoshida, Koji Yatani, Xiang 'Anthony' Chen, Takeo Igarashi

TL;DR
Rocap is a robotic data collection pipeline that photographs an object in many poses and labels each image with its ground truth pose, targeting objects whose appearance changes during manipulation. Models trained on this data outperform models trained on synthetic data.
Contribution
The paper introduces Rocap, a robotic pipeline that generates pose-labeled training data for appearance-changing objects, which methods designed for static objects with diffuse colors handle poorly.
Findings
Rocap effectively captures diverse object poses with ground truth labels.
Models trained on Rocap data outperform synthetic data-based models.
The approach improves pose estimation accuracy for deformable, transparent, reflective, and articulated objects.
Abstract
Object pose estimation plays a vital role in mixed-reality interactions when users manipulate tangible objects as controllers. Traditional vision-based object pose estimation methods leverage 3D reconstruction to synthesize training data. However, these methods are designed for static objects with diffuse colors and do not work well for objects that change their appearance during manipulation, such as deformable objects like plush toys, transparent objects like chemical flasks, reflective objects like metal pitchers, and articulated objects like scissors. To address this limitation, we propose Rocap, a robotic pipeline that emulates human manipulation of target objects while generating data labeled with ground truth pose information. The user first gives the target object to a robotic arm, and the system captures many pictures of the object in various 6D configurations. The system…
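The abstract is cut off before it explains how the ground truth labels are produced. As an illustration only, the sketch below shows one common way a 6D pose label (3D rotation plus 3D translation) could be derived when a robot arm holds the object: chain the camera-to-base, base-to-gripper, and gripper-to-object transforms. This is an assumption, not the authors' stated method; all function names and numeric values are hypothetical.

```python
import math

def rot_z(theta):
    """3x3 rotation about the z-axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def make_pose(rotation, translation):
    """Pack a 3x3 rotation and a 3-vector translation into a 4x4 transform."""
    pose = [list(rotation[i]) + [translation[i]] for i in range(3)]
    pose.append([0.0, 0.0, 0.0, 1.0])
    return pose

def compose(a, b):
    """Matrix product of two 4x4 transforms (b is applied first, then a)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def object_pose_in_camera(cam_from_base, base_from_gripper, gripper_from_object):
    """Hypothetical labeling step: object pose expressed in the camera frame."""
    return compose(cam_from_base, compose(base_from_gripper, gripper_from_object))

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Made-up example values (metres and radians), not taken from the paper.
cam_from_base = make_pose(IDENTITY, [0.0, 0.0, 1.0])                   # assumed camera calibration
base_from_gripper = make_pose(rot_z(math.pi / 2), [0.5, 0.0, 0.2])     # assumed arm kinematics
gripper_from_object = make_pose(IDENTITY, [0.1, 0.0, 0.0])             # assumed grasp offset

label = object_pose_in_camera(cam_from_base, base_from_gripper, gripper_from_object)
```

In a setup like this, each captured image would be stored alongside its `label` matrix; moving the arm to a new configuration changes only `base_from_gripper`, so the other two transforms need to be estimated once.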
Taxonomy
Topics: Face recognition and analysis · Robotics and Sensor-Based Localization
