Learning to Fill the Seam by Vision: Sub-millimeter Peg-in-hole on Unseen Shapes in Real World
Liang Xie, Hongxiang Yu, Yinghao Zhao, Haodong Zhang, Zhongxiang Zhou, Minhang Wang, Yue Wang, Rong Xiong

TL;DR
This paper presents a vision-based method for peg-in-hole tasks that imitates human seam-filling behavior, using learned estimators and reinforcement learning to achieve robust, efficient, and adaptable insertion on unseen shapes in real-world scenarios.
Contribution
It introduces a seam-based visual perception architecture combined with reinforcement learning for robust peg insertion on unseen geometries, trained entirely in simulation with easy sim-to-real transfer.
Findings
Achieves a high success rate in both simulation and real-world tests.
Demonstrates effective sim-to-real transfer with minimal effort.
Matches or exceeds baseline methods in success rate, efficiency, and robustness.
Abstract
In the peg insertion task, humans attend to the seam between the peg and the hole and try to fill it continuously using visual feedback. Imitating this behavior, we design architectures with position and orientation estimators based on a seam representation for pose alignment, which generalize to unseen peg geometries. By placing the estimators in a closed control loop with reinforcement learning, we further achieve a higher or comparable success rate, efficiency, and robustness compared with the baseline methods. The policy is trained entirely in simulation without any manual intervention. For sim-to-real transfer, a learnable segmentation module with automatic data collection and labeling can be easily trained to decouple perception from the policy, which helps the model trained in simulation adapt quickly to the real world with negligible effort.…
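The seam idea in the abstract can be illustrated with a toy geometric sketch. This is not the authors' learned estimator or policy: it assumes binary hole and peg masks are already available (for example from a segmentation module), and the helper names `rect_mask`, `seam_mask`, `centroid`, and `seam_step`, as well as the proportional `gain`, are hypothetical. It only shows why the visible seam carries alignment information: the uncovered part of the hole lies on the side the peg should move toward.

```python
# Toy illustration only: hand-written geometry, not the paper's learned estimators.
# All function names and the gain value are assumptions made for this sketch.

def rect_mask(height, width, r0, r1, c0, c1):
    """Binary mask that is True inside rows [r0, r1) and columns [c0, c1)."""
    return [[r0 <= r < r1 and c0 <= c < c1 for c in range(width)]
            for r in range(height)]

def seam_mask(hole, peg):
    """Seam = hole pixels that the peg does not cover."""
    return [[h and not p for h, p in zip(hole_row, peg_row)]
            for hole_row, peg_row in zip(hole, peg)]

def centroid(mask):
    """Mean (row, col) of the True pixels, or None if the mask is empty."""
    points = [(r, c) for r, row in enumerate(mask)
              for c, value in enumerate(row) if value]
    if not points:
        return None
    n = len(points)
    return (sum(r for r, _ in points) / n, sum(c for _, c in points) / n)

def seam_step(hole, peg, gain=0.5):
    """Proportional image-plane step moving the peg centroid toward the seam centroid."""
    seam_center = centroid(seam_mask(hole, peg))
    peg_center = centroid(peg)
    if seam_center is None or peg_center is None:
        return (0.0, 0.0)  # no visible seam (or no peg): nothing to correct
    return (gain * (seam_center[0] - peg_center[0]),
            gain * (seam_center[1] - peg_center[1]))

# Hole occupies columns 2..5; the peg is shifted two pixels to the right.
hole = rect_mask(8, 8, 2, 6, 2, 6)
peg = rect_mask(8, 8, 2, 6, 4, 8)
step = seam_step(hole, peg)      # negative column component: move left
aligned = seam_step(hole, hole)  # peg covers the hole exactly: zero step
```

In this sketch the step shrinks to zero as the seam closes or becomes symmetric around the peg, which loosely mirrors the "fill the seam" behavior described above; the paper replaces such hand-written geometry with learned position and orientation estimators inside a reinforcement-learning loop.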
Taxonomy
Topics: Advanced Vision and Imaging · Robot Manipulation and Learning · Human Motion and Animation
