SI-Diff: A Framework for Learning Search and High-Precision Insertion with a Force-Domain Diffusion Policy

Yibo Liu; Stanko Oparnica; Simon Shewchun-Jakaitis; Guoyi Fu; Jie Wang; Jun Yang; Anand Jagannathan; Tony Hong-Yau Lo

arXiv:2605.12247 · cs.RO · May 13, 2026

TL;DR

SI-Diff is a unified framework that learns both search and high-precision insertion in contact-rich assembly tasks using a force-domain diffusion policy, improving tolerance to misalignments and enabling zero-shot transfer.

Contribution

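SI-Diff introduces a mode-conditioning mechanism, which lets a single policy capture the distinct action patterns of search and insertion, and a search teacher policy that generates diverse trajectories. Together they unify both tasks in one model without switching models or weights.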

Findings

01

Extends x-y misalignment tolerance from 2 mm to 5 mm.

02

Demonstrates strong zero-shot transfer to unseen shapes.

03

Outperforms the state-of-the-art TacDiffusion baseline.

Abstract

Contact-rich assembly is fundamental in robotics but poses significant challenges due to uncertainties in relative poses, such as misalignments and small clearances in peg-in-hole tasks. Existing approaches typically address search and high-precision insertion separately, because these tasks involve distinct action patterns. However, supporting both tasks within a single model, without switching models or weights, is desirable for intelligent assembly systems. In this work, we propose SI-Diff, a framework that learns both search and high-precision insertion through a force-domain diffusion policy. To this end, we introduce a new mode-conditioning mechanism that enables the policy to capture distinct action behaviors under a single framework. Moreover, we develop a new search teacher policy that can generate diverse trajectories. By training on successful and efficient demonstrations…
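
The idea of mode conditioning mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the mode labels, the one-hot encoding, the 6-D wrench input, the 3-D force command output, and the hand-written `toy_denoiser` are all assumptions made for illustration, and the sampling loop is a simplified deterministic refinement rather than a full diffusion sampler. It only shows how a single denoising routine can produce different action patterns when a mode flag is part of its conditioning input.

```python
import random

MODES = ("search", "insert")  # hypothetical mode labels, not from the paper

# Hypothetical "typical" force commands per mode, used only by the toy denoiser.
PROTOTYPES = {
    "search": [1.0, 0.0, 0.0],   # lateral probing force (assumed)
    "insert": [0.0, 0.0, -1.0],  # downward pushing force (assumed)
}


def mode_one_hot(mode):
    """Encode the task mode as a one-hot vector over MODES."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return [1.0 if m == mode else 0.0 for m in MODES]


def build_condition(wrench, mode):
    """Concatenate a 6-D force/torque reading with the mode encoding."""
    if len(wrench) != 6:
        raise ValueError("expected a 6-D wrench (Fx, Fy, Fz, Tx, Ty, Tz)")
    return list(wrench) + mode_one_hot(mode)


def toy_denoiser(noisy_action, condition, step):
    """Stand-in for a learned noise-prediction network.

    It reads the mode flag from the tail of the condition vector and
    returns the offset between the noisy action and that mode's prototype.
    The wrench part of the condition and `step` are accepted but unused here;
    a real network would depend on them.
    """
    weights = condition[6:]
    target = [
        sum(w * PROTOTYPES[m][i] for w, m in zip(weights, MODES))
        for i in range(3)
    ]
    return [a - t for a, t in zip(noisy_action, target)]


def sample_action(wrench, mode, steps=50, step_size=0.2, seed=0):
    """Start from Gaussian noise and iteratively remove the predicted noise."""
    rng = random.Random(seed)
    condition = build_condition(wrench, mode)
    action = [rng.gauss(0.0, 1.0) for _ in range(3)]
    for step in reversed(range(steps)):
        predicted_noise = toy_denoiser(action, condition, step)
        action = [a - step_size * e for a, e in zip(action, predicted_noise)]
    return action


if __name__ == "__main__":
    zero_wrench = [0.0] * 6
    print("search:", sample_action(zero_wrench, "search"))
    print("insert:", sample_action(zero_wrench, "insert"))
```

The same function and the same (toy) denoiser serve both modes; only the conditioning vector changes, which is the property the abstract describes as supporting both tasks without switching models or weights.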
