Learning Sim-Grounded Policies for Bimanual Rope Manipulation from Human Teleoperation Data
Gina Wigginghaus, Tim Missal, Berk Guler, Simon Manschitz, Jan Peters

TL;DR
This paper compares visual and state-based policies for bimanual rope manipulation learned from human teleoperation data. It finds that the state-based policy generalizes better and has lower error, which highlights the importance of the observation space.
Contribution
The paper shows that a state-based policy outperforms a visual one on a bimanual rope manipulation task (knot untangling), which points to the choice of observation space as a key factor for generalization from limited data.
Findings
The state-based policy reduces L1 error by 30.8% compared with the visual policy.
Visual policies struggle to generalize because of observability gaps.
The state-based approach leverages physics-consistent state estimation.
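To make the first finding concrete, here is a minimal sketch of how an L1 error and a percentage reduction between two policies could be computed. It assumes the L1 error is the mean absolute difference between predicted and reference actions, and that the reported 30.8% is a relative reduction; this summary confirms neither. The function names and the numbers are hypothetical placeholders, not the paper's code or data.

```python
def mean_l1_error(predicted, reference):
    """Mean absolute difference over all timesteps and action dimensions.

    Both arguments are lists of equal-length rows (one row per timestep).
    This definition is an assumption; the paper may normalize differently.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same number of rows")
    total, count = 0.0, 0
    for p_row, r_row in zip(predicted, reference):
        if len(p_row) != len(r_row):
            raise ValueError("rows must have the same number of action dimensions")
        for p, r in zip(p_row, r_row):
            total += abs(p - r)
            count += 1
    if count == 0:
        raise ValueError("inputs must not be empty")
    return total / count


def relative_reduction(baseline_error, new_error):
    """Fraction by which new_error is lower than baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Synthetic placeholder actions (4 timesteps, 2 action dimensions).
reference = [[0.0, 0.0] for _ in range(4)]
visual_pred = [[0.5, 0.5] for _ in range(4)]    # hypothetical visual policy output
state_pred = [[0.25, 0.25] for _ in range(4)]   # hypothetical state-based policy output

visual_err = mean_l1_error(visual_pred, reference)
state_err = mean_l1_error(state_pred, reference)
reduction = relative_reduction(visual_err, state_err)
print(f"visual={visual_err:.3f} state={state_err:.3f} reduction={reduction:.1%}")
```

Under this reading, a 30.8% reduction would mean the state-based policy's L1 error is about 0.692 times that of the visual policy.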
Abstract
Deformable Linear Objects (DLOs) such as ropes and cables are widely encountered in both household and industrial applications, yet remain challenging to manipulate due to their infinite-dimensional configuration space and frequent self-occlusion. Imitation learning from teleoperation offers a practical path to bimanual DLO manipulation, but its scalability is limited by human effort, making the choice of observation space critical for generalization from small datasets. In this study, we investigate whether the lack of generalization in egocentric visual policies for the knot-untangling task stems from the observation space itself, rather than from the policy architecture or data scale. We compare two Action Chunking with Transformers policies trained on the same bimanual teleoperation data: a vision-based policy conditioned on two egocentric RGB streams from wrist-mounted cameras, and…
