Bimanual Robot Manipulation via Multi-Agent In-Context Learning
Alessio Palma, Indro Spinelli, Vignesh Prasad, Luca Scofano, Yufeng Jin, Georgia Chalvatzaki, Fabio Galasso

TL;DR
This paper introduces BiCICLe, a novel framework enabling large language models to perform few-shot bimanual robot manipulation without fine-tuning, by framing the task as a multi-agent problem with iterative refinement.
Contribution
BiCICLe is the first approach to enable standard LLMs to handle bimanual manipulation tasks through multi-agent in-context learning and iterative trajectory refinement.
Findings
Achieves a success rate of up to 71.1% on TWIN benchmark tasks.
Outperforms the best training-free baseline by 6.7 percentage points.
Demonstrates strong few-shot generalization to new tasks.
Abstract
Large Language Models (LLMs) have emerged as powerful reasoning engines for embodied control. In particular, In-Context Learning (ICL) enables off-the-shelf, text-only LLMs to predict robot actions without any task-specific training while preserving their generalization capabilities. Applying ICL to bimanual manipulation remains challenging, as the high-dimensional joint action space and tight inter-arm coordination constraints rapidly overwhelm standard context windows. To address this, we introduce BiCICLe (Bimanual Coordinated In-Context Learning), the first framework that enables standard LLMs to perform few-shot bimanual manipulation without fine-tuning. BiCICLe frames bimanual control as a multi-agent leader-follower problem, decoupling the action space into sequential, conditioned single-arm predictions. This naturally extends to Arms' Debate, an iterative refinement process, and to…
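The leader-follower decoupling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the prompt layout, the flat list-of-floats action format, and every name below (`predict_bimanual`, `leader_prompt`, `follower_prompt`, `stub_llm`) are assumptions made for illustration. It shows only the core idea that the leader arm's action is predicted first from few-shot demonstrations, and the follower arm's action is then predicted conditioned on it.

```python
from typing import Callable, List, Sequence, Tuple

# All types and names here are hypothetical, chosen for illustration only.
Action = List[float]               # assumed single-arm action as a flat vector
Demo = Tuple[str, Action, Action]  # (observation text, leader action, follower action)
LLM = Callable[[str], str]         # any text-in, text-out model


def fmt(action: Sequence[float]) -> str:
    """Serialize an action as space-separated numbers for the prompt."""
    return " ".join(f"{v:.2f}" for v in action)


def parse(text: str) -> Action:
    """Parse the model's text completion back into numbers."""
    return [float(tok) for tok in text.split()]


def leader_prompt(demos: Sequence[Demo], obs: str) -> str:
    """Few-shot prompt that asks only for the leader arm's action."""
    shots = [f"obs: {o}\nleader: {fmt(a)}" for o, a, _ in demos]
    return "\n\n".join(shots + [f"obs: {obs}\nleader:"])


def follower_prompt(demos: Sequence[Demo], obs: str, leader: Action) -> str:
    """Few-shot prompt for the follower arm, conditioned on the leader's action."""
    shots = [f"obs: {o}\nleader: {fmt(a)}\nfollower: {fmt(b)}" for o, a, b in demos]
    return "\n\n".join(shots + [f"obs: {obs}\nleader: {fmt(leader)}\nfollower:"])


def predict_bimanual(llm: LLM, demos: Sequence[Demo], obs: str) -> Tuple[Action, Action]:
    """Two sequential single-arm queries instead of one joint-action query."""
    leader = parse(llm(leader_prompt(demos, obs)))
    follower = parse(llm(follower_prompt(demos, obs, leader)))
    return leader, follower


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model so the sketch runs without any API."""
    return "0.10 0.20 0.30" if prompt.endswith("leader:") else "0.40 0.50 0.60"


demos = [("cube on table", [0.0, 0.1, 0.2], [0.3, 0.4, 0.5])]
leader, follower = predict_bimanual(stub_llm, demos, "cube on shelf")
```

Under these assumptions each query asks the model for one arm's action rather than the full joint action. An iterative refinement step such as the Arms' Debate mentioned above would re-query the arms over several rounds, but its details are not given in this excerpt and are not sketched here.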
