BiFold: Bimanual Cloth Folding with Language Guidance
Oriol Barbany, Adrià Colomé, Carme Torras

TL;DR
BiFold is a novel approach that uses a vision-language model to enable robots to fold clothes based on natural language commands, addressing the complexity of cloth manipulation and language understanding.
Contribution
It introduces a new dataset with automatically parsed actions and language instructions, and achieves state-of-the-art results in language-conditioned cloth folding.
Findings
State-of-the-art performance on folding benchmark
Strong generalization to new instructions and garments
Effective use of a pre-trained vision-language model for manipulation
Abstract
Cloth folding is a complex task due to the inevitable self-occlusions of clothes, their complicated dynamics, and the disparate materials, geometries, and textures that garments can have. In this work, we learn folding actions conditioned on text commands. Translating high-level, abstract instructions into precise robotic actions requires sophisticated language understanding and manipulation capabilities. To do that, we leverage a pre-trained vision-language model and repurpose it to predict manipulation actions. Our model, BiFold, can take context into account and achieves state-of-the-art performance on an existing language-conditioned folding benchmark. To address the lack of annotated bimanual folding data, we introduce a novel dataset with automatically parsed actions and language-aligned instructions, enabling better learning of text-conditioned manipulation. BiFold attains the…
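The abstract says BiFold repurposes a pre-trained vision-language model to predict manipulation actions, but this page does not state the output format. The sketch below is for illustration only. It assumes a common pick-and-place formulation: a model emits one dense score map over the image per arm and per phase (pick, place), and the bimanual action is the highest-scoring pixel of each map. The channel names and the functions `argmax_2d` and `decode_bimanual_action` are hypothetical. This is not BiFold's actual decoder.

```python
from typing import Dict, List, Tuple

Heatmap = List[List[float]]

# Hypothetical channel layout: one score map per arm and per action phase.
CHANNELS = ("left_pick", "left_place", "right_pick", "right_place")


def argmax_2d(heatmap: Heatmap) -> Tuple[int, int]:
    """Return (row, col) of the highest-scoring pixel; ties keep the first one seen."""
    best = (0, 0)
    best_score = float("-inf")
    for r, row in enumerate(heatmap):
        for c, score in enumerate(row):
            if score > best_score:
                best_score = score
                best = (r, c)
    if best_score == float("-inf"):
        raise ValueError("heatmap is empty")
    return best


def decode_bimanual_action(heatmaps: Dict[str, Heatmap]) -> Dict[str, Tuple[int, int]]:
    """Turn four dense score maps into pixel pick/place targets for both arms."""
    missing = [name for name in CHANNELS if name not in heatmaps]
    if missing:
        raise KeyError(f"missing heatmaps: {missing}")
    return {name: argmax_2d(heatmaps[name]) for name in CHANNELS}


if __name__ == "__main__":
    # Toy 3x3 score maps standing in for model outputs (assumed, not real data).
    maps = {
        "left_pick": [[0.1, 0.9, 0.0], [0.0, 0.2, 0.0], [0.0, 0.0, 0.0]],
        "left_place": [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.8, 0.1]],
        "right_pick": [[0.0, 0.0, 0.7], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
        "right_place": [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.6]],
    }
    print(decode_bimanual_action(maps))
    # prints {'left_pick': (0, 1), 'left_place': (2, 1), 'right_pick': (0, 2), 'right_place': (2, 2)}
```

Turning these pixel targets into robot commands would also require camera calibration and a motion planner, which this sketch leaves out.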
