BiFold: Bimanual Cloth Folding with Language Guidance

Oriol Barbany; Adrià Colomé; Carme Torras

arXiv:2501.16458·cs.RO·June 17, 2025

TL;DR

BiFold repurposes a pre-trained vision-language model to predict bimanual cloth-folding actions from natural-language commands, so that a robot can turn high-level text instructions into precise manipulation actions.

Contribution

BiFold introduces a bimanual folding dataset with automatically parsed actions and language-aligned instructions, and achieves state-of-the-art performance on an existing language-conditioned folding benchmark.

Findings

01. State-of-the-art performance on folding benchmark

02. Strong generalization to new instructions and garments

03. Effective use of a pre-trained vision-language model for manipulation

Abstract

Cloth folding is a complex task due to the inevitable self-occlusions of clothes, their complicated dynamics, and the disparate materials, geometries, and textures that garments can have. In this work, we learn folding actions conditioned on text commands. Translating high-level, abstract instructions into precise robotic actions requires sophisticated language understanding and manipulation capabilities. To do that, we leverage a pre-trained vision-language model and repurpose it to predict manipulation actions. Our model, BiFold, can take context into account and achieves state-of-the-art performance on an existing language-conditioned folding benchmark. To address the lack of annotated bimanual folding data, we introduce a novel dataset with automatically parsed actions and language-aligned instructions, enabling better learning of text-conditioned manipulation. BiFold attains the…

Taxonomy

Topics: Advanced Materials and Mechanics · Modular Robots and Swarm Intelligence · Interactive and Immersive Displays