GarmentPile++: Affordance-Driven Cluttered Garments Retrieval with Vision-Language Reasoning
Mingleyang Li, Yuran Wang, Yue Chen, Tianxing Chen, Jiaqi Liang, Zishun Shen, Haoran Lu, Ruihai Wu, Hao Dong

TL;DR
GarmentPile++ is a vision-language reasoning system for home-assistant robots that retrieves individual garments from cluttered piles by combining segmentation, visual affordance perception, and dual-arm cooperation.
Contribution
A retrieval pipeline that pairs the high-level reasoning of a VLM with visual affordance perception for low-level actions, supported by segmentation and a dual-arm framework for large or sagging garments.
Findings
Garment retrieval from cluttered piles is demonstrated in both simulation and real-world environments.
Robust segmentation and reasoning improve retrieval accuracy and safety.
Dual-arm cooperation handles large or sagging garments efficiently.
Abstract
Garment manipulation has attracted increasing attention due to its critical role in home-assistant robotics. However, the majority of existing garment manipulation works assume an initial state consisting of only one garment, while piled garments are far more common in real-world settings. To bridge this gap, we propose a novel garment retrieval pipeline that can not only follow language instruction to execute safe and clean retrieval but also guarantee exactly one garment is retrieved per attempt, establishing a robust foundation for the execution of downstream tasks (e.g., folding, hanging, wearing). Our pipeline seamlessly integrates vision-language reasoning with visual affordance perception, fully leveraging the high-level reasoning and planning capabilities of VLMs alongside the generalization power of visual affordance for low-level actions. To enhance the VLM's comprehensive…
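The abstract describes high-level garment selection by a VLM combined with affordance-guided low-level actions, but this page does not give the implementation. The following is a minimal sketch of that division of labor, not the authors' method. It assumes per-garment segmentation masks and a per-pixel affordance score map are already available, and it takes the target garment as a plain argument where the paper would use VLM reasoning. The names `best_grasp_point` and `plan_retrieval`, the `min_score` threshold, and the `"replan"` fallback are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Grid = List[List[float]]   # hypothetical per-pixel affordance scores in [0, 1]
Mask = List[List[bool]]    # hypothetical per-garment segmentation mask


def best_grasp_point(affordance: Grid, mask: Mask) -> Optional[Tuple[int, int, float]]:
    """Return (row, col, score) of the highest-scoring pixel inside the mask."""
    best = None
    for r, row in enumerate(affordance):
        for c, score in enumerate(row):
            if mask[r][c] and (best is None or score > best[2]):
                best = (r, c, score)
    return best  # None when the mask is empty


def plan_retrieval(affordance: Grid, masks: Dict[str, Mask], target: str,
                   min_score: float = 0.5) -> dict:
    """Pick a grasp pixel on the target garment, or ask for a new plan.

    `target` stands in for the garment a VLM would select; the threshold
    and the "replan" fallback are assumptions made for this sketch.
    """
    point = best_grasp_point(affordance, masks[target])
    if point is None or point[2] < min_score:
        return {"action": "replan", "garment": target}
    return {"action": "grasp", "garment": target, "pixel": (point[0], point[1])}


# Toy 2x3 scene with two garments.
affordance = [[0.1, 0.9, 0.2],
              [0.3, 0.4, 0.8]]
masks = {
    "shirt": [[True, True, False], [True, False, False]],
    "sock": [[False, False, True], [False, False, True]],
}
plan = plan_retrieval(affordance, masks, "shirt")
```

Restricting the search to a single garment's mask is what ties the grasp to one item in this sketch; the paper's own mechanism for ensuring one garment per attempt may differ.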
Taxonomy
Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis
