Demonstrating Multi-Suction Item Picking at Scale via Multi-Modal Learning of Pick Success
Che Wang, Jeroen van Baar, Chaitanya Mitash, Shuai Li, Dylan Randle, Weiyao Wang, Sumedh Sontakke, Kostas E. Bekris, Kapil Katyal

TL;DR
This paper presents a multi-modal learning approach for multi-suction robotic item picking, demonstrating improved success prediction in diverse, real-world cluttered environments through extensive experiments and ablations.
Contribution
It introduces a multimodal visual encoder trained on real-world data for predicting pick success, advancing robotic manipulation in unstructured settings.
Findings
Multimodal models outperform single-modality approaches.
Pretraining enhances model performance and modality robustness.
Ablation studies highlight the importance of multimodal training and finetuning.
Abstract
This work demonstrates how autonomously learning aspects of robotic operation from sparsely labeled, real-world data of deployed, engineered solutions at industrial scale can yield solutions with improved performance. Specifically, it focuses on multi-suction robot picking and performs a comprehensive study on the application of multi-modal visual encoders for predicting the success of candidate robotic picks. Picking diverse items from unstructured piles is an important and challenging task for robot manipulation in real-world settings, such as warehouses. Methods for picking from clutter must work for an open set of items while simultaneously meeting latency constraints to achieve high throughput. The demonstrated approach utilizes multiple input modalities, such as RGB, depth and semantic segmentation, to estimate the quality of candidate multi-suction picks. The…
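The abstract describes combining RGB, depth and semantic segmentation to estimate the quality of candidate multi-suction picks. As a rough illustration of that general idea only, the sketch below uses a toy late-fusion scorer: one hypothetical linear head per modality, with the per-modality logits summed and passed through a sigmoid to give a pick-success probability, after which the highest-scoring candidate is selected. The function names, weights and feature vectors are all hypothetical assumptions; the paper's model is a learned multi-modal visual encoder, and this is not its architecture.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical: one linear head per modality (toy stand-in for learned encoders).
Weights = Dict[str, List[float]]
Features = Dict[str, List[float]]


def pick_success_prob(features: Features, weights: Weights, bias: float = 0.0) -> float:
    """Late fusion: sum per-modality logits, then squash to a probability.

    A modality missing from `features` contributes nothing to the logit,
    a crude placeholder for handling an unavailable sensor.
    """
    logit = bias
    for modality, w in weights.items():
        x = features.get(modality)
        if x is None:
            continue
        if len(x) != len(w):
            raise ValueError(f"{modality}: expected {len(w)} features, got {len(x)}")
        logit += sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-logit))


def best_pick(
    candidates: Sequence[Tuple[str, Features]], weights: Weights, bias: float = 0.0
) -> Tuple[str, float]:
    """Score every candidate pick and return (pick_id, probability) of the best one."""
    if not candidates:
        raise ValueError("no candidate picks")
    scored = [(pid, pick_success_prob(f, weights, bias)) for pid, f in candidates]
    return max(scored, key=lambda s: s[1])


if __name__ == "__main__":
    # Toy weights and features; all values are made up for illustration.
    weights = {"rgb": [0.5, -0.2], "depth": [1.0], "segmentation": [0.8]}
    candidates = [
        ("item_a", {"rgb": [0.1, 0.3], "depth": [0.2], "segmentation": [1.0]}),
        ("item_b", {"rgb": [0.4, 0.0], "depth": [0.9], "segmentation": [0.0]}),
        ("item_c", {"rgb": [0.2, 0.1]}),  # depth and segmentation unavailable
    ]
    print(best_pick(candidates, weights))
```

With these toy values, `item_b` has the largest summed logit and is selected. Scoring each candidate with a cheap fused head and taking the maximum reflects the abstract's point that pick selection must also meet latency constraints; a real system would replace the linear heads with learned visual encoders.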
Taxonomy
Topics: Robot Manipulation and Learning · Soft Robotics and Applications · Robotics and Sensor-Based Localization
Methods: Sparse Evolutionary Training
