TL;DR
Pickalo is a low-cost, modular 6D pose estimation pipeline for industrial bin picking, achieving high success rates and throughput using active multi-view RGB-D sensing and synthetic data training.
Contribution
The paper introduces a novel low-cost bin-picking system combining active multi-view sensing, synthetic data-trained segmentation, and pose fusion for robust industrial manipulation.
Findings
Achieves up to 600 picks per hour with 96-99% grasp success.
Trains the segmentation model purely on photorealistic synthetic data, reducing real data requirements.
Demonstrates robustness over 30-minute dense bin-picking runs.
Abstract
Bin picking in real industrial environments remains challenging due to severe clutter, occlusions, and the high cost of traditional 3D sensing setups. We present Pickalo, a modular 6D pose-based bin-picking pipeline built entirely on low-cost hardware. A wrist-mounted RGB-D camera actively explores the scene from multiple viewpoints, while raw stereo streams are processed with BridgeDepth to obtain refined depth maps suitable for accurate collision reasoning. Object instances are segmented with a Mask-RCNN model trained purely on photorealistic synthetic data and localized using the zero-shot SAM-6D pose estimator. A pose buffer module fuses multi-view observations over time, handling object symmetries and significantly reducing pose noise. Offline, we generate and curate large sets of antipodal grasp candidates per object; online, a utility-based ranking and fast collision checking are…
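The abstract does not say how the pose buffer fuses observations, so the following is only a minimal sketch of one plausible approach, not the authors' method. It assumes all poses are 4x4 homogeneous transforms already expressed in a common frame, that each object's symmetries are given as a finite list of 3x3 rotations about the object origin, and that rotations are averaged with a chordal mean (SVD projection onto SO(3)). The names `rotation_angle`, `canonicalize`, and `fuse_poses` are hypothetical.

```python
import numpy as np


def rotation_angle(R):
    """Geodesic angle (radians) of a 3x3 rotation matrix."""
    cos = (np.trace(R) - 1.0) / 2.0
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


def canonicalize(R_obs, R_ref, symmetries):
    """Return the symmetry-equivalent of R_obs that is closest to R_ref.

    Assumption: a symmetric object looks identical under R_obs @ S
    for every S in its (finite) symmetry group.
    """
    candidates = [R_obs @ S for S in symmetries]
    return min(candidates, key=lambda R: rotation_angle(R_ref.T @ R))


def fuse_poses(poses, symmetries):
    """Fuse several 4x4 poses of one object into a single estimate."""
    R_ref = poses[0][:3, :3]  # first observation serves as the reference
    rotations = [canonicalize(T[:3, :3], R_ref, symmetries) for T in poses]
    translations = [T[:3, 3] for T in poses]

    # Chordal mean: average the matrices, then project back onto SO(3).
    U, _, Vt = np.linalg.svd(np.mean(rotations, axis=0))
    D = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])  # keep det = +1

    fused = np.eye(4)
    fused[:3, :3] = U @ D @ Vt
    fused[:3, 3] = np.mean(translations, axis=0)
    return fused
```

For an object with a 180° symmetry about its z-axis, two views that disagree by exactly that half-turn are first mapped to the same representative, so the fused rotation is not corrupted by averaging two visually identical but numerically opposite orientations.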
