SDT-6D: Fully Sparse Depth-Transformer for Staged End-to-End 6D Pose Estimation in Industrial Multi-View Bin Picking
Nico Leuze, Maximilian Hoh, Samed Doğan, Nicolas R.-Peña, Alfred Schoettl

TL;DR
This paper presents SDT-6D, a fully sparse depth-transformer framework for precise 6D pose estimation in cluttered industrial environments, leveraging multi-view depth fusion, scene-adaptive attention, and a novel voting strategy.
Contribution
It introduces a staged heatmap mechanism and a density-aware sparse transformer for high-resolution, sparse 3D data processing in 6D pose estimation.
Findings
Achieves competitive accuracy on IPD and MV-YCB datasets.
Effectively handles occlusions and clutter in industrial bin picking.
Operates efficiently with high-resolution sparse volumetric data.
Abstract
Accurately recovering 6D poses in densely packed industrial bin-picking environments remains a serious challenge, owing to occlusions, reflections, and textureless parts. We introduce a holistic depth-only 6D pose estimation approach that fuses multi-view depth maps into either a fine-grained 3D point cloud in its vanilla version, or a sparse Truncated Signed Distance Field (TSDF). At the core of our framework lies a staged heatmap mechanism that yields scene-adaptive attention priors across different resolutions, steering computation toward foreground regions and thus keeping memory requirements at high resolutions feasible. In addition, we propose a density-aware sparse transformer block that dynamically attends to (self-) occlusions and the non-uniform distribution of 3D data. While sparse 3D approaches have proven effective for long-range perception, their potential in close-range robotic…
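The staged heatmap idea, where coarse foreground scores decide which regions are refined at higher resolution, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `prune_and_subdivide`, the fixed threshold, and the 2x2x2 subdivision scheme are all assumptions chosen only to show why pruning at a coarse stage keeps the number of active high-resolution voxels small.

```python
import numpy as np

def prune_and_subdivide(coords, scores, threshold=0.5):
    """Hypothetical coarse-to-fine step (illustrative, not from the paper).

    coords: (N, 3) integer indices of active coarse voxels.
    scores: (N,) foreground heatmap scores in [0, 1].
    Returns the (8*K, 3) indices of child voxels at twice the resolution,
    where K is the number of coarse voxels scoring above the threshold.
    """
    kept = coords[scores > threshold]
    # The 8 child offsets of a voxel when each axis is split in two.
    offsets = np.array([[dx, dy, dz]
                        for dx in (0, 1)
                        for dy in (0, 1)
                        for dz in (0, 1)])
    # Broadcast parent indices against the offsets, then flatten.
    children = kept[:, None, :] * 2 + offsets[None, :, :]
    return children.reshape(-1, 3)

# Three active coarse voxels; the middle one is scored as background.
coarse_coords = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 1]])
coarse_scores = np.array([0.9, 0.1, 0.7])
fine_coords = prune_and_subdivide(coarse_coords, coarse_scores)
print(fine_coords.shape)
```

In this toy case only two of the three coarse voxels are subdivided, so the finer stage processes 16 voxels instead of 24. Applied over several stages, the same pruning is what would keep memory growth bounded as resolution increases.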
Taxonomy
Topics: Robot Manipulation and Learning · Robotics and Sensor-Based Localization · 3D Shape Modeling and Analysis
