Generative 6D Pose Estimation via Conditional Flow Matching
Amir Hamza, Davide Boscaini, Weihang Li, Benjamin Busam, Fabio Poiesi

TL;DR
Flose introduces a novel generative approach for 6D pose estimation that leverages conditional flow matching with appearance-based features, outperforming prior methods on multiple datasets.
Contribution
The paper presents Flose, a generative method that combines conditional flow matching with appearance-based semantic features for robust 6D pose estimation, especially for symmetric objects.
Findings
Flose improves average recall by +4.5 over prior methods.
Flose incorporates appearance-based features to resolve ambiguities caused by object symmetries.
Validated on five BOP benchmark datasets.
Abstract
Existing methods for instance-level 6D pose estimation typically rely on neural networks that either directly regress the pose in SE(3) or estimate it indirectly via local feature matching. The former struggle with object symmetries, while the latter fail in the absence of distinctive local features. To overcome these limitations, we propose a novel formulation of 6D pose estimation as a conditional flow matching problem in ℝ³. We introduce Flose, a generative method that infers object poses via a denoising process conditioned on local features. While prior approaches based on conditional flow matching perform denoising solely based on geometric guidance, Flose integrates appearance-based semantic features to mitigate ambiguities caused by object symmetries. We further incorporate RANSAC-based registration to handle outliers. We validate Flose on five datasets…
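The abstract mentions RANSAC-based registration as the step that handles outliers. The sketch below illustrates that general idea only and is not the authors' implementation. It assumes the denoising stage yields one predicted 3D location per model point, so the pose can be recovered from 3D-to-3D correspondences. The function names `kabsch` and `ransac_pose`, the iteration count, and the inlier threshold are hypothetical choices made for illustration.

```python
import numpy as np


def kabsch(src, dst):
    """Least-squares rigid transform (R, t) with dst ~= src @ R.T + t."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    # 3x3 cross-covariance of the centred point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def ransac_pose(src, dst, iters=200, thresh=0.01, seed=0):
    """Robust pose from noisy correspondences (illustrative defaults)."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        # Minimal sample: three correspondences define a rigid transform.
        idx = rng.choice(len(src), size=3, replace=False)
        R, t = kabsch(src[idx], dst[idx])
        err = np.linalg.norm(src @ R.T + t - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 3:
        raise ValueError("RANSAC found too few inliers to fit a pose")
    # Refit on all inliers of the best hypothesis.
    R, t = kabsch(src[best], dst[best])
    return R, t, best
```

Here `src` would hold the object model points and `dst` their predicted positions in the scene, both as `(N, 3)` arrays. Correspondences that disagree with the dominant rigid motion are excluded from the final fit.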
Taxonomy
Topics: Robot Manipulation and Learning · Robotics and Sensor-Based Localization · 3D Shape Modeling and Analysis
