RFM-Pose: Reinforcement-Guided Flow Matching for Fast Category-Level 6D Pose Estimation
Diya He, Qingchen Liu, Cong Zhang, Jiahu Qin

TL;DR
RFM-Pose introduces a reinforcement learning-based flow matching framework that accelerates category-level 6D pose estimation by efficiently generating and refining pose hypotheses, reducing computational costs while maintaining high accuracy.
Contribution
The paper presents a novel reinforcement learning approach using flow-matching generative models for faster and more efficient category-level 6D pose estimation.
Findings
Achieves competitive accuracy on the REAL275 benchmark.
Significantly reduces computational cost compared to existing methods.
Adapts effectively to object pose tracking scenarios.
Abstract
Object pose estimation is a fundamental problem in computer vision and plays a critical role in virtual reality and embodied intelligence, where agents must understand and interact with objects in 3D space. Recently, score-based generative models have partially resolved the rotational symmetry ambiguity problem in category-level pose estimation, but their efficiency remains limited by the high sampling cost of score-based diffusion. In this work, we propose a new framework, RFM-Pose, that accelerates category-level 6D object pose generation while actively evaluating sampled hypotheses. To improve sampling efficiency, we adopt a flow-matching generative model and generate pose candidates along an optimal transport path from a simple prior to the pose distribution. To further refine these candidates, we cast the flow-matching sampling process as a Markov decision process and apply…
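The sampling idea in the abstract (moving samples from a simple prior to the target distribution along a straight optimal transport path) can be illustrated with a toy sketch. This is not the authors' implementation: it handles only a 3D translation vector, ignores rotation, and replaces the learned velocity network with a closed-form field that is valid only when the target distribution is a single point. The names `ot_path`, `toy_velocity`, and `euler_sample` are hypothetical.

```python
import numpy as np


def ot_path(x0, x1, t):
    """Point at time t in [0, 1] on the straight-line path from x0 to x1."""
    return (1.0 - t) * x0 + t * x1


def toy_velocity(x, t, target):
    """Closed-form velocity field for a single-point target (assumption).

    In a real flow-matching model a trained network would predict this
    velocity from (x, t) and the observation; here it is a stand-in.
    """
    return (target - x) / (1.0 - t)


def euler_sample(x0, velocity_fn, num_steps):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed Euler steps."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # never reaches 1, so toy_velocity stays finite
        x = x + dt * velocity_fn(x, t)
    return x


rng = np.random.default_rng(0)
target = np.array([0.1, -0.2, 0.5])          # toy translation, made-up values
prior_samples = rng.standard_normal((8, 3))  # samples from a Gaussian prior


def field(x, t):
    return toy_velocity(x, t, target)


samples = euler_sample(prior_samples, field, num_steps=4)
one_step = euler_sample(prior_samples, field, num_steps=1)
```

Because the paths in this toy setting are straight lines, even a single Euler step lands on the target. That is the intuition behind the lower sampling cost relative to score-based diffusion; a learned field over real pose distributions is only approximately straight and generally needs more steps.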
Taxonomy
Topics: Robot Manipulation and Learning · Reinforcement Learning in Robotics · Generative Adversarial Networks and Image Synthesis
