KVN: Keypoints Voting Network with Differentiable RANSAC for Stereo Pose Estimation
Ivano Donadi, Alberto Pretto

TL;DR
This paper adds a differentiable RANSAC layer and an uncertainty-driven multi-view PnP solver to a keypoint-based pose estimation network, reporting state-of-the-art stereo object pose estimation on a public stereo dataset and on the custom Transparent Tableware Dataset (TTD).
Contribution
The paper presents an end-to-end trainable stereo pose estimation framework that combines a differentiable RANSAC layer with a multi-view PnP solver. Unlike earlier pipelines built on non-differentiable RANSAC, it allows keypoint correspondences to be learned directly.
Findings
Differentiable RANSAC significantly improves pose accuracy.
The multi-view PnP enhances robustness across views.
State-of-the-art results on a public stereo dataset and on the custom Transparent Tableware Dataset (TTD).
Abstract
Object pose estimation is a fundamental computer vision task exploited in several robotics and augmented reality applications. Many established approaches rely on predicting 2D-3D keypoint correspondences using RANSAC (Random Sample Consensus) and estimating the object pose using the PnP (Perspective-n-Point) algorithm. Because RANSAC is non-differentiable, the correspondences cannot be learned directly in an end-to-end fashion. In this paper, we address the stereo image-based object pose estimation problem by (i) introducing a differentiable RANSAC layer into a well-known monocular pose estimation network and (ii) exploiting an uncertainty-driven multi-view PnP solver that can fuse information from multiple views. We evaluate our approach on a challenging public stereo object pose estimation dataset and on a custom-built dataset we call the Transparent Tableware Dataset (TTD), yielding state-of-the-art…
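To make the idea of a differentiable RANSAC-style voting step more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes a keypoint-voting setup in which each pixel predicts a 2D direction towards a keypoint, hypotheses come from pairwise line intersections, the hard inlier count is relaxed with a sigmoid, and the final keypoint is a softmax-weighted mean of the hypotheses. The function names and the parameters `tau`, `beta`, and `alpha` are hypothetical choices for illustration.

```python
import math
from itertools import combinations

def intersect(p1, d1, p2, d2, eps=1e-9):
    """Intersect the 2D lines p1 + t*d1 and p2 + s*d2; None if near-parallel."""
    cross = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(cross) < eps:
        return None
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    t = (rx * d2[1] - ry * d2[0]) / cross
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])

def soft_inlier_score(h, pixels, dirs, tau=0.99, beta=100.0):
    """Sigmoid-relaxed inlier count: a pixel votes for hypothesis h when its
    predicted direction is nearly parallel to the vector from the pixel to h."""
    score = 0.0
    for (px, py), (dx, dy) in zip(pixels, dirs):
        vx, vy = h[0] - px, h[1] - py
        norm = math.hypot(vx, vy) * math.hypot(dx, dy) + 1e-12
        cos = (vx * dx + vy * dy) / norm
        score += 1.0 / (1.0 + math.exp(-beta * (cos - tau)))
    return score

def soft_vote(pixels, dirs, alpha=5.0):
    """Softmax-weighted mean of all pairwise-intersection hypotheses."""
    hyps = []
    for i, j in combinations(range(len(pixels)), 2):
        h = intersect(pixels[i], dirs[i], pixels[j], dirs[j])
        if h is not None:
            hyps.append(h)
    if not hyps:
        raise ValueError("no valid hypotheses (all direction pairs are parallel)")
    scores = [soft_inlier_score(h, pixels, dirs) for h in hyps]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(alpha * (s - top)) for s in scores]
    total = sum(weights)
    x = sum(w * h[0] for w, h in zip(weights, hyps)) / total
    y = sum(w * h[1] for w, h in zip(weights, hyps)) / total
    return (x, y)

# Toy data: five pixels point exactly at the keypoint, one pixel is an outlier.
keypoint = (5.0, 3.0)
pixels = [(0.0, 0.0), (10.0, 0.0), (0.0, 8.0), (10.0, 8.0), (5.0, 9.0), (2.0, 7.0)]
dirs = [(keypoint[0] - x, keypoint[1] - y) for x, y in pixels]
dirs[-1] = (1.0, 0.0)  # corrupted direction for the last pixel
estimate = soft_vote(pixels, dirs)
print(estimate)
```

Apart from the skipping of near-parallel pairs, every step is a smooth function of the predicted directions, so the same computation written in an autodiff framework would let a loss on the estimated keypoint propagate gradients back to the per-pixel predictions. A hard RANSAC would instead pick the single best-scoring hypothesis, which blocks that gradient path.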
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Advanced Vision and Imaging
Methods: PnP
