End2End Multi-View Feature Matching with Differentiable Pose Optimization
Barbara Roessle, Matthias Nießner

TL;DR
This paper introduces an end-to-end differentiable approach that jointly optimizes multi-view feature matches and camera poses, improving pose accuracy over SuperGlue while cutting pose estimation time and removing the need for RANSAC.
Contribution
The paper presents a graph attention network that predicts correspondences together with confidence weights, so that feature matching can be trained jointly with differentiable pose optimization.
Findings
Improves pose estimation on image pairs by 6.7% over SuperGlue on ScanNet.
Reduces pose estimation time by over 50%.
Improves multi-view pose metrics on Matterport3D by 18.5%.
Abstract
Erroneous feature matches have severe impact on subsequent camera pose estimation and often require additional, time-costly measures, like RANSAC, for outlier rejection. Our method tackles this challenge by addressing feature matching and pose optimization jointly. To this end, we propose a graph attention network to predict image correspondences along with confidence weights. The resulting matches serve as weighted constraints in a differentiable pose estimation. Training feature matching with gradients from pose optimization naturally learns to down-weight outliers and boosts pose estimation on image pairs compared to SuperGlue by 6.7% on ScanNet. At the same time, it reduces the pose estimation time by over 50% and renders RANSAC iterations unnecessary. Moreover, we integrate information from multiple views by spanning the graph across multiple frames to predict the matches all at…
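To illustrate how confidence weights can act as weighted constraints in a pose solve, here is a minimal sketch in plain Python. It is a deliberately simplified stand-in, not the paper's optimizer: it aligns 2D point matches with a closed-form weighted least-squares rigid fit, whereas the paper estimates camera poses from image correspondences. The names `weighted_rigid_2d` and `transform`, and the toy data, are assumptions made for this example only.

```python
import math

def transform(theta, t, p):
    """Rotate point p by theta and translate by t (2D rigid motion)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + t[0], s * p[0] + c * p[1] + t[1])

def weighted_rigid_2d(src, dst, weights):
    """Closed-form weighted least-squares rigid fit mapping src onto dst.

    Minimizes sum_i w_i * ||R(theta) * src_i + t - dst_i||^2.
    Hypothetical helper for illustration; not taken from the paper.
    """
    total = sum(weights)
    # Weighted centroids of both point sets.
    cxs = sum(w * p[0] for w, p in zip(weights, src)) / total
    cys = sum(w * p[1] for w, p in zip(weights, src)) / total
    cxd = sum(w * q[0] for w, q in zip(weights, dst)) / total
    cyd = sum(w * q[1] for w, q in zip(weights, dst)) / total
    # Weighted dot and cross products of the centred matches.
    dot = 0.0
    cross = 0.0
    for w, (xs, ys), (xd, yd) in zip(weights, src, dst):
        ax, ay = xs - cxs, ys - cys
        bx, by = xd - cxd, yd - cyd
        dot += w * (ax * bx + ay * by)
        cross += w * (ax * by - ay * bx)
    # The optimal rotation angle maximizes cos(theta)*dot + sin(theta)*cross.
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # The translation aligns the rotated source centroid with the target centroid.
    t = (cxd - (c * cxs - s * cys), cyd - (s * cxs + c * cys))
    return theta, t

# Toy data: four matches generated by a known motion, with the last one corrupted.
src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 1.0)]
true_theta, true_t = 0.3, (1.0, -2.0)
dst = [transform(true_theta, true_t, p) for p in src]
dst[3] = (10.0, 10.0)  # simulated erroneous match

# Uniform weights let the outlier bias the estimate.
theta_uniform, t_uniform = weighted_rigid_2d(src, dst, [1.0, 1.0, 1.0, 1.0])
# A zero confidence weight on the outlier removes its influence.
theta_weighted, t_weighted = weighted_rigid_2d(src, dst, [1.0, 1.0, 1.0, 0.0])
```

Because the estimate is a smooth function of the weights, a loss on the recovered pose could in principle be backpropagated into whatever network predicts those weights. That is the general idea behind training matching with gradients from pose optimization, shown here only in toy form.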
Taxonomy
Topics: Human Pose and Action Recognition · Robotics and Sensor-Based Localization · Advanced Neural Network Applications
