TP3M: Transformer-based Pseudo 3D Image Matching with Reference Image
Liming Han, Zhaoxiang Liu, Shiguo Lian

TL;DR
This paper introduces a Transformer-based pseudo 3D image matching method that leverages a reference image to enhance feature descriptors, significantly improving matching accuracy in challenging scenes with large viewpoint or illumination changes.
Contribution
It proposes a novel pseudo 3D matching approach using a reference image to upgrade 2D features to 3D, enhancing matching performance in difficult scenarios.
Findings
Achieves state-of-the-art results in homography estimation
Improves pose estimation accuracy in challenging scenes
Enhances visual localization performance
Abstract
Image matching remains challenging in scenes with large viewpoint or illumination changes or with low texture. In this paper, we propose a Transformer-based pseudo 3D image matching method. It upgrades the 2D features extracted from the source image to 3D features with the help of a reference image, and matches them to the 2D features extracted from the destination image through coarse-to-fine 3D matching. Our key finding is that introducing the reference image screens the source image's fine points and further enriches their feature descriptors from 2D to 3D, which improves matching performance with the destination image. Experimental results on multiple datasets show that the proposed method achieves state-of-the-art performance on homography estimation, pose estimation, and visual localization, especially in challenging scenes.
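The abstract does not spell out how the reference image enriches the source descriptors, so the following is only a toy sketch of the general idea, not the paper's architecture. It assumes a single-head cross-attention step from source to reference descriptors (a stand-in for the Transformer) followed by mutual nearest-neighbour matching against the destination descriptors. The function names `enrich_with_reference` and `mutual_nearest_matches` are hypothetical, and the coarse-to-fine stage and point screening are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def enrich_with_reference(src_desc, ref_desc):
    """Hypothetical enrichment step (an assumption, not the paper's design):
    each source descriptor attends over all reference descriptors and adds
    the attended summary as a residual, then is L2-normalised."""
    d = src_desc.shape[1]
    attn = softmax(src_desc @ ref_desc.T / np.sqrt(d), axis=1)  # (Ns, Nr)
    enriched = src_desc + attn @ ref_desc                        # (Ns, d)
    return enriched / np.linalg.norm(enriched, axis=1, keepdims=True)

def mutual_nearest_matches(a, b):
    """Return index pairs (i, j) where a[i] and b[j] are each other's
    nearest neighbour under dot-product similarity."""
    sim = a @ b.T
    nn_ab = sim.argmax(axis=1)   # best b for each a
    nn_ba = sim.argmax(axis=0)   # best a for each b
    idx = np.arange(a.shape[0])
    keep = nn_ba[nn_ab] == idx   # keep only mutually consistent pairs
    return np.stack([idx[keep], nn_ab[keep]], axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.normal(size=(6, 16))   # source-image descriptors (synthetic)
    ref = rng.normal(size=(8, 16))   # reference-image descriptors (synthetic)
    dst = rng.normal(size=(7, 16))   # destination-image descriptors (synthetic)
    dst /= np.linalg.norm(dst, axis=1, keepdims=True)
    matches = mutual_nearest_matches(enrich_with_reference(src, ref), dst)
    print(matches)  # rows of (source index, destination index)
```

The descriptors here are random, so the printed matches carry no meaning. The sketch only shows where a reference image could enter the pipeline before matching.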
