Leveraging Cutting Edge Deep Learning Based Image Matching for Reconstructing a Large Scene from Sparse Images
Georg Bökman, Johan Edstedt

TL;DR
This paper introduces a deep learning-based image matching approach combined with structure from motion to accurately reconstruct large scenes from sparse, sequential images, achieving top performance in a visual localization challenge.
Contribution
The paper presents a novel integration of deep learning image matchers, keypoint extraction, and image retrieval techniques to enhance scene reconstruction from sparse images.
Findings
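Presented the top-ranked solution for the AISG-SLA Visual Localisation Challenge; sequential pairwise matching with RoMa alone already reached third rank on the challenge benchmark.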
Improved reconstruction accuracy by matching non-consecutive images using image retrieval.
Provided an upper bound estimate for retrieval-based matching accuracy.
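The retrieval step in the findings above can be illustrated with a minimal sketch: given one global descriptor per frame (assumed here to be precomputed, for example with DINOv2), pick non-consecutive frame pairs whose descriptors are most similar. The function names, the `top_k`, `min_gap`, and `min_sim` parameters, and the toy descriptors are all hypothetical; the selection rule the authors actually use is not stated in this summary.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieval_pairs(descriptors, top_k=1, min_gap=2, min_sim=0.5):
    """Select non-consecutive frame pairs by descriptor similarity.

    For every frame i, rank all frames j with |i - j| >= min_gap by
    cosine similarity and keep up to top_k of them whose similarity is
    at least min_sim. Returns a sorted list of (i, j) with i < j.
    (Hypothetical helper; all thresholds are illustrative assumptions.)
    """
    pairs = set()
    n = len(descriptors)
    for i in range(n):
        scored = [
            (cosine(descriptors[i], descriptors[j]), j)
            for j in range(n)
            if abs(i - j) >= min_gap
        ]
        # Highest similarity first; ties broken by smaller frame index.
        scored.sort(key=lambda sj: (-sj[0], sj[1]))
        for sim, j in scored[:top_k]:
            if sim >= min_sim:
                pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)

# Toy descriptors: frame 4 looks like frame 0, frame 3 resembles frame 1.
toy_descriptors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
    [0.9, 0.1, 0.0],
]
pairs = retrieval_pairs(toy_descriptors)
print(pairs)
```

The selected pairs would then be passed to the matcher in addition to the consecutive pairs, so that frames separated by a time jump can still be connected in the reconstruction.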
Abstract
We present the top ranked solution for the AISG-SLA Visual Localisation Challenge benchmark (IJCAI 2023), where the task is to estimate relative motion between images taken in sequence by a camera mounted on a car driving through an urban scene. For matching images we use our recent deep learning based matcher RoMa. Matching image pairs sequentially and estimating relative motion from point correspondences sampled by RoMa already gives very competitive results -- third rank on the challenge benchmark. To improve the estimations we extract keypoints in the images, match them using RoMa, and perform structure from motion reconstruction using COLMAP. We choose our recent DeDoDe keypoints for their high repeatability. Further, we address time jumps in the image sequence by matching specific non-consecutive image pairs based on image retrieval with DINOv2. These improvements yield a…
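As a rough illustration of estimating relative motion from point correspondences, the sketch below fits an essential matrix with the plain linear eight-point method on synthetic, noise-free data. This is an assumption-laden stand-in, not the authors' estimator: it assumes calibrated (normalized) image coordinates, uses no robust sampling such as RANSAC, and the function name `estimate_essential` and the synthetic scene are invented for the example. In practice the correspondences would come from a matcher such as RoMa.

```python
import numpy as np

def estimate_essential(x1, x2):
    """Linear eight-point estimate of the essential matrix E.

    x1, x2: (N, 2) arrays of matched points in normalized camera
    coordinates (intrinsics already removed), N >= 8.
    Returns E, defined up to sign, with singular values (1, 1, 0).
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    u1, v1 = x1[:, 0], x1[:, 1]
    u2, v2 = x2[:, 0], x2[:, 1]
    # Each row encodes the epipolar constraint h2^T E h1 = 0,
    # with E flattened in row-major order.
    A = np.stack(
        [u2 * u1, u2 * v1, u2, v2 * u1, v2 * v1, v2, u1, v1, np.ones_like(u1)],
        axis=1,
    )
    _, _, vt = np.linalg.svd(A)
    E = vt[-1].reshape(3, 3)
    # Project onto the set of essential matrices.
    U, _, Vt = np.linalg.svd(E)
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt

# Synthetic scene: 20 points in front of both cameras (illustrative only).
rng = np.random.default_rng(0)
X = np.column_stack(
    [rng.uniform(-1, 1, 20), rng.uniform(-1, 1, 20), rng.uniform(4, 8, 20)]
)
angle = 0.1  # small rotation about the vertical axis, in radians
R = np.array(
    [
        [np.cos(angle), 0.0, np.sin(angle)],
        [0.0, 1.0, 0.0],
        [-np.sin(angle), 0.0, np.cos(angle)],
    ]
)
t = np.array([1.0, 0.0, 0.2])
X2 = X @ R.T + t  # the same points expressed in the second camera frame

x1 = X[:, :2] / X[:, 2:]
x2 = X2[:, :2] / X2[:, 2:]
E = estimate_essential(x1, x2)

# Epipolar residuals h2^T E h1 should vanish for noise-free matches.
h1 = np.column_stack([x1, np.ones(len(x1))])
h2 = np.column_stack([x2, np.ones(len(x2))])
residuals = np.abs(np.einsum("ni,ij,nj->n", h2, E, h1))
print(residuals.max())
```

The relative rotation and the translation direction can then be recovered by decomposing E. With real matches, a robust estimator is normally needed because some correspondences are wrong.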
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging · Human Pose and Action Recognition
