Unlocking Zero-shot Potential of Semi-dense Image Matching via Gaussian Splatting
Juncheng Chen, Chao Xu, Yanjun Cao

TL;DR
This paper introduces MatchGS, a framework that corrects 3D Gaussian Splatting (3DGS) geometry to generate accurate correspondence labels for training semi-dense image matchers, improving zero-shot matching performance by up to 17.7% on benchmarks.
Contribution
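The paper presents a data generation pipeline that corrects 3DGS geometry to produce precise correspondence labels, together with a 2D-3D representation alignment strategy that infuses the explicit 3D knowledge of 3DGS into 2D semi-dense matchers for robust zero-shot image matching.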
Findings
Reduced epipolar error by up to 40 times
Achieved up to 17.7% performance improvement on benchmarks
Enabled supervision under extreme viewpoint changes
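The epipolar error in the first finding measures how far a labeled match lies from the epipolar line implied by the true relative camera pose. The following is a minimal sketch of that quantity, not the paper's evaluation code. It assumes normalized image coordinates (intrinsics already removed) and the pose convention X2 = R·X1 + t. All function names are hypothetical.

```python
import math

def skew(t):
    """Cross-product matrix [t]_x of a 3-vector."""
    tx, ty, tz = t
    return [[0.0, -tz, ty],
            [tz, 0.0, -tx],
            [-ty, tx, 0.0]]

def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(m, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def essential_matrix(R, t):
    """E = [t]_x R, assuming the convention X2 = R @ X1 + t (an assumption)."""
    return matmul(skew(t), R)

def epipolar_distance(E, x1, x2):
    """Distance from x2 to the epipolar line E @ x1 in view 2.

    x1 and x2 are (x, y) points in normalized image coordinates.
    """
    p1 = [x1[0], x1[1], 1.0]
    p2 = [x2[0], x2[1], 1.0]
    line = matvec(E, p1)                      # line a*x + b*y + c = 0 in view 2
    numerator = abs(sum(p2[i] * line[i] for i in range(3)))
    return numerator / math.hypot(line[0], line[1])

# Toy setup: identity rotation, unit sideways translation.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [1.0, 0.0, 0.0]
E = essential_matrix(R, t)

# The 3D point (0.2, 0.1, 2.0) projects to x1 in view 1 and x2_exact in view 2.
x1 = (0.1, 0.05)
x2_exact = (0.6, 0.05)
x2_noisy = (0.6, 0.07)   # a label that is off by 0.02 perpendicular to the line

err_exact = epipolar_distance(E, x1, x2_exact)
err_noisy = epipolar_distance(E, x1, x2_noisy)
```

In this toy setup the geometrically exact label has an error of zero up to floating-point rounding, while the perturbed label has an error of about 0.02 in normalized units. Averaging this distance over many labeled pairs gives one way to compare label quality before and after geometry refinement.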
Abstract
Learning-based image matching critically depends on large-scale, diverse, and geometrically accurate training data. 3D Gaussian Splatting (3DGS) enables photorealistic novel-view synthesis and thus is attractive for data generation. However, its geometric inaccuracies and biased depth rendering currently prevent robust correspondence labeling. To address this, we introduce MatchGS, the first framework designed to systematically correct and leverage 3DGS for robust, zero-shot image matching. Our approach is twofold: (1) a geometrically-faithful data generation pipeline that refines 3DGS geometry to produce highly precise correspondence labels, enabling the synthesis of a vast and diverse range of viewpoints without compromising rendering fidelity; and (2) a 2D-3D representation alignment strategy that infuses 3DGS' explicit 3D knowledge into the 2D matcher, guiding 2D semi-dense matchers…
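The abstract's point that depth accuracy determines label accuracy can be illustrated with depth-based reprojection, a common way to derive correspondence labels from a rendered depth map and known camera poses. The sketch below is a generic illustration under stated assumptions, not the MatchGS pipeline itself. It assumes pinhole intrinsics given as `(fx, fy, cx, cy)`, the pose convention X2 = R·X1 + t, and a hypothetical relative depth tolerance for the consistency check.

```python
def backproject(u, v, depth, K):
    """Lift pixel (u, v) with its depth to a 3D point in the camera frame."""
    fx, fy, cx, cy = K
    return [(u - cx) / fx * depth, (v - cy) / fy * depth, depth]

def transform(R, t, X):
    """Apply the rigid transform X2 = R @ X1 + t (assumed convention)."""
    return [sum(R[i][k] * X[k] for k in range(3)) + t[i] for i in range(3)]

def project(X, K):
    """Project a 3D camera-frame point to pixel coordinates."""
    fx, fy, cx, cy = K
    return fx * X[0] / X[2] + cx, fy * X[1] / X[2] + cy

def warp_pixel(u, v, depth, K1, K2, R, t):
    """Return (u2, v2, depth_in_view2), or None if the point is behind view 2."""
    X2 = transform(R, t, backproject(u, v, depth, K1))
    if X2[2] <= 0.0:
        return None
    u2, v2 = project(X2, K2)
    return u2, v2, X2[2]

def is_consistent(projected_depth, rendered_depth, rel_tol=0.01):
    """Reject likely occlusions: warped depth must agree with view 2's depth.

    rel_tol is a hypothetical threshold chosen for illustration.
    """
    return abs(projected_depth - rendered_depth) <= rel_tol * rendered_depth

# Toy example: same intrinsics in both views, pure sideways translation.
K = (500.0, 500.0, 320.0, 240.0)
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [1.0, 0.0, 0.0]

label = warp_pixel(370.0, 265.0, 2.0, K, K, R, t)   # approximately (620.0, 265.0, 2.0)
keep = is_consistent(label[2], rendered_depth=2.0)  # depths agree, so keep the label
biased = warp_pixel(370.0, 265.0, 2.2, K, K, R, t)  # same pixel with a 10% depth bias
```

A biased depth value moves the warped pixel away from the true match, which is why the rendered depth has to be accurate before it can supervise a matcher. Here the 10% depth bias shifts the label horizontally by roughly 23 pixels.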
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
1. The paper presents a self-supervised method, based on 3D Gaussians, to generate data for training LoFTR-like models. The core motivation, generating multi-view data with Gaussians for zero-shot performance, is very convincing.
2. The paper shows clear gains over LoFTR and ELoFTR despite being in a zero-shot setting, showing the effectiveness of the proposed 3DGS-based learning of image matching.
1. The coarse and fine alignment strategies, despite being claimed to be complementary (L.265), do not show complementary results at all. In fact, the results for fine-level alignment drop in all cases, and the method does not seem to incorporate it in its final design. If so, why is it introduced in the methodology section as a component?
2. Frankly, the paper is quite hard to follow. At first glance, it is unclear how the GT correspondences are established for the generated data, where the rev
- Utilizing 3D Gaussian Splatting to overcome the limitation of restricted camera trajectories in existing datasets is an interesting direction.
- To ensure geometric consistency in 3D Gaussian Splatting, introducing depth loss and planar Gaussians that enforce consistent geometry for matching is an intuitive and effective approach.
- By leveraging 3D Gaussians, they achieve additional 2D-3D consistency matching and propose injecting 3D information into the model.
### Weaknesses:
- **Unclear parameter ranges for camera perturbations:** What are the specific ranges of $\Delta R$ and $\Delta t$, and what scale is applied to the intrinsics? If these ranges are small, the claim that this dataset generation pipeline can produce extreme viewpoint changes may be overstated.
- **Concerns about 3DGS rendering quality:** I am curious about the rendering quality of the 3D Gaussian Splatting. To my knowledge, depth loss is sufficient in few-shot settings but may be
This paper proposes an interesting and scalable way to obtain paired data for correspondence training. In particular, the authors propose to use Gaussian splatting to create high-quality synthetic pairs and want to demonstrate that this gives them unlimited data with which to train a downstream model. They compare a model trained with their approach to other methods for 2D correspondence, showing they get good results on zero-shot generalisation using MegaDepth and the ZEB dataset. In orde
1. This paper does not quite fulfill its aims. The main aim is to see how using synthetic data can lead to unlimited data that can be used to demonstrate the value of such data. However, they generally seem to have fixed the amount of data to 70/245 scenes in Table 4. Instead, it would be better to show data scaling: how much do things improve as more and more scenes are sampled, as opposed to a relatively small amount of 245 and only looking at two points? We would want to see that things cont
The idea of using 3DGS to generate image matching pseudo-labels is interesting. The paper writing is clear.
1. The method largely follows the experimental setup of GIM. However, unlike GIM, where the same method is applied to sparse, semi-dense, and dense matching methods, MatchGS only applies the proposed data to semi-dense methods.
   - This raises the problem of generalization, since theoretically I don't see why the generated pseudo-labels cannot be applied to other method types; e.g., GIM only has sparse/semi-dense labels, yet it can still be applied to improve dense methods.
   - This also raises the conc
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging
