SimpleMatch: A Simple and Strong Baseline for Semantic Correspondence
Hailing Jin, Huiying Li

TL;DR
SimpleMatch is a lightweight, efficient framework for semantic correspondence that maintains high accuracy at low input resolutions. It counters the fusion of adjacent keypoint features caused by deep downsampling and trains with multi-scale supervision.
Contribution
The paper introduces a lightweight upsample decoder and a multi-scale loss that improve semantic correspondence at low resolutions while cutting training memory by 51%.
Findings
Achieves 84.1% PCK@0.1 on SPair-71k at 252x252 resolution
Reduces training memory by 51%
Performs better than state-of-the-art methods at lower resolutions
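The accuracy figure above is a PCK score (percentage of correct keypoints). As a rough illustration, and not the paper's evaluation code, the sketch below counts a predicted keypoint as correct when it lies within alpha times the longer side of a reference box from the ground truth. Using the object bounding box as the reference is the usual SPair-71k convention and is an assumption here; the function name `pck` and the sample coordinates are hypothetical.

```python
import math


def pck(pred, gt, bbox_hw, alpha=0.1):
    """Fraction of predictions within alpha * max(h, w) of the ground truth.

    pred, gt: lists of (x, y) keypoints in the same order.
    bbox_hw:  (height, width) of the reference box (assumed: object bounding box).
    """
    if not gt or len(pred) != len(gt):
        raise ValueError("pred and gt must be non-empty and the same length")
    h, w = bbox_hw
    threshold = alpha * max(h, w)
    correct = sum(
        1
        for (px, py), (gx, gy) in zip(pred, gt)
        if math.hypot(px - gx, py - gy) <= threshold
    )
    return correct / len(gt)


# Hypothetical sample: threshold is 0.1 * max(100, 200) = 20 pixels.
gt = [(50.0, 60.0), (120.0, 80.0), (200.0, 150.0)]
pred = [(52.0, 61.0), (150.0, 80.0), (205.0, 140.0)]
score = pck(pred, gt, bbox_hw=(100, 200))  # second prediction is 30 px off
```

In this sample two of the three predictions fall inside the 20-pixel threshold, so the score is 2/3.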
Abstract
Recent advances in semantic correspondence have been largely driven by the use of pre-trained large-scale models. However, a limitation of these approaches is their dependence on high-resolution input images to achieve optimal performance, which results in considerable computational overhead. In this work, we address a fundamental limitation in current methods: the irreversible fusion of adjacent keypoint features caused by deep downsampling operations. This issue is triggered when semantically distinct keypoints fall within the same downsampled receptive field (e.g., 16x16 patches). To address this issue, we present SimpleMatch, a simple yet effective framework for semantic correspondence that delivers strong performance even at low resolutions. We propose a lightweight upsample decoder that progressively recovers spatial detail by upsampling deep features to 1/4 resolution, and a…
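The fusion problem described in the abstract can be illustrated with simple arithmetic. The sketch below is not the authors' code: it only maps keypoints to feature-map cells for a given downsampling stride and reports which keypoints share a cell. The keypoint names and coordinates are hypothetical, and treating the 1/4-resolution decoder output as a stride-4 grid is an assumption.

```python
def patch_index(x, y, stride):
    """Feature-map cell (column, row) that pixel (x, y) falls into."""
    return (int(x) // stride, int(y) // stride)


def colliding_groups(keypoints, stride):
    """Groups of keypoint names that land in the same cell at this stride."""
    cells = {}
    for name, (x, y) in keypoints.items():
        cells.setdefault(patch_index(x, y, stride), []).append(name)
    return sorted(tuple(sorted(names)) for names in cells.values() if len(names) > 1)


# Hypothetical keypoints on a 252x252 image.
keypoints = {
    "left_eye": (100, 90),
    "right_eye": (110, 92),
    "nose": (105, 104),
    "tail": (200, 180),
}

coarse = colliding_groups(keypoints, stride=16)  # 16x16 patches
fine = colliding_groups(keypoints, stride=4)     # assumed 1/4-resolution grid
```

With these sample points, the two eyes share one cell at stride 16, so a single feature vector has to represent both. At stride 4 every keypoint gets its own cell, which is the spatial detail an upsampling decoder aims to recover.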
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Face recognition and analysis
