A Theory of Local Matching: SIFT and Beyond
Hossein Mobahi, Stefano Soatto

TL;DR
This paper develops a general theory of local image descriptors based on energy minimization and heat diffusion. The theory explains the success of SIFT and DSP-SIFT and yields new descriptors with fewer parameters.
Contribution
It introduces a unifying theoretical framework for local descriptors that accounts for SIFT and DSP-SIFT and is used to derive new descriptors with fewer parameters that are potentially better at handling affine deformations.
Findings
DSP-SIFT better approximates the theoretical solution than SIFT
The theory explains why DSP-SIFT outperforms SIFT
New descriptors with fewer parameters are derived from the theory
Abstract
Why has SIFT been so successful? Why can its extension, DSP-SIFT, further improve on SIFT? Is there a theory that can explain both? How can such a theory benefit real applications? Can it suggest new algorithms with reduced computational complexity or new descriptors with better matching accuracy? We construct a general theory of local descriptors for visual matching. Our theory relies on concepts from energy minimization and heat diffusion. We show that SIFT and DSP-SIFT approximate the solution the theory suggests. In particular, DSP-SIFT gives a better approximation to the theoretical solution, which justifies why DSP-SIFT outperforms SIFT. Using the developed theory, we derive new descriptors that have fewer parameters and are potentially better at handling affine deformations.
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Advanced Vision and Imaging
