MESA: Effective Matching Redundancy Reduction by Semantic Area Segmentation
Yesheng Zhang, Shuhan Shen, Xu Zhao

TL;DR
MESA and DMESA are novel feature matching methods that leverage the Segment Anything Model to reduce redundancy by semantic area segmentation, improving accuracy and efficiency in point matching across diverse datasets.
Contribution
The paper introduces MESA and DMESA, novel semantic area-based feature matching methods that enhance matching accuracy and efficiency by integrating SAM and graph-based optimization.
Findings
DMESA achieves nearly five times faster matching than MESA.
Both methods improve accuracy across five datasets.
Both methods are robust to changes in image resolution.
Abstract
We propose MESA and DMESA as novel feature matching methods, which utilize the Segment Anything Model (SAM) to effectively mitigate matching redundancy. The key insight of our methods is to establish implicit-semantic area matching prior to point matching, based on the advanced image understanding of SAM. Then, informative area matches with consistent internal semantics are able to undergo dense feature comparison, facilitating precise inside-area point matching. Specifically, MESA adopts a sparse matching framework and first obtains candidate areas from SAM results through a novel Area Graph (AG). Then, area matching among the candidates is formulated as graph energy minimization and solved by graphical models derived from AG. To address the efficiency issue of MESA, we further propose DMESA as its dense counterpart, applying a dense matching framework. After candidate areas are identified by…
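The area-then-point idea described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation. It assumes, purely for illustration, that each matched area is an axis-aligned box, that both areas are cropped and resized to a common size before point matching, and that the resulting point matches are mapped back to full-image coordinates afterwards. The names `Box`, `crop_to_image`, and `lift_matches` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical area representation: axis-aligned box (x0, y0, x1, y1) in full-image pixels.
Box = Tuple[float, float, float, float]
Point = Tuple[float, float]
Match = Tuple[Point, Point]


def crop_to_image(pt: Point, box: Box, crop_size: Tuple[int, int]) -> Point:
    """Map a point from resized-crop coordinates back to full-image coordinates."""
    x0, y0, x1, y1 = box
    crop_w, crop_h = crop_size
    # Undo the resize (scale by box size / crop size), then undo the crop (add the offset).
    return (x0 + pt[0] * (x1 - x0) / crop_w,
            y0 + pt[1] * (y1 - y0) / crop_h)


def lift_matches(
    area_matches: List[Tuple[Box, Box]],
    point_matches_per_area: List[List[Match]],
    crop_size: Tuple[int, int],
) -> List[Match]:
    """Collect point matches found inside each matched area pair into
    correspondences expressed in the coordinates of the two full images."""
    lifted: List[Match] = []
    for (box_a, box_b), matches in zip(area_matches, point_matches_per_area):
        for pt_a, pt_b in matches:
            lifted.append((crop_to_image(pt_a, box_a, crop_size),
                           crop_to_image(pt_b, box_b, crop_size)))
    return lifted
```

For example, with a crop size of 256 × 256, the crop point (64, 128) inside the box (100, 50, 300, 250) maps back to (150, 150) in the full image. Restricting point matching to matched areas in this way is what limits comparisons to regions that are likely to correspond.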
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Advanced Neural Network Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Segment Anything Model
