Enhancing Landmark Detection in Cluttered Real-World Scenarios with Vision Transformers
Mohammad Javad Rajabi, Morteza Mirzai, Ahmad Nickabadi

TL;DR
This paper introduces a novel vision transformer-based method that improves landmark detection in cluttered real-world scenes by isolating relevant patches and filtering out occluding objects, leading to higher accuracy.
Contribution
It presents a new approach that uses a patch-selection process within vision transformers to improve landmark detection in cluttered environments.
Findings
Achieved superior accuracy on augmented datasets
Effectively isolates relevant image patches
Demonstrates potential of transformers in cluttered scenarios
Abstract
Visual place recognition tasks often encounter significant challenges in landmark detection due to the presence of irrelevant objects such as humans, cars, and trees, despite the remarkable progress achieved by previous models, especially in the context of transformers. To address this issue, we propose a novel method that effectively leverages the strengths of vision transformers. By employing a meticulous selection process, our approach identifies and isolates specific patches within the image that correspond to occluding objects. To evaluate the efficacy of our method, we created augmented datasets and conducted comprehensive testing. The results demonstrate the superior accuracy achieved by our proposed approach. This research contributes to the advancement of landmark detection in visual place recognition and shows the potential of leveraging vision transformers to overcome…
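The patch-selection idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say how patches are scored or what is done with the selected ones, so everything below is an assumption made for illustration: the function names, the per-patch `occlusion_scores` (imagined here as coming from some upstream scorer), and the `threshold` rule are hypothetical and are not the authors' implementation. The sketch splits an image into the non-overlapping patches a vision transformer would tokenize, then keeps only the patches that are not flagged as occluders.

```python
from typing import List, Sequence, Tuple

Patch = List[List[float]]


def split_into_patches(image: Sequence[Sequence[float]], patch_size: int) -> List[Patch]:
    """Split a 2D grayscale image into non-overlapping square patches (row-major order)."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches: List[Patch] = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            rows = image[top:top + patch_size]
            patches.append([list(row[left:left + patch_size]) for row in rows])
    return patches


def filter_patches(
    patches: Sequence[Patch],
    occlusion_scores: Sequence[float],
    threshold: float = 0.5,
) -> Tuple[List[Patch], List[int]]:
    """Keep patches whose occlusion score is below `threshold`.

    `occlusion_scores` is a hypothetical input (one score per patch, higher
    meaning more likely to show an occluding object such as a person or car).
    Returns the kept patches and their original indices, so positional
    information can still be attached to the surviving tokens.
    """
    if len(patches) != len(occlusion_scores):
        raise ValueError("need exactly one score per patch")
    kept_indices = [i for i, score in enumerate(occlusion_scores) if score < threshold]
    return [patches[i] for i in kept_indices], kept_indices


# Toy 4x4 image split into four 2x2 patches.
image = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
patches = split_into_patches(image, patch_size=2)

# Assumed scores: patches 1 and 3 are treated as occluders and dropped.
kept, kept_indices = filter_patches(patches, [0.1, 0.9, 0.2, 0.7])
print(kept_indices)
```

Returning the original indices alongside the kept patches is a design choice of this sketch: a transformer would still need each surviving patch's position in the grid to assign its positional embedding.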
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Advanced Neural Network Applications
