Real-Time Loop Closure Detection in Visual SLAM via NetVLAD and Faiss
Enguang Fan

TL;DR
This paper demonstrates that NetVLAD combined with Faiss enables real-time, accurate loop closure detection in visual SLAM, outperforming traditional bag-of-words methods like DBoW in robustness and speed.
Contribution
It empirically evaluates NetVLAD as an LCD module against DBoW, introduces a Fine-Grained Top-K precision-recall metric, and shows that Faiss acceleration makes NetVLAD a practical real-time alternative.
Findings
NetVLAD outperforms DBoW in accuracy and robustness.
Faiss enables real-time query speed for NetVLAD.
Proposed Fine-Grained Top-K metric better reflects LCD performance.
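This summary does not give the exact definition of the Fine-Grained Top-K metric, so the following is only a rough sketch of one way a precision-recall curve could be computed when a query may have zero or several valid matches. The function name `pr_curve`, the distance-threshold sweep, and the choice to cap each query's positives at `k` are all assumptions made for illustration, not the paper's formulation.

```python
def pr_curve(retrievals, ground_truth, thresholds, k):
    """Sweep a distance threshold over top-k retrievals (illustrative only).

    retrievals[i]   : list of (candidate_id, distance), sorted by distance
    ground_truth[i] : set of valid candidate ids for query i (may be empty)
    """
    # Assumption: a query can contribute at most k positives, and a query
    # with no valid match contributes nothing to the recall denominator.
    total_pos = sum(min(k, len(gt)) for gt in ground_truth)
    points = []
    for t in thresholds:
        tp = fp = 0
        for ranked, gt in zip(retrievals, ground_truth):
            for cand, dist in ranked[:k]:
                if dist <= t:  # candidate accepted as a loop closure
                    if cand in gt:
                        tp += 1
                    else:
                        fp += 1
        precision = tp / (tp + fp) if (tp + fp) else 1.0
        recall = tp / total_pos if total_pos else 0.0
        points.append((t, precision, recall))
    return points


# Hypothetical toy data: three queries, top-2 retrievals each.
retrievals = [
    [(5, 0.1), (6, 0.3)],  # query 0 has two valid matches
    [(9, 0.2), (2, 0.6)],  # query 1 has no valid match (no revisit)
    [(7, 0.4), (1, 0.5)],  # query 2 has one valid match
]
ground_truth = [{5, 6}, set(), {1}]
curve = pr_curve(retrievals, ground_truth, thresholds=[0.15, 0.35, 0.6], k=2)
```

Counting every accepted candidate separately means a query without any revisit can still lower precision, which a conventional top-1 evaluation that assumes exactly one correct match per query would not capture.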
Abstract
Loop closure detection (LCD) is a core component of simultaneous localization and mapping (SLAM): it identifies revisited places and enables pose-graph constraints that correct accumulated drift. Classic bag-of-words approaches such as DBoW are efficient but often degrade under appearance change and perceptual aliasing. In parallel, deep learning-based visual place recognition (VPR) descriptors (e.g., NetVLAD and Transformer-based models) offer stronger robustness, but their computational cost is often viewed as a barrier to real-time SLAM. In this paper, we empirically evaluate NetVLAD as an LCD module and compare it against DBoW on the KITTI dataset. We introduce a Fine-Grained Top-K precision-recall curve that better reflects LCD settings where a query may have zero or multiple valid matches. With Faiss-accelerated nearest-neighbor search, NetVLAD achieves real-time query speed while…
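The retrieval step described in the abstract can be pictured as a nearest-neighbor search over per-frame global descriptors. The sketch below is a minimal illustration under several assumptions: random vectors stand in for NetVLAD descriptors, brute-force NumPy search stands in for Faiss, and the function name `find_loop_candidates`, the 128-D descriptor size, and the `exclude_recent` window are hypothetical choices rather than details taken from the paper.

```python
import numpy as np


def find_loop_candidates(descriptors, query_idx, k=3, exclude_recent=50):
    """Return up to k (frame_index, squared_L2_distance) pairs for a query frame."""
    query = descriptors[query_idx]
    # Only frames older than the exclusion window can count as revisits;
    # otherwise the immediate neighbors of the query would always win.
    last = query_idx - exclude_recent
    if last <= 0:
        return []
    database = descriptors[:last]
    # Exact (brute-force) squared L2 distance to every database descriptor.
    dists = np.sum((database - query) ** 2, axis=1)
    order = np.argsort(dists)[:k]
    return [(int(i), float(dists[i])) for i in order]


rng = np.random.default_rng(0)
# Hypothetical sequence: 200 frames with 128-D unit-norm descriptors.
desc = rng.normal(size=(200, 128)).astype(np.float32)
desc /= np.linalg.norm(desc, axis=1, keepdims=True)
# Simulate a revisit: frame 180 looks almost identical to frame 20.
desc[180] = desc[20] + 0.01 * rng.normal(size=128).astype(np.float32)
desc[180] /= np.linalg.norm(desc[180])

candidates = find_loop_candidates(desc, query_idx=180)
```

In a Faiss-based setup, the brute-force distance computation would typically be replaced by an index such as `faiss.IndexFlatL2` and its `search` method; which index type the paper uses is not stated in this summary.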
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
