DenserNet: Weakly Supervised Visual Localization Using Multi-scale Feature Aggregation
Dongfang Liu, Yiming Cui, Liqi Yan, Christos Mousas, Baijian Yang, Yingjie Chen

TL;DR
DenserNet is a weakly supervised CNN architecture that aggregates multi-scale features for improved visual localization and image retrieval, achieving state-of-the-art results efficiently.
Contribution
It introduces a multi-scale feature aggregation CNN trained with weak supervision for enhanced localization without pixel-level annotations.
Findings
Sets new state-of-the-art on four localization benchmarks.
Improves image retrieval accuracy with denser feature maps.
Efficient architecture with shared features and parameters.
Abstract
In this work, we introduce a Denser Feature Network (DenserNet) for visual localization. Our work provides three principal contributions. First, we develop a convolutional neural network (CNN) architecture which aggregates feature maps at different semantic levels for image representations. Using denser feature maps, our method can produce more keypoint features and increase image retrieval accuracy. Second, our model is trained end-to-end without pixel-level annotation other than positive and negative GPS-tagged image pairs. We use a weakly supervised triplet ranking loss to learn discriminative features and encourage keypoint feature repeatability for image representation. Finally, our method is computationally efficient as our architecture has shared features and parameters during computation. Our method can perform accurate large-scale localization under challenging conditions while…
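The weakly supervised triplet ranking loss mentioned in the abstract can be illustrated with a minimal sketch. This is one common form of such a loss, not necessarily the authors' exact formulation: GPS tags only say which images are nearby (potential positives) or far away (negatives), so the closest potential positive in descriptor space is treated as the match, and each negative is pushed at least a margin farther away. The function names, the margin value, and the toy 2-D descriptors are all assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weak_triplet_ranking_loss(query, positives, negatives, margin=0.1):
    """Hinge-style ranking loss over one query (illustrative sketch).

    positives: descriptors of GPS-nearby images (only *potential* matches,
               since nearby images may look in a different direction).
    negatives: descriptors of GPS-distant images (definitely not matches).
    """
    # Weak supervision: pick the best-matching potential positive.
    best_pos = min(squared_distance(query, p) for p in positives)
    # Penalize every negative that is not at least `margin` farther away.
    return sum(
        max(0.0, best_pos + margin - squared_distance(query, n))
        for n in negatives
    )


# Toy 2-D descriptors (hypothetical values).
query = [0.0, 1.0]
positives = [[0.0, 0.9], [1.0, 0.0]]   # second one is nearby but dissimilar
negatives = [[0.0, 0.8], [-1.0, 0.0]]  # first one violates the margin
loss = weak_triplet_ranking_loss(query, positives, negatives, margin=0.1)
print(loss)
```

In this toy case only the first negative contributes to the loss, because the second is already much farther from the query than the best positive plus the margin.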
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Multimodal Machine Learning Applications
