DOLG: Single-Stage Image Retrieval with Deep Orthogonal Fusion of Local and Global Features
Min Yang, Dongliang He, Miao Fan, Baorong Shi, Xuetong Xue, Fu Li, Errui Ding, Jizhou Huang

TL;DR
This paper introduces DOLG, a single-stage image retrieval framework that fuses local and global features into one compact representation and reports state-of-the-art results on the Revisited Oxford and Paris datasets.
Contribution
The paper proposes a deep orthogonal fusion method that integrates local and global features in a single end-to-end trainable model for image retrieval.
Findings
Achieves state-of-the-art performance on Revisited Oxford and Paris datasets.
Effectively combines local and global features in a single, end-to-end framework.
Outperforms previous two-stage retrieval methods in accuracy.
Abstract
Image Retrieval is a fundamental task of obtaining images similar to the query one from a database. A common image retrieval practice is to firstly retrieve candidate images via similarity search using global image features and then re-rank the candidates by leveraging their local features. Previous learning-based studies mainly focus on either global or local image representation learning to tackle the retrieval task. In this paper, we abandon the two-stage paradigm and seek to design an effective single-stage solution by integrating local and global information inside images into compact image representations. Specifically, we propose a Deep Orthogonal Local and Global (DOLG) information fusion framework for end-to-end image retrieval. It attentively extracts representative local information with multi-atrous convolutions and self-attention at first. Components orthogonal to the…
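The abstract is cut off above, so the following is only a minimal sketch of what an "orthogonal" fusion of local and global features could look like, not the authors' implementation. It assumes that the orthogonal component of each local feature is obtained by subtracting its projection onto the global feature, and that the result is average-pooled and concatenated with the global feature. The function names (`orthogonal_component`, `fuse`) are hypothetical, and the learned parts of the model (multi-atrous convolutions, self-attention, any final projection layer) are deliberately left out.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthogonal_component(local: Sequence[float], global_feat: Sequence[float]) -> Vector:
    """Return the part of `local` that is orthogonal to `global_feat`.

    Assumption: this is the standard vector rejection,
    local - (<local, g> / <g, g>) * g.
    """
    norm_sq = dot(global_feat, global_feat)
    if norm_sq == 0.0:
        # A zero global vector has no direction to project onto.
        return list(local)
    scale = dot(local, global_feat) / norm_sq
    return [l - scale * g for l, g in zip(local, global_feat)]


def fuse(local_feats: Sequence[Sequence[float]], global_feat: Sequence[float]) -> Vector:
    """Concatenate the global feature with the mean orthogonal local component.

    Assumption: average pooling over local positions; the real model may
    aggregate differently and apply further learned layers.
    """
    orth = [orthogonal_component(l, global_feat) for l in local_feats]
    count = len(orth)
    pooled = [sum(column) / count for column in zip(*orth)]
    return list(global_feat) + pooled


if __name__ == "__main__":
    # Two toy local descriptors and one toy global descriptor (made-up values).
    locals_ = [[1.0, 1.0], [2.0, 0.0]]
    global_ = [1.0, 0.0]
    print(fuse(locals_, global_))
```

The point of the subtraction step is that whatever the local descriptors share with the global descriptor is removed before fusion, so the concatenated vector does not store the same information twice.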
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Multimodal Machine Learning Applications
Methods: Deep Orthogonal Fusion of Local and Global Features
