Geometric Analysis of Self-Supervised Vision Representations for Semantic Image Retrieval
Esteban Rodríguez-Betancourt, Edgar Casasola-Murillo

TL;DR
This paper evaluates how modern self-supervised vision representations affect semantic image retrieval, highlighting the importance of latent space geometry for effective vector search.
Contribution
It analyzes how the geometry of self-supervised representations affects retrieval performance in CBIR systems.
Findings
Anisotropic representations degrade ANN indexing performance.
Isotropic and locally pure representations improve retrieval accuracy.
Latent space geometry significantly influences vector database effectiveness.
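
The findings above can be made concrete with a small sketch. It assumes two proxies that are not necessarily the ones used in the paper: anisotropy is estimated as the mean pairwise cosine similarity of the embeddings, and local purity as the fraction of each point's k nearest neighbours that share its label. The function names, the synthetic Gaussian data, and the shared offset used to simulate anisotropy are all hypothetical and serve only as an illustration.

```python
import math
import random
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all distinct pairs.

    Values near 0 suggest directions are spread out (roughly isotropic);
    values near 1 suggest vectors crowd into a narrow cone (anisotropic).
    This is an assumed proxy, not necessarily the paper's measure.
    """
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def local_purity(vectors, labels, k=5):
    """Mean fraction of each point's k nearest neighbours (by cosine)
    that carry the same label as the point itself."""
    total = 0.0
    for i, v in enumerate(vectors):
        sims = [(cosine(v, w), j) for j, w in enumerate(vectors) if j != i]
        sims.sort(reverse=True)  # most similar first
        neighbours = [j for _, j in sims[:k]]
        total += sum(labels[j] == labels[i] for j in neighbours) / k
    return total / len(vectors)


# Synthetic data: zero-mean Gaussian vectors are roughly isotropic.
rng = random.Random(0)
dim, n = 32, 200
isotropic = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
# Adding a large shared offset pushes every vector toward one direction.
anisotropic = [[x + 5.0 for x in v] for v in isotropic]

iso_score = mean_pairwise_cosine(isotropic)
aniso_score = mean_pairwise_cosine(anisotropic)
print(f"isotropic: {iso_score:.3f}, anisotropic: {aniso_score:.3f}")

# Two tight, well-separated clusters: every neighbour shares the label.
toy_vectors = [[1, 0], [1, 0.1], [0.9, 0], [0, 1], [0.1, 1], [0, 0.9]]
toy_labels = [0, 0, 0, 1, 1, 1]
toy_purity = local_purity(toy_vectors, toy_labels, k=2)
print(f"toy local purity: {toy_purity:.2f}")
```

In this toy setup the offset vectors score much closer to 1 than the zero-mean ones, which is the kind of crowding that can make nearest-neighbour distances harder to tell apart. Real embeddings would replace the synthetic lists, and the exhaustive pair loop would need sampling for large collections.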
Abstract
Content-based image retrieval (CBIR) systems enable users to search images based on visual content instead of relying on metadata. The text domain has benefited from vector search over representations created with unsupervised methods such as BERT. However, modern self-supervised learning (SSL) methods for vision are rarely reported in the CBIR literature, which instead relies on supervised models or multi-modal methods that align text and vision. We evaluate how the representations learned by modern SSL methods for vision perform under typical retrieval stacks that leverage vector databases and nearest neighbor search. Our evaluation reveals that the latent space geometry impacts approximate nearest neighbor (ANN) indexing. Specifically, highly anisotropic representations with high skewness produced by several modern SSL methods degrade the performance of…
