On the rankability of visual embeddings
Ankit Sonthalia, Arnas Uselis, Seong Joon Oh

TL;DR
This paper investigates whether visual embeddings encode continuous, ordinal attributes along linear directions. It finds that many embeddings are inherently rankable and that meaningful rank axes can often be recovered with minimal supervision.
Contribution
It introduces the concept of rankability in visual embeddings and shows, across 7 popular encoders and 9 datasets, that a small number of samples is often enough to recover meaningful rank axes.
Findings
Many embeddings are inherently rankable.
A small number of samples, or even just two extreme examples, often suffice to recover meaningful rank axes.
Rankable embeddings open up new use cases for image ranking in vector databases.
Abstract
We study whether visual embedding models capture continuous, ordinal attributes along linear directions, which we term _rank axes_. We define a model as _rankable_ for an attribute if projecting embeddings onto such an axis preserves the attribute's order. Across 7 popular encoders and 9 datasets with attributes like age, crowd count, head pose, aesthetics, and recency, we find that many embeddings are inherently rankable. Surprisingly, a small number of samples, or even just two extreme examples, often suffice to recover meaningful rank axes, without full-scale supervision. These findings open up new use cases for image ranking in vector databases and motivate further study into the structure and learning of rankable embeddings. Our code is available at https://github.com/aktsonthalia/rankable-vision-embeddings.
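To make the definition concrete, here is a minimal sketch of the two-extremes idea described in the abstract. It is not the authors' implementation: the embeddings are synthetic, the function names (`rank_axis_from_extremes`, `project`, `spearman`) are hypothetical, and taking the normalized difference between the two extreme embeddings is only one plausible way to build a rank axis. Spearman rank correlation is used here as an assumed measure of how well the projection preserves the attribute's order.

```python
import random


def rank_axis_from_extremes(low, high):
    """Unit vector pointing from the low-extreme embedding to the high-extreme one."""
    diff = [h - l for l, h in zip(low, high)]
    norm = sum(d * d for d in diff) ** 0.5
    return [d / norm for d in diff]


def project(embedding, axis):
    """Scalar score: dot product of the embedding with the rank axis."""
    return sum(e * a for e, a in zip(embedding, axis))


def ranks(values):
    """0-based rank of each value (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation between two sequences, assuming no ties."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Synthetic stand-in for real image embeddings (an assumption for illustration):
# each embedding moves along a hidden direction in proportion to the attribute,
# plus a little Gaussian noise.
rng = random.Random(0)
dim = 16
hidden_direction = [rng.gauss(0, 1) for _ in range(dim)]
attributes = [rng.uniform(0, 100) for _ in range(200)]  # e.g. an age-like value
embeddings = [
    [a / 100 * h + rng.gauss(0, 0.05) for h in hidden_direction]
    for a in attributes
]

# Pick the two extreme examples and build the axis from them alone.
lowest = min(range(len(attributes)), key=attributes.__getitem__)
highest = max(range(len(attributes)), key=attributes.__getitem__)
axis = rank_axis_from_extremes(embeddings[lowest], embeddings[highest])

# Project every embedding onto the axis and compare the ordering to the attribute.
scores = [project(e, axis) for e in embeddings]
rho = spearman(scores, attributes)
print(f"Spearman correlation between projections and attribute: {rho:.3f}")
```

With real encoders, `embeddings` would come from the model and `attributes` from dataset labels; a correlation near 1 would indicate that the embedding is rankable for that attribute in the sense defined above.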
