$\mathbb{R}^{2k}$ is Theoretically Large Enough for Embedding-based Top-$k$ Retrieval
Zihao Wang, Hang Yin, Lihui Liu, Hanghang Tong, Yangqiu Song, Ginny Wong, Simon See

TL;DR
This paper determines the minimal embedding dimension needed for subset membership retrieval, showing that a space of dimension 2k is sufficient, which clarifies the limitations and potential of embedding-based top-k retrieval systems.
Contribution
The paper provides tight theoretical bounds on the minimal embedding dimension for subset membership, supported by empirical results across various similarity measures.
Findings
MED is tightly bounded by 2k for subset embeddings.
In numerical simulations with centroid-based subset embeddings, MED grows logarithmically with the number of elements.
Embedding limitations are mainly due to learnability, not geometric constraints.
Abstract
This paper studies the minimal dimension required to embed subset memberships (elements and subsets of at most $k$ elements) into vector spaces, denoted as Minimal Embeddable Dimension (MED). The tight bounds of MED are derived theoretically and supported empirically for various notions of "distances" or "similarities," including the metric, inner product, and cosine similarity. In addition, we conduct numerical simulation in a more achievable setting, where the subset embeddings are chosen as the centroid of the embeddings of the contained elements. Our simulation easily realizes a logarithmic dependency between the MED and the number of elements to embed. These findings imply that embedding-based retrieval limitations stem primarily from learnability challenges, not geometric constraints, guiding future algorithm design.
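The centroid setting in the abstract can be illustrated with a small check. The sketch below is not the paper's simulation code: the function names (`centroid`, `top_k`, `centroid_retrieval_works`, `unit_circle`) and the choice of inner-product scoring are assumptions made for illustration. For a given set of element embeddings, it tests whether every subset of at most k elements is exactly recovered by top-k retrieval when the subset's embedding is the centroid of its members.

```python
import math
from itertools import combinations


def centroid(vectors):
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(coords) / n for coords in zip(*vectors)]


def inner(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_k(query, embeddings, k):
    """Indices of the k embeddings with the largest inner product with query."""
    ranked = sorted(
        range(len(embeddings)),
        key=lambda i: inner(query, embeddings[i]),
        reverse=True,
    )
    return set(ranked[:k])


def centroid_retrieval_works(embeddings, k):
    """True if every subset of size 1..k is recovered from its centroid.

    Ties in the scores are broken arbitrarily by the sort, so this is an
    illustrative check rather than a strict separation test.
    """
    m = len(embeddings)
    for size in range(1, k + 1):
        for subset in combinations(range(m), size):
            query = centroid([embeddings[i] for i in subset])
            if top_k(query, embeddings, size) != set(subset):
                return False
    return True


def unit_circle(m):
    """Hypothetical test configuration: m evenly spaced points in R^2."""
    return [
        [math.cos(2 * math.pi * i / m), math.sin(2 * math.pi * i / m)]
        for i in range(m)
    ]


if __name__ == "__main__":
    points = unit_circle(5)
    print(centroid_retrieval_works(points, 1))
    print(centroid_retrieval_works(points, 2))
```

With five points on the unit circle, the check passes for k = 1 and fails for k = 2, because the centroid of two non-adjacent points scores the point between them highest. This only shows that one fixed two-dimensional configuration is not enough for k = 2; it is not a statement about the bounds derived in the paper.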
Taxonomy
Topics: Face and Expression Recognition · Advanced Image and Video Retrieval Techniques · Information Retrieval and Search Behavior
