Provable Accuracy Collapse in Embedding-Based Representations under Dimensionality Mismatch

Dionysis Arvanitakis; Vaggos Chatziafratis; Yiyuan Luo

arXiv:2605.03346 · cs.DS · May 6, 2026


TL;DR

This paper proves that embedding-based representations in machine learning face a sharp accuracy decline if the embedding dimension is significantly lower than the true data dimension, even under standard contrastive learning scenarios.

Contribution

It establishes fundamental information-theoretic limits and computational hardness results for low-dimensional embeddings in contrastive learning.

Findings

01. Accuracy collapses when the embedding dimension falls below a constant fraction of the ground-truth dimension.

02. In that regime, every low-dimensional embedding violates half of the triplet constraints, so it does no better than a trivial solution.

03. Under the Unique Games Conjecture, no polynomial-time algorithm can surpass 50% accuracy regardless of embedding dimension.

Abstract

Embedding-based representations in Euclidean space $\mathbb{R}^{d}$ are a cornerstone of modern machine learning, where a major goal is to use the smallest dimension that faithfully captures data relations. In this work, we prove sharp dimension-accuracy tradeoffs and identify a fundamental information-theoretic limitation: unless the embedding dimension $d$ is chosen close to the ground-truth dimension $D$, accuracy undergoes a sudden collapse. Our main result shows that this phenomenon arises even in standard contrastive learning settings, where supervision is limited to a set of $m$ anchor-positive-negative triplets $(i, j, k)$ encoding distance comparisons $\mathrm{dist}(i, j) < \mathrm{dist}(i, k)$. Specifically, given triplets realizable by an unknown ground-truth embedding in $D$ dimensions, we prove that there exists a constant $c < 1$ such that every embedding of…
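
To make the accuracy notion concrete, the following is a minimal sketch of how the fraction of satisfied triplets could be computed for a candidate embedding. It is not the paper's construction: the helper names (`triplet_accuracy`, `realizable_triplets`), the Gaussian ground-truth points, the uniformly sampled triplets, and the coordinate-truncation embedding are all assumptions chosen for illustration. Random triplets like these are not the hard instances behind the collapse result, so the printed low-dimensional accuracy only demonstrates the metric.

```python
import math
import random


def triplet_accuracy(points, triplets):
    """Fraction of triplets (i, j, k) with dist(i, j) < dist(i, k)."""
    satisfied = sum(
        1
        for i, j, k in triplets
        if math.dist(points[i], points[j]) < math.dist(points[i], points[k])
    )
    return satisfied / len(triplets)


def realizable_triplets(points, m, rng):
    """Sample m triplets, each oriented so that `points` satisfies it."""
    n = len(points)
    triplets = []
    while len(triplets) < m:
        i, j, k = rng.sample(range(n), 3)
        d_ij = math.dist(points[i], points[j])
        d_ik = math.dist(points[i], points[k])
        if d_ij == d_ik:
            continue  # skip ties; a strict comparison is required
        triplets.append((i, j, k) if d_ij < d_ik else (i, k, j))
    return triplets


# Illustrative sizes only (assumptions, not values from the paper).
rng = random.Random(0)
n, D, d, m = 40, 16, 2, 300

# Hypothetical ground-truth embedding in D dimensions.
ground_truth = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(n)]
triplets = realizable_triplets(ground_truth, m, rng)

# Naive d-dimensional candidate: keep the first d coordinates.
truncated = [p[:d] for p in ground_truth]

acc_true = triplet_accuracy(ground_truth, triplets)
acc_low = triplet_accuracy(truncated, triplets)
print(f"ground truth (D={D}): {acc_true:.3f}")
print(f"truncated    (d={d}): {acc_low:.3f}")
```

By construction the ground-truth embedding satisfies every sampled triplet, so its accuracy is 1.0. An embedding that satisfies only half of the triplets sits at the 50% level referred to in the findings above.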
