Redundancy, Isotropy, and Intrinsic Dimensionality of Prompt-based Text Embeddings

Hayato Tsukagoshi; Ryohei Sasano

arXiv:2506.01435·cs.CL·June 3, 2025

Open Access

TL;DR

This paper investigates the redundancy and intrinsic properties of prompt-based text embeddings, showing that significant dimensionality reduction minimally impacts performance, especially for classification and clustering tasks.

Contribution

It provides a comprehensive analysis of the redundancy, isotropy, and intrinsic dimensionality of prompt-based embeddings, highlighting their high redundancy and robustness to dimensionality reduction.

Findings

01 Naive dimensionality reduction causes minimal performance loss.

02 Embeddings for classification and clustering have lower intrinsic dimensionality.

03 High-dimensional embeddings exhibit high redundancy and less isotropy.

Abstract

Prompt-based text embedding models, which generate task-specific embeddings upon receiving tailored prompts, have recently demonstrated remarkable performance. However, their resulting embeddings often have thousands of dimensions, leading to high storage costs and increased computational costs of embedding-based operations. In this paper, we investigate how post-hoc dimensionality reduction applied to the embeddings affects the performance of various tasks that leverage these embeddings, specifically classification, clustering, retrieval, and semantic textual similarity (STS) tasks. Our experiments show that even a naive dimensionality reduction, which keeps only the first 25% of the dimensions of the embeddings, results in a very slight performance degradation, indicating that these embeddings are highly redundant. Notably, for classification and clustering, even when embeddings are…
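
The naive reduction described in the abstract (keeping only the first 25% of the embedding dimensions) can be sketched as below. This is a minimal illustration, not the authors' code: the function names `truncate_embeddings` and `cosine_similarity_matrix` are hypothetical, the 1024-dimensional input is an assumed size, and the random vectors are placeholders rather than outputs of a prompt-based embedding model.

```python
import numpy as np


def truncate_embeddings(embeddings: np.ndarray, keep_ratio: float = 0.25) -> np.ndarray:
    """Keep only the leading fraction of dimensions (hypothetical helper)."""
    dim = embeddings.shape[-1]
    keep = max(1, int(dim * keep_ratio))  # always keep at least one dimension
    return embeddings[..., :keep]


def cosine_similarity_matrix(x: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of x."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.clip(norms, 1e-12, None)  # guard against zero-norm rows
    return unit @ unit.T


# Placeholder data: 8 random vectors standing in for real embeddings.
rng = np.random.default_rng(0)
full = rng.standard_normal((8, 1024))

reduced = truncate_embeddings(full, keep_ratio=0.25)
sims_full = cosine_similarity_matrix(full)
sims_reduced = cosine_similarity_matrix(reduced)
```

With real model embeddings, comparing `sims_full` against `sims_reduced` (or rerunning a downstream task on `reduced`) is one way to check how much information the truncation discards. Random vectors will not show the redundancy the paper reports, so the sketch demonstrates only the mechanics.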

Taxonomy

Topics: Advanced Text Analysis Techniques · Natural Language Processing Techniques · Topic Modeling