Estimation of embedding vectors in high dimensions
Golara Ahmadi Azar, Melika Emami, Alyson Fletcher, Sundeep Rangan

TL;DR
This paper investigates the theoretical limits of learning embeddings in high-dimensional settings using a probabilistic model and approximate message passing (AMP) algorithms, providing precise accuracy predictions and insights into the key parameters that affect embedding quality.
Contribution
It introduces a probabilistic model for embedding estimation and applies AMP methods to predict estimation accuracy in high dimensions, validated by simulations.
Findings
AMP can accurately predict embedding estimation accuracy
Key parameters like sample size and correlation strength influence learning success
Theoretical results align with experiments on synthetic and real data
Abstract
Embeddings are a basic initial feature extraction step in many machine learning models, particularly in natural language processing. An embedding attempts to map data tokens to a low-dimensional space where similar tokens are mapped to vectors that are close to one another by some metric in the embedding space. A basic question is how well such embeddings can be learned. To study this problem, we consider a simple probability model for discrete data where there is some "true" but unknown embedding, and the correlation of random variables is related to the similarity of the embeddings. Under this model, it is shown that the embeddings can be learned by a variant of the low-rank approximate message passing (AMP) method. The AMP approach enables precise predictions of the accuracy of the estimation in certain high-dimensional limits. In particular, the methodology provides insight into the…
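To illustrate what "precise predictions of the accuracy" means for low-rank AMP, here is a minimal sketch of the textbook rank-one case, not the authors' embedding model: a symmetric matrix Y = (λ/n) x xᵀ + W with hidden signs x_i ∈ {−1, +1} and Gaussian noise W. AMP iterates a matrix-vector product with an Onsager correction, and a scalar "state evolution" recursion predicts the overlap between the estimate and x at every iteration. All function names, the signal strength `lam`, the warm start `q0`, and the choice of a ±1 prior are assumptions made for this illustration; the warm start uses the ground truth and is only a simulation convenience.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def state_evolution(lam, q0, iters):
    """Hypothetical helper: scalar state evolution for rank-one AMP, +/-1 prior.

    Recursion: q_{t+1} = E[tanh(lam^2 q_t + lam sqrt(q_t) Z)], Z ~ N(0, 1).
    Returns [q_0, q_1, ..., q_iters], the predicted overlaps.
    """
    z, w = hermegauss(80)        # nodes/weights for the weight exp(-z^2 / 2)
    w = w / np.sqrt(2 * np.pi)   # normalize so the weights sum to 1
    qs = [q0]
    for _ in range(iters):
        q = qs[-1]
        val = np.sum(w * np.tanh(lam**2 * q + lam * np.sqrt(q) * z))
        qs.append(max(0.0, float(val)))  # guard against tiny negative rounding
    return qs


def amp_rank_one(Y, lam, v0, iters):
    """Hypothetical helper: AMP for Y = (lam/n) x x^T + W, x_i in {-1, +1}.

    Uses the posterior-mean denoiser f(v) = tanh(lam * v), which is
    Bayes-optimal along the state-evolution trajectory for this prior.
    Returns the estimates f(v^t) for t = 0..iters.
    """
    v, f_prev = v0, np.zeros_like(v0)   # f(v^{-1}) is taken to be 0
    estimates = [np.tanh(lam * v)]
    for _ in range(iters):
        f = np.tanh(lam * v)
        b = lam * np.mean(1.0 - f**2)   # Onsager term: (1/n) sum_i f'(v_i)
        v, f_prev = Y @ f - b * f_prev, f
        estimates.append(np.tanh(lam * v))
    return estimates


def simulate(n=1500, lam=1.5, q0=0.05, iters=10, seed=0):
    """Compare empirical AMP overlaps with the state-evolution prediction."""
    rng = np.random.default_rng(seed)
    x = rng.choice([-1.0, 1.0], size=n)            # hidden signs
    G = rng.normal(size=(n, n)) / np.sqrt(n)
    W = (G + G.T) / np.sqrt(2)                     # symmetric, off-diag variance 1/n
    Y = (lam / n) * np.outer(x, x) + W
    # Warm start v^0 = lam*q0*x + sqrt(q0)*noise (uses x: simulation only)
    v0 = lam * q0 * x + np.sqrt(q0) * rng.normal(size=n)
    est = amp_rank_one(Y, lam, v0, iters)
    empirical = [float(np.mean(x * e)) for e in est]
    # The estimate after computing v^t is predicted to have overlap q_{t+1}
    predicted = state_evolution(lam, q0, iters + 1)[1:]
    return empirical, predicted


if __name__ == "__main__":
    emp, pred = simulate()
    for t, (e, p) in enumerate(zip(emp, pred)):
        print(f"t={t:2d}  empirical={e:.3f}  predicted={p:.3f}")
```

The point of the comparison is that the overlap computed from a single large random instance tracks a one-dimensional deterministic recursion. Running `state_evolution` with `lam` below 1 drives the predicted overlap to zero, a simple analog of how signal strength governs whether learning succeeds at all.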
Taxonomy
Topics: Complex Network Analysis Techniques · Opinion Dynamics and Social Influence · Gene expression and cancer classification
Methods: Approximate Message Passing (AMP)
