EigenNoise: A Contrastive Prior to Warm-Start Representations
Hunter Scott Heidenreich, Jake Ryland Williams

TL;DR
EigenNoise is a simple, data-free initialization method for word vectors. In preliminary results it approaches the performance of empirically trained GloVe embeddings, which motivates further research into theory-informed initialization schemes.
Contribution
The paper proposes EigenNoise, an initialization scheme for word vectors that requires no pre-training data. It is derived from a dense, independent co-occurrence model and evaluated with information-theoretic minimum description length (MDL) probing.
Findings
Under MDL probing, EigenNoise approaches the performance of empirically trained GloVe without any pre-training data
Preliminary results suggest EigenNoise is competitive with trained embeddings
The method invites further exploration of theory-informed initialization strategies
Abstract
In this work, we present a naive initialization scheme for word vectors based on a dense, independent co-occurrence model and provide preliminary results that suggest it is competitive and warrants further investigation. Specifically, we demonstrate through information-theoretic minimum description length (MDL) probing that our model, EigenNoise, can approach the performance of empirically trained GloVe despite the lack of any pre-training data (in the case of EigenNoise). We present these preliminary results with interest to set the stage for further investigations into how this competitive initialization works without pre-training data, as well as to invite the exploration of more intelligent initialization schemes informed by the theory of harmonic linguistic structure. Our application of this theory likewise contributes a novel (and effective) interpretation of recent discoveries…
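The abstract evaluates representations with MDL probing, which scores a representation by how many bits are needed to transmit a task's labels given the vectors. A minimal sketch of the online (prequential) codelength variant follows. It is an illustration under stated assumptions, not the authors' setup: the linear softmax probe, the block fractions, and the function names `fit_softmax_probe`, `log2_prob`, and `online_codelength` are all hypothetical choices made here.

```python
import numpy as np


def fit_softmax_probe(X, y, num_classes, steps=200, lr=0.5):
    """Fit a linear softmax probe with plain gradient descent (illustrative only)."""
    n, d = X.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    Y = np.eye(num_classes)[y]  # one-hot labels
    for _ in range(steps):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        G = (P - Y) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ G)
        b -= lr * G.sum(axis=0)
    return W, b


def log2_prob(X, y, W, b):
    """Return log2 p(y_i | x_i) under the fitted probe for each example."""
    logits = X @ W + b
    logits -= logits.max(axis=1, keepdims=True)
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return log_p[np.arange(len(y)), y] / np.log(2)


def online_codelength(X, y, num_classes, fractions=(0.1, 0.2, 0.4, 0.8, 1.0)):
    """Online codelength in bits (assumed block fractions, not from the paper).

    The first block is sent with a uniform code; each later block is coded
    with a probe trained on all examples seen before it.
    """
    n = len(y)
    cuts = [max(1, int(round(f * n))) for f in fractions]
    bits = cuts[0] * np.log2(num_classes)
    for start, end in zip(cuts[:-1], cuts[1:]):
        W, b = fit_softmax_probe(X[:start], y[:start], num_classes)
        bits += -log2_prob(X[start:end], y[start:end], W, b).sum()
    return float(bits)
```

A shorter codelength means the labels are easier to extract from the representation. Dividing the uniform codelength, `len(y) * log2(num_classes)`, by the online codelength gives a compression ratio that can be compared across embeddings.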
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Language and cultural evolution
Methods: GloVe Embeddings
