A New Pair of GloVes
Riley Carlson, John Bauer, and Christopher D. Manning

TL;DR
This paper introduces updated 2024 GloVe word embedding models trained on recent data. The new models cover more culturally relevant vocabulary and perform better on recent NER tasks, while matching the 2014 models on traditional analogy and similarity benchmarks.
Contribution
The paper presents new GloVe models trained on updated data and documents the exact data versions and preprocessing used. It shows that the new models capture newer, culturally relevant vocabulary and perform better on recent NER datasets.
Findings
Incorporate new culturally relevant words
Perform comparably on analogy and similarity tasks
Show improved results on recent NER datasets
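Word-analogy benchmarks are commonly scored with the "3CosAdd" rule: for "a is to b as c is to ?", predict the vocabulary word whose vector has the highest cosine similarity to b - a + c, excluding the three query words. The sketch below illustrates that rule on hypothetical 3-dimensional toy vectors. It is not the paper's evaluation code, and real GloVe vectors have far more dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def solve_analogy(vectors, a, b, c):
    """3CosAdd: return the word d maximizing cos(d, b - a + c), skipping a, b and c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    best_word, best_score = None, -math.inf
    for word, vec in vectors.items():
        if word in (a, b, c):
            continue  # the query words themselves are excluded by convention
        score = cosine(vec, target)
        if score > best_score:
            best_word, best_score = word, score
    return best_word


# Hypothetical toy embeddings, for illustration only.
toy = {
    "man": [1.0, 0.0, 0.0],
    "king": [1.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "queen": [0.0, 1.0, 1.0],
    "apple": [0.5, 0.0, 0.5],
}
print(solve_analogy(toy, "man", "king", "woman"))  # queen
```

Analogy accuracy on a benchmark is then the fraction of questions for which the predicted word matches the gold answer.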
Abstract
This report documents, describes, and evaluates new 2024 English GloVe (Global Vectors for Word Representation) models. While the original GloVe models built in 2014 have been widely used and found useful, languages and the world continue to evolve and we thought that current usage could benefit from updated models. Moreover, the 2014 models were not carefully documented as to the exact data versions and preprocessing that were used, and we rectify this by documenting these new models. We trained two sets of word embeddings using Wikipedia, Gigaword, and a subset of Dolma. Evaluation through vocabulary comparison, direct testing, and NER tasks shows that the 2024 vectors incorporate new culturally and linguistically relevant words, perform comparably on structural tasks like analogy and similarity, and demonstrate improved performance on recent, temporally dependent NER datasets such as…
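Word-similarity benchmarks are usually scored by computing the cosine similarity of each word pair and then measuring Spearman's rank correlation against human similarity judgments. The sketch below shows that procedure using only the standard library. The word pairs, scores, and 2-dimensional vectors are hypothetical, and skipping out-of-vocabulary pairs is one common convention, assumed here rather than taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def ranks(values):
    """Assign 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def evaluate_similarity(vectors, pairs):
    """Spearman correlation between model cosines and human scores; skips OOV pairs."""
    model_scores, human_scores = [], []
    for w1, w2, gold in pairs:
        if w1 in vectors and w2 in vectors:
            model_scores.append(cosine(vectors[w1], vectors[w2]))
            human_scores.append(gold)
    # Spearman's rho is Pearson's r computed on the ranks.
    return pearson(ranks(model_scores), ranks(human_scores))


# Hypothetical toy data, for illustration only.
toy = {"cat": [1.0, 0.0], "dog": [0.8, 0.2], "car": [0.0, 1.0], "truck": [0.1, 0.9]}
pairs = [("cat", "dog", 8.0), ("car", "truck", 9.0), ("cat", "car", 1.0), ("cat", "zebra", 5.0)]
print(round(evaluate_similarity(toy, pairs), 6))  # 1.0
```

Because the score depends only on rank order, a model can score perfectly even when its cosine values are on a different scale from the human ratings.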
