SemGloVe: Semantic Co-occurrences for GloVe from BERT
Leilei Gan, Zhiyang Teng, Yue Zhang, Linchao Zhu, Fei Wu, Yi Yang

TL;DR
SemGloVe introduces a novel method to incorporate semantic co-occurrence information from BERT into static GloVe embeddings, overcoming local window limitations and improving performance on various tasks.
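For reference, the standard GloVe objective from the original GloVe paper is shown below. Here X_{ij} is the co-occurrence weight of words i and j, w and w̃ are word and context vectors, b and b̃ are biases, V is the vocabulary size, and f down-weights rare pairs (the GloVe paper used x_max = 100 and α = 3/4). Assuming SemGloVe keeps this objective, its contribution lies in how X is built, not in the loss itself.

```latex
J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2,
\qquad
f(x) = \begin{cases} (x / x_{\max})^{\alpha} & \text{if } x < x_{\max} \\ 1 & \text{otherwise} \end{cases}
```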
Contribution
This paper presents two models that extract semantic co-occurrences from BERT to enhance GloVe embeddings, a departure from the traditional practice of counting co-occurrences within a fixed local window.
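The masked-language-model variant can be pictured with a toy sketch. One plausible scheme, assumed here rather than taken from the paper, is: for each position, take the words BERT's masked LM ranks as most probable at that position and pair each with the observed word, weighted by its predicted probability. The predicted distributions are passed in as plain dictionaries, so no BERT call is made; `mlm_cooccurrences` and its parameter `k` are hypothetical.

```python
from collections import defaultdict


def mlm_cooccurrences(tokens, predictions, k=2):
    """Hypothetical helper: co-occurrence weights from masked-LM predictions.

    predictions[i] is assumed to map candidate words to the probability the
    masked LM assigns them at position i. The k most probable candidates
    (excluding the observed word itself) are paired with tokens[i], and each
    pair's weight is the predicted probability.
    """
    if len(predictions) != len(tokens):
        raise ValueError("need one prediction distribution per token")
    counts = defaultdict(float)
    for word, dist in zip(tokens, predictions):
        top = sorted(((p, w) for w, p in dist.items() if w != word), reverse=True)[:k]
        for prob, predicted in top:
            counts[(word, predicted)] += prob
    return dict(counts)


if __name__ == "__main__":
    tokens = ["good", "movie"]
    predictions = [
        {"good": 0.5, "great": 0.3, "nice": 0.15, "bad": 0.05},
        {"movie": 0.6, "film": 0.35, "show": 0.05},
    ]
    counts = mlm_cooccurrences(tokens, predictions, k=2)
    print(counts[("movie", "film")])  # 0.35
```

Because the pairs come from what the model predicts rather than from neighbouring positions, a word can be linked to substitutes such as synonyms that never appear near it in the text.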
Findings
SemGloVe outperforms GloVe on word similarity datasets.
Semantic co-occurrence extraction improves embedding quality.
Both models make effective use of BERT, one through its masked language model and the other through its multi-head attention weights.
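The attention-based idea, reading co-occurrence weights off BERT's attention rather than off a window, can be illustrated with a small sketch. It assumes the attention matrix for a sentence has already been extracted from BERT and averaged over heads, and that tokens are whole words (real BERT input is WordPiece subwords, which would need merging). Layer choice, head aggregation, normalization, and any filtering used by SemGloVe are not specified here, so `attention_cooccurrences` is a hypothetical helper, not the authors' procedure.

```python
from collections import defaultdict


def attention_cooccurrences(tokens, attention):
    """Hypothetical helper: turn a sentence's attention matrix into co-occurrence weights.

    attention[i][j] is assumed to be the weight token i places on token j,
    already averaged over heads. Every pair in the sentence is eligible,
    regardless of distance; attention from a token to itself is skipped.
    """
    if len(attention) != len(tokens):
        raise ValueError("attention must have one row per token")
    counts = defaultdict(float)
    for i, row in enumerate(attention):
        for j, weight in enumerate(row):
            if i != j:
                counts[(tokens[i], tokens[j])] += weight
    return dict(counts)


if __name__ == "__main__":
    tokens = ["movie", "was", "great"]
    attention = [
        [0.5, 0.1, 0.4],
        [0.3, 0.4, 0.3],
        [0.6, 0.1, 0.3],
    ]
    counts = attention_cooccurrences(tokens, attention)
    print(counts[("great", "movie")])  # 0.6
```

Note that the matrix need not be symmetric, so ("great", "movie") and ("movie", "great") can receive different weights; a GloVe-style pipeline might symmetrize them before training.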
Abstract
GloVe learns word embeddings by leveraging statistical information from word co-occurrence matrices. However, word pairs in these matrices are extracted from a predefined local context window, which can yield a limited set of word pairs and potentially semantically irrelevant ones. In this paper, we propose SemGloVe, which distills semantic co-occurrences from BERT into static GloVe word embeddings. In particular, we propose two models that extract co-occurrence statistics based on either the masked language model or the multi-head attention weights of BERT. Our methods can extract word pairs without being limited by the local window assumption and can define co-occurrence weights by directly considering the semantic distance between word pairs. Experiments on several word similarity datasets and four external tasks show that SemGloVe can outperform GloVe.
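For contrast with SemGloVe, the window-based counting described above can be sketched as follows. This is a minimal illustration, not the GloVe reference code: the function name `window_cooccurrences` is hypothetical, and it follows the original GloVe choice of weighting a pair that is d tokens apart by 1/d. A pair farther apart than the window is never counted, however related the two words are.

```python
from collections import defaultdict


def window_cooccurrences(tokens, window=2):
    """Hypothetical helper: GloVe-style symmetric co-occurrence counts.

    A pair of tokens d positions apart (1 <= d <= window) adds 1/d
    to the count in both directions; pairs beyond the window add nothing.
    """
    counts = defaultdict(float)
    for i, center in enumerate(tokens):
        for d in range(1, window + 1):
            j = i + d
            if j >= len(tokens):
                break
            counts[(center, tokens[j])] += 1.0 / d
            counts[(tokens[j], center)] += 1.0 / d
    return dict(counts)


if __name__ == "__main__":
    counts = window_cooccurrences("the cat sat on the mat".split(), window=2)
    print(counts[("cat", "sat")])    # 1.0 (adjacent)
    print(("cat", "mat") in counts)  # False: 4 tokens apart, outside the window
```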
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Linear Layer · Dropout · Softmax · Linear Warmup With Linear Decay · Dense Connections · Attention Dropout · Attention Is All You Need · Layer Normalization · WordPiece
