Learning Word Embedding with Better Distance Weighting and Window Size Scheduling
Chaohao Yang, Chris Ding

TL;DR
This paper introduces two novel methods, LFW and EDWS, to improve Word2Vec by incorporating distance information, leading to better word embeddings in NLP tasks.
Contribution
It proposes LFW and EDWS, new techniques for integrating distance weighting and dynamic window sizing into Word2Vec models, enhancing their performance.
Findings
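LFW gives CBOW learnable, distance-dependent weights for pooling context words, modeling how a word's influence varies with distance.
EDWS incorporates distance information into Skip-gram by adjusting its dynamic window size across epochs.
Experiments show that both methods surpass previous state-of-the-art results.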
Abstract
Distributed word representation (a.k.a. word embedding) is a key focus in natural language processing (NLP). As a highly successful word embedding model, Word2Vec offers an efficient method for learning distributed word representations on large datasets. However, Word2Vec lacks consideration for distances between center and context words. We propose two novel methods, Learnable Formulated Weights (LFW) and Epoch-based Dynamic Window Size (EDWS), to incorporate distance information into two variants of Word2Vec, the Continuous Bag-of-Words (CBOW) model and the Continuous Skip-gram (Skip-gram) model. For CBOW, LFW uses a formula with learnable parameters that best reflects the relationship of influence and distance between words to calculate distance-related weights for average pooling, providing insights for future NLP text modeling research. For Skip-gram, we improve its dynamic window…
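The LFW idea described above, distance-related weights replacing the uniform average in CBOW, can be illustrated with a minimal sketch. The weight formula here, w(d) = exp(-alpha * (|d| - 1)) + beta, is a hypothetical stand-in chosen for illustration; the abstract does not state the paper's formula. The function names and the parameters alpha and beta are likewise assumptions. In a real model these parameters would be trained along with the embeddings, whereas here they are fixed by hand.

```python
import math


def distance_weights(offsets, alpha, beta):
    """Return one weight per context word from its offset to the center word.

    Hypothetical formula (NOT taken from the paper):
        w(d) = exp(-alpha * (|d| - 1)) + beta
    """
    return [math.exp(-alpha * (abs(d) - 1)) + beta for d in offsets]


def weighted_context_vector(context_vectors, offsets, alpha=0.5, beta=0.0):
    """Pool context vectors with normalized distance weights.

    Plain CBOW uses a uniform average; here nearer words can count more.
    """
    weights = distance_weights(offsets, alpha, beta)
    total = sum(weights)
    dim = len(context_vectors[0])
    pooled = [0.0] * dim
    for vec, w in zip(context_vectors, weights):
        for i in range(dim):
            pooled[i] += (w / total) * vec[i]
    return pooled


# Toy 2-dimensional context vectors at offsets -2, -1, +1, +2.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
offsets = [-2, -1, 1, 2]

# alpha = 0 and beta = 0 make every weight 1, recovering CBOW's plain average.
plain = weighted_context_vector(vectors, offsets, alpha=0.0)

# alpha > 0 shifts the pooled vector toward the adjacent words.
weighted = weighted_context_vector(vectors, offsets, alpha=1.0)
```

Setting alpha to zero reduces the sketch to standard average pooling, which makes the effect of the distance term easy to isolate when comparing the two pooled vectors.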
Taxonomy
Topics: Text and Document Classification Technologies · Intelligent Tutoring Systems and Adaptive Learning · Educational Technology and Assessment
