FRAGE: Frequency-Agnostic Word Representation
Chengyue Gong, Di He, Xu Tan, Tao Qin, Liwei Wang, Tie-Yan Liu

TL;DR
This paper introduces FRAGE, a frequency-agnostic word embedding method using adversarial training, which improves the effectiveness of word representations, especially for rare words, across multiple NLP tasks.
Contribution
The paper proposes a novel adversarial training approach to learn frequency-agnostic word embeddings, addressing bias towards word frequency in existing embeddings.
Findings
FRAGE outperforms baselines in word similarity tasks
FRAGE improves performance in language modeling and machine translation
FRAGE enhances text classification results
Abstract
Continuous word representation (aka word embedding) is a basic building block in many neural network-based models used in natural language processing tasks. Although it is widely accepted that words with similar semantics should be close to each other in the embedding space, we find that word embeddings learned in several tasks are biased towards word frequency: the embeddings of high-frequency and low-frequency words lie in different subregions of the embedding space, and the embedding of a rare word and a popular word can be far from each other even if they are semantically similar. This makes learned word embeddings ineffective, especially for rare words, and consequently limits the performance of these neural network models. In this paper, we develop a neat, simple yet effective way to learn FRequency-AGnostic word Embedding (FRAGE) using adversarial training. We conducted…
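The adversarial idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: it assumes the adversary is a logistic-regression discriminator that tries to tell popular words from rare ones using only their embeddings, while the embeddings are nudged to make that discriminator fail. The function names (`disc_loss`, `disc_step`, `emb_step`), the 2-D vectors, and the learning rates are all hypothetical, and the task loss that a real model would optimize alongside this is omitted.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(x, w, b):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def disc_loss(emb, is_popular, w, b):
    """Mean binary cross-entropy of a logistic discriminator that predicts 'popular'."""
    total = 0.0
    for x, y in zip(emb, is_popular):
        p = sigmoid(logit(x, w, b))
        total -= math.log(p) if y else math.log(1.0 - p)
    return total / len(emb)

def disc_step(emb, is_popular, w, b, lr):
    """One gradient-descent step on the discriminator parameters (w, b)."""
    n = len(emb)
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for x, y in zip(emb, is_popular):
        err = sigmoid(logit(x, w, b)) - (1.0 if y else 0.0)  # d(loss)/d(logit)
        grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
        grad_b += err
    new_w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
    return new_w, b - lr * grad_b / n

def emb_step(emb, is_popular, w, b, lr):
    """One gradient-ascent step on the embeddings: raise the discriminator's loss."""
    n = len(emb)
    updated = []
    for x, y in zip(emb, is_popular):
        err = sigmoid(logit(x, w, b)) - (1.0 if y else 0.0)
        # d(loss)/d(x) = err * w / n; moving along it makes x harder to classify
        updated.append([xi + lr * err * wi / n for xi, wi in zip(x, w)])
    return updated

# Toy vocabulary (assumed data): popular and rare words sit in separate regions.
emb = [[2.0, 0.0], [1.5, 0.5], [-2.0, 0.0], [-1.5, -0.5]]
is_popular = [True, True, False, False]

w, b = [0.0, 0.0], 0.0
for _ in range(50):
    w, b = disc_step(emb, is_popular, w, b, lr=1.0)  # adversary learns frequency
    emb = emb_step(emb, is_popular, w, b, lr=0.5)    # embeddings hide frequency
```

In a full setup the embedding update would also include the gradient of the downstream task loss (language modeling, translation, and so on), weighted against the adversarial term, so that the embeddings stay useful while becoming harder to separate by frequency.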
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
