Evaluation method of word embedding by roots and affixes
KeBin Peng

TL;DR
This paper introduces the Roots and Affixes Model (RAAM), a novel method for interpreting and evaluating word embeddings based on linguistic structures, demonstrating its effectiveness through experiments on English Wikipedia data.
Contribution
The paper proposes RAAM, a new intrinsic interpretability and evaluation method for word embeddings using roots and affixes, incorporating information entropy for better analysis.
Findings
Negative linear relation between two attributes in RAAM
High positive correlation with downstream semantic tasks
Effective interpretation of word vector dimensions
Abstract
Word embeddings have been shown to be remarkably effective in many Natural Language Processing tasks. However, existing models still have limitations in interpreting the dimensions of a word vector. In this paper, we provide a new approach, the roots and affixes model (RAAM), to interpret these dimensions from the intrinsic structures of natural language. It can also be used as an evaluation measure of the quality of word embeddings. We introduce information entropy into our model and divide the dimensions into two categories, analogous to roots and affixes in lexical semantics, and then consider each category as a whole rather than each dimension individually. We experimented with an English Wikipedia corpus. Our results show that there is a negative linear relation between the two attributes and a high positive correlation between our model and downstream semantic evaluation tasks.
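The abstract does not spell out how entropy is computed or how the dimensions are split, so the following is only a minimal sketch of one possible reading, not the paper's actual procedure. It assumes a histogram-based Shannon entropy per dimension and a median threshold for the two-way split; the function names `dimension_entropy` and `split_dimensions`, the bin count, the threshold rule, and the toy data are all hypothetical. Which group would correspond to "roots" and which to "affixes" is not stated in this summary and is left open here.

```python
import numpy as np


def dimension_entropy(embeddings, bins=20):
    """Shannon entropy (in bits) of each embedding dimension.

    Assumption: each column is discretised into `bins` equal-width
    histogram bins; the paper may define entropy differently.
    """
    n_words, n_dims = embeddings.shape
    entropies = np.empty(n_dims)
    for d in range(n_dims):
        counts, _ = np.histogram(embeddings[:, d], bins=bins)
        probs = counts[counts > 0] / n_words  # drop empty bins before the log
        entropies[d] = -np.sum(probs * np.log2(probs))
    return entropies


def split_dimensions(entropies):
    """Split dimension indices into two groups around the median entropy.

    Assumption: the median threshold is illustrative only.
    """
    threshold = np.median(entropies)
    high = np.flatnonzero(entropies > threshold)
    low = np.flatnonzero(entropies <= threshold)
    return high, low


# Toy stand-in for a real embedding matrix (rows = words, columns = dimensions).
rng = np.random.default_rng(0)
toy = rng.normal(size=(1000, 8))
toy[:, :4] = np.sign(toy[:, :4])  # make the first four dimensions nearly binary

ent = dimension_entropy(toy)
high, low = split_dimensions(ent)
```

In this toy setup the near-binary dimensions end up in the low-entropy group and the Gaussian ones in the high-entropy group. Each group can then be summarised as a whole (for example by its mean entropy) rather than dimension by dimension, which is the spirit of the "each category as a whole" step described above.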
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
