Evaluation method of word embedding by roots and affixes

KeBin Peng

arXiv:1606.07601 · cs.CL · June 27, 2016 · 1 citation

TL;DR

This paper introduces the Roots and Affixes Model (RAAM), a novel method for interpreting and evaluating word embeddings based on linguistic structures, demonstrating its effectiveness through experiments on English Wikipedia data.

Contribution

The paper proposes RAAM, a new intrinsic method for interpreting and evaluating word embeddings using roots and affixes. It uses information entropy to divide the embedding dimensions into two categories, analogous to roots and affixes, and analyzes each category as a whole.
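As a rough illustration of how information entropy could split embedding dimensions into two categories, the sketch below estimates the Shannon entropy of each dimension from a histogram of its values across the vocabulary, then splits the dimensions at the median entropy. The histogram estimator, the `bins` parameter, the median threshold, and the names `dimension_entropies` and `split_dimensions` are assumptions made for this illustration; the paper's actual entropy definition and splitting rule may differ.

```python
import numpy as np


def dimension_entropies(embeddings: np.ndarray, bins: int = 20) -> np.ndarray:
    """Estimate the Shannon entropy (in bits) of each embedding dimension.

    Hypothetical estimator: each column's values across the vocabulary are
    binned into a histogram, and the entropy of the bin frequencies is taken.
    `embeddings` has shape (vocab_size, dim).
    """
    entropies = np.empty(embeddings.shape[1])
    for j in range(embeddings.shape[1]):
        counts, _ = np.histogram(embeddings[:, j], bins=bins)
        p = counts[counts > 0] / counts.sum()  # drop empty bins to avoid log(0)
        entropies[j] = -np.sum(p * np.log2(p))
    return entropies


def split_dimensions(embeddings: np.ndarray, bins: int = 20):
    """Hypothetical split of the dimensions into two categories at the median entropy.

    Returns the indices of the high- and low-entropy dimensions, plus the mean
    entropy of each group, so that each category can be treated as a whole.
    """
    h = dimension_entropies(embeddings, bins)
    threshold = np.median(h)
    high = np.flatnonzero(h > threshold)
    low = np.flatnonzero(h <= threshold)
    # Guard against an empty group (e.g. when every dimension has the same entropy).
    mean_high = float(h[high].mean()) if high.size else float("nan")
    mean_low = float(h[low].mean()) if low.size else float("nan")
    return high, low, mean_high, mean_low


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(1000, 50))  # random stand-in for real word vectors
    high, low, h_high, h_low = split_dimensions(emb)
    print(len(high), len(low), h_high, h_low)
```

A median split keeps the two groups roughly balanced on typical data; a fixed entropy threshold or a two-cluster split would be equally plausible choices under the same assumptions.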

Findings

1. Negative linear relation between the two attributes in RAAM
2. High positive correlation with downstream semantic evaluation tasks
3. Effective interpretation of word vector dimensions

Abstract

Word embeddings have been shown to be remarkably effective in many Natural Language Processing tasks. However, existing models still have limitations when it comes to interpreting the dimensions of word vectors. In this paper, we propose a new approach, the roots and affixes model (RAAM), which interprets these dimensions in terms of the intrinsic structures of natural language. It can also be used as a measure of the quality of word embeddings. We introduce information entropy into our model and divide the dimensions into two categories, analogous to roots and affixes in lexical semantics. Each category is then considered as a whole rather than dimension by dimension. We experimented with the English Wikipedia corpus. Our results show that there is a negative linear relation between the two attributes and a high positive correlation between our model and downstream semantic evaluation tasks.
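The abstract's two quantitative claims, a negative linear relation between two attributes and a high positive correlation with downstream tasks, are the kind of result one would check with a correlation coefficient and a linear fit. The sketch below assumes Pearson correlation and an ordinary least-squares slope; the abstract does not state which statistic was used, and the helper name `linear_relation` is hypothetical.

```python
import numpy as np


def linear_relation(x, y):
    """Return (Pearson r, least-squares slope) for paired scores x and y.

    Hypothetical helper: x could be a RAAM-style score per embedding model and
    y the same models' scores on a downstream semantic task. A negative r with
    a negative slope indicates a negative linear relation; r close to +1
    indicates a high positive correlation.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()  # center both variables
    r = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))
    slope = np.sum(dx * dy) / np.sum(dx ** 2)
    return float(r), float(slope)
```

For example, `linear_relation([1, 2, 3, 4], [2, 4, 6, 8])` returns `(1.0, 2.0)` for these toy inputs, which are not data from the paper.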

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques