On the Learnability of Concepts: With Applications to Comparing Word Embedding Algorithms
Adam Sutton, Nello Cristianini

TL;DR
This paper introduces a framework for analyzing the learnability of semantic concepts in word embeddings, compares different algorithms, and finds that fastText outperforms others in capturing semantic content.
Contribution
It proposes a novel concept-based analysis method for evaluating and comparing word embedding algorithms' ability to learn semantic categories.
Findings
All embedding methods tested capture the semantic content of the word lists.
fastText outperforms the other embedding algorithms on concept learnability.
The method enables systematic comparison of embedding algorithms.
Abstract
Word embeddings are widely used in Natural Language Processing (NLP) applications. They are coordinates associated with each word in a dictionary, inferred from the statistical properties of these words in a large corpus. In this paper we introduce the notion of a "concept" as a list of words that have shared semantic content. We use this notion to analyse the learnability of certain concepts, defined as the capability of a classifier to recognise unseen members of a concept after training on a random subset of it. We first use this method to measure the learnability of concepts on pretrained word embeddings. We then develop a statistical analysis of concept learnability, based on hypothesis testing and ROC curves, in order to compare the relative merits of various embedding algorithms using a fixed corpus and fixed hyperparameters. We find that all embedding methods capture the semantic…
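The learnability definition in the abstract (train on a random subset of a concept, then test on its unseen members) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' setup: the vectors are synthetic stand-ins for real embeddings, the classifier is a plain logistic regression fitted by gradient descent, and the names `train_logistic`, `roc_auc`, and `concept_learnability` are hypothetical. A random word list is scored alongside the concept as a simple control, loosely in the spirit of the hypothesis-testing comparison the abstract mentions.

```python
import numpy as np


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit a logistic-regression classifier by batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        z = np.clip(X @ w + b, -30.0, 30.0)  # clip to keep exp() stable
        p = 1.0 / (1.0 + np.exp(-z))
        grad = p - y
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def roc_auc(scores_pos, scores_neg):
    """Area under the ROC curve: the chance that a member outscores a
    non-member, with ties counted as one half."""
    diff = scores_pos[:, None] - scores_neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


def concept_learnability(concept_vecs, other_vecs, train_frac=0.5, seed=0):
    """Train on a random subset of the concept and score the held-out
    members against held-out non-members."""
    rng = np.random.default_rng(seed)
    pos = concept_vecs[rng.permutation(len(concept_vecs))]
    neg = other_vecs[rng.permutation(len(other_vecs))]
    n_pos = int(train_frac * len(pos))
    n_neg = int(train_frac * len(neg))
    X_train = np.vstack([pos[:n_pos], neg[:n_neg]])
    y_train = np.concatenate([np.ones(n_pos), np.zeros(n_neg)])
    w, b = train_logistic(X_train, y_train)
    return roc_auc(pos[n_pos:] @ w + b, neg[n_neg:] @ w + b)


# Synthetic data (assumption): concept words share an offset along one axis,
# while background words and the random control list have no shared structure.
rng = np.random.default_rng(42)
dim = 20
offset = np.zeros(dim)
offset[0] = 3.0
concept = rng.normal(size=(60, dim)) + offset
background = rng.normal(size=(200, dim))
random_list = rng.normal(size=(60, dim))

auc_concept = concept_learnability(concept, background)
auc_random = concept_learnability(random_list, background)
print(f"concept AUC: {auc_concept:.3f}, random-list AUC: {auc_random:.3f}")
```

With real embeddings, `concept` would hold the vectors of the words in a concept list and `background` the vectors of words sampled from the rest of the vocabulary. Repeating the split over many seeds and over several embedding algorithms trained on the same corpus would give the kind of comparison the paper describes.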
Taxonomy
Methods: fastText
