On the Learnability of Concepts: With Applications to Comparing Word Embedding Algorithms

Adam Sutton; Nello Cristianini

arXiv:2006.09896 · cs.CL · June 18, 2020

TL;DR

This paper introduces a framework for analyzing how learnable semantic concepts are in word embeddings and uses it to compare embedding algorithms. It finds that all methods capture semantic content, and that fastText produces the most learnable concepts.

Contribution

The paper proposes a novel concept-based analysis method for evaluating and comparing how well word embedding algorithms learn semantic categories.
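The abstract defines learnability as a classifier's ability to recognise unseen members of a concept after training on a random subset of it. The sketch below makes that definition concrete under several assumptions that do not come from the paper. The classifier is a hypothetical stand-in that uses nearest-centroid scoring with cosine similarity. Every vocabulary word outside the concept serves as a negative. The score is the ROC AUC on held-out members versus those negatives. The function names and the toy 2-D vectors are invented for illustration.

```python
import math
import random

def cosine(u, v):
    # Cosine similarity between two equal-length vectors (0.0 if either is all zeros).
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def centroid(vectors):
    # Component-wise mean of a non-empty list of vectors.
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def auc(pos_scores, neg_scores):
    # ROC AUC via the Mann-Whitney statistic: the fraction of (positive, negative)
    # pairs in which the positive scores higher, counting ties as half a win.
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def concept_learnability(embeddings, concept, train_frac=0.5, seed=0):
    # Hypothetical stand-in classifier: "train" by taking the centroid of a random
    # subset of the concept, then score unseen members and non-members by cosine.
    concept_set = set(concept)
    members = [w for w in concept if w in embeddings]
    negatives = [w for w in embeddings if w not in concept_set]
    if len(members) < 2 or not negatives:
        raise ValueError("need at least 2 in-vocabulary members and 1 non-member")
    rng = random.Random(seed)
    rng.shuffle(members)
    # Keep at least one member for training and at least one held out.
    k = min(len(members) - 1, max(1, int(len(members) * train_frac)))
    train, held_out = members[:k], members[k:]
    center = centroid([embeddings[w] for w in train])
    pos = [cosine(embeddings[w], center) for w in held_out]
    neg = [cosine(embeddings[w], center) for w in negatives]
    return auc(pos, neg)

# Toy 2-D "embeddings", invented for illustration only.
toy = {
    "cat": [1.0, 0.1], "dog": [0.9, 0.2], "horse": [1.0, 0.0], "cow": [0.8, 0.1],
    "car": [0.1, 1.0], "bus": [0.0, 0.9], "train": [0.2, 1.0], "bike": [0.1, 0.8],
}
print(concept_learnability(toy, ["cat", "dog", "horse", "cow"]))  # prints 1.0
```

In the toy data the animal words and the vehicle words sit in clearly separated directions, so every random split ranks the held-out animals above all the vehicles. On real embeddings, averaging this score over many random splits would give a more stable estimate.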

Findings

1. All embedding methods capture the semantic content of word lists.
2. fastText outperforms the other embedding algorithms in learnability.
3. The method enables systematic comparison of embedding algorithms.

Abstract

Word embeddings are widely used in many Natural Language Processing (NLP) applications. They are coordinates associated with each word in a dictionary, inferred from statistical properties of these words in a large corpus. In this paper we introduce the notion of "concept" as a list of words that have shared semantic content. We use this notion to analyse the learnability of certain concepts, defined as the capability of a classifier to recognise unseen members of a concept after training on a random subset of it. We first use this method to measure the learnability of concepts on pretrained word embeddings. We then develop a statistical analysis of concept learnability, based on hypothesis testing and ROC curves, in order to compare the relative merits of various embedding algorithms using a fixed corpus and fixed hyperparameters. We find that all embedding methods capture the semantic…
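The abstract compares embedding algorithms with a hypothesis test on concept learnability but does not name the test. As a hedged illustration, suppose each algorithm produces one learnability score (for example an ROC AUC) per concept, and all algorithms were trained on the same corpus with the same hyperparameters. A paired two-sided sign test can then check whether one algorithm beats the other on more concepts than chance would allow. The sign test, the function name `sign_test`, and the scores below are stand-ins chosen for this sketch. They are not the authors' procedure or data.

```python
import math

def sign_test(scores_a, scores_b):
    """Two-sided exact sign test on paired per-concept scores (ties are dropped)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired (equal length)")
    wins_a = sum(1 for a, b in zip(scores_a, scores_b) if a > b)
    wins_b = sum(1 for a, b in zip(scores_a, scores_b) if b > a)
    n = wins_a + wins_b
    if n == 0:
        return 1.0  # every pair tied: no evidence either way
    k = min(wins_a, wins_b)
    # Under H0 each untied concept is a fair coin flip, so wins ~ Binomial(n, 0.5).
    # Double the smaller tail for a two-sided p-value, capped at 1.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

# Hypothetical per-concept AUCs for two algorithms (invented numbers, not results).
auc_alg_a = [0.95, 0.91, 0.88, 0.97, 0.90, 0.93, 0.89, 0.96, 0.92, 0.94]
auc_alg_b = [0.90, 0.87, 0.85, 0.93, 0.91, 0.88, 0.86, 0.92, 0.89, 0.90]
print(sign_test(auc_alg_a, auc_alg_b))  # 9 of 10 concepts favour A: p = 22/1024
```

A sign test uses only the direction of each per-concept difference. A paired permutation test on the differences, or a direct comparison of full ROC curves, would use more of the information at the cost of extra assumptions.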

Taxonomy

Methods: fastText