Multimodal Word Distributions

Ben Athiwaratkun; Andrew Gordon Wilson

arXiv:1704.08424 · stat.ML · September 10, 2019 · 19 cites

TL;DR

This paper introduces multimodal word distributions, which represent each word as a Gaussian mixture to capture multiple meanings, entailment, and uncertainty. The approach outperforms word2vec skip-grams and Gaussian embeddings on word similarity and entailment benchmarks.

Contribution

The paper proposes representing each word as a Gaussian mixture instead of a single point, so that one word can carry multiple meanings along with rich uncertainty information. The distributions are learned with an energy-based max-margin objective.

Findings

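01 Outperforms word2vec skip-grams and Gaussian embeddings on word similarity and entailment benchmarks

02 Captures multiple meanings of a word through a Gaussian mixture

03 Encodes entailment and uncertainty information that point embeddings do not provide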

Abstract

Word embeddings provide point representations of words containing useful semantic information. We introduce multimodal word distributions formed from Gaussian mixtures, for multiple word meanings, entailment, and rich uncertainty information. To learn these distributions, we propose an energy-based max-margin objective. We show that the resulting approach captures uniquely expressive semantic information, and outperforms alternatives, such as word2vec skip-grams, and Gaussian embeddings, on benchmark datasets such as word similarity and entailment.
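
The abstract does not spell out the energy function or the form of the objective. As a minimal sketch, assume the energy is the log of the integral of the product of two densities (an expected likelihood kernel) between two diagonal-covariance Gaussian mixtures, and assume the max-margin objective is a hinge loss that compares a true context word with a sampled negative word. The function names, the (weights, means, variances) tuple layout, the margin value, and the toy words below are all illustrative assumptions, not the authors' code.

```python
import numpy as np

# Assumption: a word is a tuple (weights, means, variances) with shapes
# (K,), (K, D), (K, D), i.e. a K-component mixture of diagonal Gaussians.


def log_gaussian_overlap(mean_a, var_a, mean_b, var_b):
    """Log of the integral of the product of two diagonal Gaussians.

    Closed form: log N(mean_a - mean_b; 0, diag(var_a + var_b)).
    """
    var = var_a + var_b
    diff = mean_a - mean_b
    dim = diff.shape[-1]
    return -0.5 * (dim * np.log(2.0 * np.pi)
                   + np.sum(np.log(var))
                   + np.sum(diff ** 2 / var))


def log_energy(word_a, word_b):
    """Log of the summed pairwise component overlaps, weighted by mixture weights."""
    weights_a, means_a, vars_a = word_a
    weights_b, means_b, vars_b = word_b
    terms = np.array([
        np.log(weights_a[i]) + np.log(weights_b[j])
        + log_gaussian_overlap(means_a[i], vars_a[i], means_b[j], vars_b[j])
        for i in range(len(weights_a))
        for j in range(len(weights_b))
    ])
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = terms.max()
    return float(peak + np.log(np.sum(np.exp(terms - peak))))


def max_margin_loss(word, context, negative, margin=1.0):
    """Hinge loss: the true pair should beat the negative pair by `margin`."""
    return max(0.0, margin - log_energy(word, context) + log_energy(word, negative))


# Hypothetical toy words in 2 dimensions with unit variances.
rock = (np.array([0.5, 0.5]),
        np.array([[0.0, 0.0], [5.0, 5.0]]),
        np.ones((2, 2)))
stone = (np.array([1.0]), np.array([[0.0, 0.0]]), np.ones((1, 2)))
far = (np.array([1.0]), np.array([[-5.0, 5.0]]), np.ones((1, 2)))

print("energy(rock, stone):", log_energy(rock, stone))
print("energy(rock, far):  ", log_energy(rock, far))
print("loss:", max_margin_loss(rock, stone, far))
```

In this toy setup the hypothetical word `rock` has two components. One sits on top of `stone`, so the pair receives a high energy even though the second component lies elsewhere, which is the intuition behind letting a single word have several modes.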

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications