Multimodal Word Distributions
Ben Athiwaratkun, Andrew Gordon Wilson

TL;DR
This paper introduces multimodal word distributions, which represent each word as a Gaussian mixture to capture multiple meanings and uncertainty, and reports better results than word2vec skip-grams and Gaussian embeddings on word similarity and entailment benchmarks.
Contribution
The paper represents each word as a Gaussian mixture rather than a single point, so that a word can carry multiple meanings and uncertainty information, and learns these mixtures with an energy-based max-margin objective.
Findings
Outperforms word2vec skip-grams and Gaussian embeddings on word similarity and entailment benchmarks
Captures multiple word meanings effectively
Provides richer semantic representations than point embeddings, including uncertainty information
Abstract
Word embeddings provide point representations of words containing useful semantic information. We introduce multimodal word distributions formed from Gaussian mixtures, for multiple word meanings, entailment, and rich uncertainty information. To learn these distributions, we propose an energy-based max-margin objective. We show that the resulting approach captures uniquely expressive semantic information, and outperforms alternatives, such as word2vec skip-grams and Gaussian embeddings, on benchmark datasets such as word similarity and entailment.
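The abstract names an energy-based max-margin objective over Gaussian mixtures without giving its form. The following is a minimal sketch of one way such an objective can be written, not the authors' implementation. It assumes spherical covariances and takes the energy to be the log expected likelihood kernel (the log of the integral of the product of two mixture densities), which has a closed form for Gaussians. The function names, the (weight, mean, variance) tuple layout, the margin value, and the toy 2-D vectors are all hypothetical choices made for illustration.

```python
import math

def log_gaussian_overlap(mu_a, var_a, mu_b, var_b):
    """Log of the integral of N(x; mu_a, var_a*I) * N(x; mu_b, var_b*I) dx.

    For Gaussians this equals log N(0; mu_a - mu_b, (var_a + var_b)*I).
    Assumption: spherical covariances, each given as a single variance.
    """
    d = len(mu_a)
    var = var_a + var_b
    sq_dist = sum((x - y) ** 2 for x, y in zip(mu_a, mu_b))
    return -0.5 * (d * math.log(2 * math.pi * var) + sq_dist / var)

def log_energy(word_a, word_b):
    """Log expected likelihood kernel between two Gaussian mixtures.

    Each word is a list of (weight, mean, variance) components
    with strictly positive weights.
    """
    terms = [
        math.log(pa) + math.log(pb) + log_gaussian_overlap(ma, va, mb, vb)
        for pa, ma, va in word_a
        for pb, mb, vb in word_b
    ]
    top = max(terms)  # log-sum-exp trick for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))

def max_margin_loss(word, context, negative, margin=1.0):
    """Hinge loss: the true context should score at least `margin`
    higher than the negative sample."""
    return max(0.0, margin - log_energy(word, context) + log_energy(word, negative))

# Toy data: "bank" has two components, one of which sits near "money".
bank = [(0.5, [1.0, 0.0], 0.1), (0.5, [0.0, 1.0], 0.1)]
money = [(1.0, [1.0, 0.1], 0.1)]
far = [(1.0, [-1.0, -1.0], 0.1)]  # an unrelated negative sample

print(log_energy(bank, money), log_energy(bank, far))
print(max_margin_loss(bank, money, far))
```

In this toy setup one component of "bank" overlaps with "money", so the pair scores a higher energy than the pair with the distant negative sample, and the hinge loss is zero once the gap exceeds the margin. A single point embedding would have to place "bank" somewhere between its senses, whereas a mixture lets each component sit near a different group of contexts.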
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
