TL;DR
The paper introduces comp-syn, a Python package that creates perceptually grounded word embeddings based on color distributions from image search results, enhancing semantic models by incorporating sensory information.
Contribution
The paper presents a novel method for generating grounded word embeddings from color data. The embeddings predict word concreteness more accurately than word2vec and perform comparably to word2vec on a metaphorical vs. literal word-pair classification task.
Findings
comp-syn predicts human concreteness judgments more accurately than word2vec.
comp-syn performs comparably to word2vec on metaphorical vs. literal classification.
The package release provides open-source word-color embeddings for over 40,000 English words.
Abstract
Popular approaches to natural language processing create word embeddings based on textual co-occurrence patterns, but often ignore embodied, sensory aspects of language. Here, we introduce the Python package comp-syn, which provides grounded word embeddings based on the perceptually uniform color distributions of Google Image search results. We demonstrate that comp-syn significantly enriches models of distributional semantics. In particular, we show that (1) comp-syn predicts human judgments of word concreteness with greater accuracy and in a more interpretable fashion than word2vec using low-dimensional word-color embeddings, and (2) comp-syn performs comparably to word2vec on a metaphorical vs. literal word-pair classification task. comp-syn is open-source on PyPI and is compatible with mainstream machine-learning Python packages. Our package release includes word-color embeddings…
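To make the idea of a word-color embedding concrete, here is a minimal sketch of how a low-dimensional color distribution could be computed from a set of images. It does not use the comp-syn API. The function names (`srgb_to_lab`, `color_embedding`), the choice of CIELAB as a stand-in for a perceptually uniform color space, the bin count, and the synthetic solid-color "images" are all assumptions made for illustration; the abstract does not state which color space or binning comp-syn uses.

```python
import numpy as np

def srgb_to_lab(rgb):
    """Convert sRGB values in [0, 1], shape (..., 3), to CIELAB (D65 white)."""
    rgb = np.asarray(rgb, dtype=float)
    # Undo the sRGB transfer curve to get linear RGB.
    linear = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    # Linear RGB -> CIE XYZ, then normalize by the D65 white point.
    m = np.array([[0.4124564, 0.3575761, 0.1804375],
                  [0.2126729, 0.7151522, 0.0721750],
                  [0.0193339, 0.1191920, 0.9503041]])
    xyz = (linear @ m.T) / np.array([0.95047, 1.0, 1.08883])
    # XYZ -> Lab using the standard piecewise cube-root function.
    delta = 6 / 29
    f = np.where(xyz > delta ** 3, np.cbrt(xyz), xyz / (3 * delta ** 2) + 4 / 29)
    L = 116 * f[..., 1] - 16
    a = 500 * (f[..., 0] - f[..., 1])
    b = 200 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)

def color_embedding(images, bins=2):
    """Pool the pixels of all images and return a normalized Lab histogram.

    With bins=2 per channel the result has 2**3 = 8 dimensions.
    This binning is an illustrative assumption, not comp-syn's definition.
    """
    pixels = np.concatenate(
        [np.asarray(img, dtype=float).reshape(-1, 3) for img in images]
    )
    lab = srgb_to_lab(pixels)
    # Guard against rounding pushing lightness just outside [0, 100].
    lab[:, 0] = np.clip(lab[:, 0], 0, 100)
    ranges = [(0, 100), (-128, 128), (-128, 128)]
    hist, _ = np.histogramdd(lab, bins=bins, range=ranges)
    hist = hist.ravel()
    return hist / hist.sum()

# Hypothetical stand-ins for the image results of two words.
red = np.zeros((4, 4, 3))
red[..., 0] = 1.0
blue = np.zeros((4, 4, 3))
blue[..., 2] = 1.0

emb_red = color_embedding([red])
emb_blue = color_embedding([blue])
```

Each embedding is a probability distribution over color bins, so two words can be compared with any distance between distributions, and the vectors can be passed directly to standard machine-learning tools as features.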
