Word2Box: Capturing Set-Theoretic Semantics of Words using Box Embeddings
Shib Sankar Dasgupta, Michael Boratko, Siddhartha Mishra, Shriya Atmakuri, Dhruvesh Patel, Xiang Lorraine Li, Andrew McCallum

TL;DR
Word2Box learns box embeddings for words. This enables set-theoretic semantic modeling that captures word relationships beyond traditional vector similarity, and it improves performance on word similarity tasks, especially for less common words.
Contribution
The paper presents a novel fuzzy-set interpretation of box embeddings and a set-theoretic training objective for learning box representations of words.
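One way to make the fuzzy-set reading of a box concrete is to give every point a membership degree in [0, 1]. The sketch below assumes the Gumbel-box model from earlier box-embedding work: lower corners are max-Gumbel, upper corners are min-Gumbel, all are independent, and they share a scale `beta`. It is not taken from the paper, whose exact formulation and training objective may differ. The helper names `gumbel_membership` and `fuzzy_intersection` are hypothetical. A point's membership is the probability that it falls inside the random box. For independent boxes, membership in the intersection is the product of the individual memberships.

```python
import math


def _exp_neg_exp(z: float) -> float:
    """Return exp(-exp(z)), clamped so math.exp never overflows for large z."""
    return 0.0 if z > 700.0 else math.exp(-math.exp(z))


def gumbel_membership(x, lo, hi, beta):
    """Fuzzy membership of point x in a box with random corners (assumed Gumbel-box model).

    Each lower corner is assumed max-Gumbel(loc=lo[i], scale=beta) and each upper corner
    min-Gumbel(loc=hi[i], scale=beta), all independent, so the membership is
    P(lower_i <= x_i <= upper_i for every dimension i).
    """
    m = 1.0
    for xi, l, h in zip(x, lo, hi):
        m *= _exp_neg_exp((l - xi) / beta)  # P(lower corner <= x_i): max-Gumbel CDF
        m *= _exp_neg_exp((xi - h) / beta)  # P(upper corner >= x_i): min-Gumbel survival
    return m


def fuzzy_intersection(x, boxes, beta):
    """Membership in the intersection of independent random boxes: product of memberships."""
    return math.prod(gumbel_membership(x, lo, hi, beta) for lo, hi in boxes)


box_lo, box_hi = (0.0, 0.0), (1.0, 1.0)
print(gumbel_membership((0.5, 0.5), box_lo, box_hi, beta=0.01))  # close to 1: nearly crisp box, point inside
print(gumbel_membership((0.5, 0.5), box_lo, box_hi, beta=1.0))   # far below 1: corners are blurred
```

As `beta` shrinks, the membership of any point not on a box boundary approaches 0 or 1. The ordinary hard box is therefore the limiting case of this fuzzy one.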
Findings
Improved word similarity performance, especially for less common words
Demonstrated set-theoretic capabilities of box embeddings
Provided qualitative analysis of box embedding expressivity
Abstract
Learning representations of words in a continuous space is perhaps the most fundamental task in NLP; however, words interact in ways much richer than vector dot product similarity can provide. Many relationships between words can be expressed set-theoretically. For example, adjective-noun compounds (e.g. "red cars" ⊆ "cars") and homographs (e.g. "tongue" ∩ "body" should be similar to "mouth", while "tongue" ∩ "language" should be similar to "dialect") have natural set-theoretic interpretations. Box embeddings are a novel region-based representation that provides the capability to perform these set-theoretic operations. In this work, we provide a fuzzy-set interpretation of box embeddings, and learn box representations of words using a set-theoretic training objective. We demonstrate improved performance on various word similarity tasks, particularly on less common words, and…
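The set-theoretic operations mentioned in the abstract can be illustrated with ordinary (hard) axis-aligned boxes. This is a minimal illustration, not the paper's model. Word2Box learns smoothed boxes through a training objective, whereas here the `Box` class, the `containment` and `jaccard` scores, and every coordinate are hypothetical toy values chosen by hand. Intersection takes the larger lower corner and the smaller upper corner in each dimension. Volumes of those intersections then score containment ("red cars" ⊆ "cars") and similarity (which sense of "tongue" survives after intersecting with "body" or "language").

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical toy class: an axis-aligned box with lower corner `lo` and upper corner `hi`."""
    lo: tuple[float, ...]
    hi: tuple[float, ...]

    def volume(self) -> float:
        # Product of side lengths; any side with hi < lo is empty and contributes 0.
        return math.prod(max(h - l, 0.0) for l, h in zip(self.lo, self.hi))

    def intersect(self, other: "Box") -> "Box":
        # Box intersection: elementwise max of lower corners, min of upper corners.
        lo = tuple(max(a, b) for a, b in zip(self.lo, other.lo))
        hi = tuple(min(a, b) for a, b in zip(self.hi, other.hi))
        return Box(lo, hi)


def containment(a: Box, b: Box) -> float:
    """vol(a ∩ b) / vol(a): fraction of `a` inside `b`; 1.0 means a ⊆ b."""
    va = a.volume()
    return a.intersect(b).volume() / va if va > 0 else 0.0


def jaccard(a: Box, b: Box) -> float:
    """vol(a ∩ b) / vol(a ∪ b), used here as a toy similarity between two boxes."""
    inter = a.intersect(b).volume()
    union = a.volume() + b.volume() - inter
    return inter / union if union > 0 else 0.0


# Hand-picked 2-D coordinates (not learned embeddings).
cars, red_cars = Box((0, 0), (4, 4)), Box((1, 1), (2, 3))
tongue = Box((0, 0), (4, 1))  # left half overlaps "body", right half overlaps "language"
body, language = Box((0, -1), (2, 3)), Box((2, -1), (6, 3))
mouth, dialect = Box((0.5, 0), (2, 1)), Box((2, 0), (4, 1))

print(containment(red_cars, cars))                   # 1.0: "red cars" ⊆ "cars"
print(jaccard(tongue.intersect(body), mouth))        # 0.75
print(jaccard(tongue.intersect(body), dialect))      # 0.0
print(jaccard(tongue.intersect(language), dialect))  # 1.0
```

Because the boxes are hard, any two disjoint regions score exactly 0. The smooth, fuzzy boxes that the paper describes avoid this, which matters when boxes must be learned by gradient descent.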
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies
