Set-Theoretic Compositionality of Sentence Embeddings
Naman Bansal, Yash Mahajan, Sanjeev Sinha, Santu Karmaker

TL;DR
This paper introduces a set-theoretic framework with six criteria to evaluate the compositional properties of sentence embeddings independently of specific tasks, revealing that SBERT exhibits strong set-like compositionality.
Contribution
It proposes a novel set-theoretic evaluation framework for sentence embeddings and provides a new large dataset for benchmarking their compositional properties.
Findings
SBERT consistently shows strong set-like compositionality.
Classical and LLM-based encoders vary in their alignment with the criteria.
A new dataset of 192K samples is introduced for future benchmarking.
Abstract
Sentence encoders play a pivotal role in various NLP tasks; hence, an accurate evaluation of their compositional properties is paramount. However, existing evaluation methods predominantly focus on goal task-specific performance. This leaves a significant gap in understanding how well sentence embeddings demonstrate fundamental compositional properties in a task-independent context. Leveraging classical set theory, we address this gap by proposing six criteria based on three core "set-like" compositions/operations: TextOverlap, TextDifference, and TextUnion. We systematically evaluate classical and Large Language Model (LLM)-based sentence encoders to assess their alignment with these criteria. Our findings show that SBERT consistently demonstrates set-like compositional properties, surpassing even the latest LLMs. Additionally, we introduce a new…
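To make the idea of a "set-like" check concrete, here is a minimal sketch of the kind of test such a framework might run. It is not the paper's method: the six criteria are not spelled out in this summary, so the criterion below (the overlap text should sit closer to a source sentence than the difference text does) is an assumption chosen for illustration. The `embed` function is a toy bag-of-words stand-in for a real sentence encoder, and all function names are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    # Toy stand-in for a sentence encoder (assumption): a bag-of-words
    # count vector. A real evaluation would call an actual encoder here.
    return Counter(text.lower().split())


def cosine(u, v):
    # Cosine similarity between two sparse count vectors.
    dot = sum(u[w] * v[w] for w in u)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def overlap_closer_than_difference(b, overlap, difference):
    # Hypothetical criterion (not taken from the paper): text shared by
    # A and B should be more similar to B than text found only in A.
    sim_overlap = cosine(embed(overlap), embed(b))
    sim_difference = cosine(embed(difference), embed(b))
    return sim_overlap > sim_difference


sentence_b = "the cat slept on the sofa"
overlap_ab = "the cat on the"   # content shared by A and B
difference_ab = "sat mat"       # content in A but not in B
result = overlap_closer_than_difference(sentence_b, overlap_ab, difference_ab)
```

With a real encoder substituted for `embed`, the same comparison could be repeated over many sentence triples to estimate how often an encoder satisfies the chosen criterion.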
