Sentence Smith: Controllable Edits for Evaluating Text Embeddings
Hongji Li, Andrianos Michail, Reto Gubelmann, Simon Clematide, Juri Opitz

TL;DR
Sentence Smith offers a framework for controllable, transparent text editing via semantic parsing and manipulation, enabling detailed evaluation of text embeddings with high-quality, resource-efficient generation.
Contribution
The paper introduces a semantic graph-based framework for controllable text editing, which makes the evaluation of text embedding models more transparent and fine-grained.
Findings
Human-designed semantic manipulation rules are applied to parsed semantic graphs.
The framework produces high-quality, resource-efficient text generations.
Controlled edits enable fine-grained evaluation of how embeddings respond to semantic shifts.
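As a rough illustration of what a fine-grained check on embeddings could look like, the sketch below scores whether a model places a meaning-preserving sentence closer to an anchor than a hard negative. The triplet setup, the function names (`cosine`, `triplet_accuracy`, `toy_embed`), and the hand-written vectors are all assumptions made for this example; the paper's actual evaluation protocol may differ.

```python
import math
from typing import Callable, Iterable, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def triplet_accuracy(
    triplets: Iterable[Tuple[str, str, str]],
    embed: Callable[[str], Vector],
) -> float:
    """Fraction of (anchor, positive, hard_negative) triplets in which the
    positive is more similar to the anchor than the hard negative is."""
    total = 0
    correct = 0
    for anchor, positive, negative in triplets:
        a, p, n = embed(anchor), embed(positive), embed(negative)
        total += 1
        if cosine(a, p) > cosine(a, n):
            correct += 1
    return correct / total if total else 0.0


# Hypothetical stand-in for an embedding model: a lookup of made-up vectors.
TOY_VECTORS = {
    "The boy wants to leave.": (1.0, 0.0),
    "The boy would like to leave.": (0.9, 0.1),
    "The boy does not want to leave.": (0.8, 0.6),
}


def toy_embed(sentence: str) -> Vector:
    return TOY_VECTORS[sentence]


triplets = [
    (
        "The boy wants to leave.",
        "The boy would like to leave.",
        "The boy does not want to leave.",
    )
]
score = triplet_accuracy(triplets, toy_embed)
```

In practice `toy_embed` would be replaced by a call into the embedding model under test, and the triplets could be grouped by the rule that produced each hard negative, so that scores can be reported per type of semantic shift.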
Abstract
Controllable and transparent text generation has been a long-standing goal in NLP. Almost as long-standing is a general idea for addressing this challenge: Parsing text to a symbolic representation, and generating from it. However, earlier approaches were hindered by parsing and generation insufficiencies. Using modern parsers and a safety supervision mechanism, we show how close current methods come to this goal. Concretely, we propose the Sentence Smith framework for English, which has three steps: 1. Parsing a sentence into a semantic graph. 2. Applying human-designed semantic manipulation rules. 3. Generating text from the manipulated graph. A final entailment check (4.) verifies the validity of the applied transformation. To demonstrate our framework's utility, we use it to induce hard negative text pairs that challenge text embedding models. Since the controllable generation makes…
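The parse, edit, generate, and check steps named in the abstract can be sketched as a small pipeline. This is a toy under stated assumptions, not the authors' implementation: the graph is a plain list of triples, and `toy_parse`, `negate_root`, `toy_generate`, and `toy_entails` are hypothetical stand-ins for a real semantic parser, a manipulation rule, a graph-to-text generator, and an entailment model. The sketch also assumes that an edit counts as a valid hard negative when the original sentence does not entail the edited one.

```python
from typing import Callable, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (source node, relation, target node)
Graph = List[Triple]


def toy_parse(sentence: str) -> Graph:
    """Step 1 (stand-in): look up a hand-written graph for a known sentence."""
    graphs = {
        "The boy wants to leave.": [
            ("want", "agent", "boy"),
            ("want", "theme", "leave"),
        ]
    }
    return list(graphs[sentence])


def negate_root(graph: Graph) -> Graph:
    """Step 2 (example rule): flip the polarity of the root predicate."""
    root = graph[0][0]
    marker = (root, "polarity", "-")
    if marker in graph:
        return [t for t in graph if t != marker]
    return graph + [marker]


def toy_generate(graph: Graph) -> str:
    """Step 3 (stand-in): template generation for a single-predicate graph."""
    verb = graph[0][0]
    roles = {relation: target for _, relation, target in graph}
    if roles.get("polarity") == "-":
        return f"The {roles['agent']} does not {verb} to {roles['theme']}."
    return f"The {roles['agent']} {verb}s to {roles['theme']}."


def toy_entails(premise: str, hypothesis: str) -> bool:
    """Step 4 (stand-in): a real system would call an entailment model here."""
    return premise == hypothesis


def make_hard_negative(
    sentence: str,
    parse: Callable[[str], Graph],
    rule: Callable[[Graph], Graph],
    generate: Callable[[Graph], str],
    entails: Callable[[str, str], bool],
) -> Optional[str]:
    """Run parse -> edit -> generate, then keep the output only if the
    original sentence does not entail it (assumed validity criterion)."""
    edited = generate(rule(parse(sentence)))
    if entails(sentence, edited):
        return None  # the edit did not change the meaning; discard it
    return edited


original = "The boy wants to leave."
hard_negative = make_hard_negative(
    original, toy_parse, negate_root, toy_generate, toy_entails
)
```

Passing the four components in as callables keeps the control flow visible and lets each stand-in be swapped for a real model without touching the pipeline itself.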
Taxonomy
Topics: Natural Language Processing Techniques
