GASE: Generatively Augmented Sentence Encoding
Manuel Frank, Haithem Afli

TL;DR
GASE introduces a training-free method that enhances sentence embeddings at inference time by using generative models for data augmentation, improving performance without model fine-tuning.
Contribution
It presents a novel inference-time augmentation technique using generative models to improve sentence embeddings without requiring parameter access or additional training.
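As we read the description, the method treats both the embedding model and the generative model as black boxes. Each input is rewritten into one or more synthetic variants (a paraphrase, a summary, or a keyword list), every version is embedded, and the embeddings are pooled. The following is a minimal sketch of that flow under stated assumptions, not the authors' implementation. Element-wise mean pooling is assumed because the exact pooling operator is not specified here. `toy_encode`, `toy_keywords`, and `toy_summary` are hypothetical stand-ins for a real embedding model and for LLM prompts.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors (assumed pooling choice)."""
    if not vectors:
        raise ValueError("need at least one vector to pool")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def augmented_embedding(
    text: str,
    encode: Callable[[str], Vector],
    augmenters: Sequence[Callable[[str], str]],
) -> Vector:
    """Embed the original text and every generated variant, then pool them."""
    # The encoder is only called, never updated, so no parameter access or
    # fine-tuning is required.
    variants = [text] + [augment(text) for augment in augmenters]
    return mean_pool([encode(variant) for variant in variants])


# --- Hypothetical toy stand-ins (assumptions), not real models -------------

def toy_encode(text: str, dim: int = 8) -> Vector:
    """Stand-in for an embedding model: a deterministic hashed bag of words."""
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[sum(map(ord, word)) % dim] += 1.0
    return vec


def toy_keywords(text: str) -> str:
    """Stand-in for an LLM keyword-extraction prompt: keep longer words."""
    return " ".join(w for w in text.split() if len(w) > 3)


def toy_summary(text: str) -> str:
    """Stand-in for an LLM summarisation prompt: keep the first four words."""
    return " ".join(text.split()[:4])


if __name__ == "__main__":
    sentence = "A small cat slept on the warm kitchen floor"
    print(augmented_embedding(sentence, toy_encode, [toy_keywords, toy_summary]))
```

Replacing the toy functions with calls to an actual encoder and generator leaves `augmented_embedding` unchanged. That is the practical appeal of a training-free design: only the models' inputs and outputs are touched.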
Findings
Performance improves across various embedding models.
Greater gains observed for models with lower baseline performance.
Enhances robustness and semantic diversity of sentence embeddings.
Abstract
We propose a training-free approach that improves sentence embeddings by leveraging test-time compute: generative text models are applied for data augmentation at inference time. Unlike conventional data augmentation, which uses synthetic data for training, our approach requires neither access to model parameters nor the computational resources typically needed to fine-tune state-of-the-art models. Generatively Augmented Sentence Encoding (GASE) varies the input text by paraphrasing, summarising, or extracting keywords, and then pools the embeddings of the original and synthetic texts. Experimental results on the Semantic Textual Similarity (STS) tasks of the Massive Text Embedding Benchmark (MTEB) demonstrate performance improvements across a range of embedding models, using different generative models for augmentation. We find that generative augmentation leads to larger performance improvements for embedding models…
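The abstract reports results on the STS tasks of MTEB. As we understand that evaluation, each sentence pair is scored by the cosine similarity of its two embeddings, and the main metric is the Spearman correlation between those scores and human gold ratings. A minimal, dependency-free sketch of this scoring follows, assuming that setup. `sts_score` and the other helpers are hypothetical names for illustration, not the MTEB API.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises when either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0.0 or spread_y == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def sts_score(pairs: Sequence[Tuple[Vector, Vector]], gold: Sequence[float]) -> float:
    """Correlate cosine similarities of embedding pairs with gold STS scores."""
    predicted = [cosine(a, b) for a, b in pairs]
    return spearman(predicted, gold)
```

To compare a baseline with augmented encoding, one would build `pairs` once from plain embeddings and once from pooled original-plus-synthetic embeddings, then compare the two `sts_score` values on the same gold ratings. Because Spearman's rho depends only on ranks, pooling helps only insofar as it reorders the pairs' similarities more in line with the gold ratings.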
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
