GASE: Generatively Augmented Sentence Encoding

Manuel Frank; Haithem Afli

arXiv:2411.04914 · cs.CL · September 9, 2025

Open Access

TL;DR

GASE is a training-free method that enhances sentence embeddings at inference time by using generative models for data augmentation, improving performance without model fine-tuning.

Contribution

The paper presents an inference-time augmentation technique that uses generative models to improve sentence embeddings without requiring parameter access or additional training.

Findings

01. Performance improves across various embedding models.

02. Greater gains are observed for models with lower baseline performance.

03. The augmentation enhances the robustness and semantic diversity of sentence embeddings.

Abstract

We propose a training-free approach to improve sentence embeddings that leverages test-time compute by applying generative text models for data augmentation at inference time. Unlike conventional data augmentation that utilises synthetic training data, our approach does not require access to model parameters or the computational resources typically required for fine-tuning state-of-the-art models. Generatively Augmented Sentence Encoding varies the input text by paraphrasing, summarising, or extracting keywords, and then pools the original and synthetic embeddings. Experimental results on the Massive Text Embedding Benchmark for Semantic Textual Similarity (STS) demonstrate performance improvements across a range of embedding models using different generative models for augmentation. We find that generative augmentation leads to larger performance improvements for embedding models…
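The augment-then-pool idea in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: mean pooling over L2-normalised vectors stands in for whatever pooling the authors use, and `gase_embed`, `toy_embed`, and `toy_keywords` are hypothetical names. The toy embedder is a hashed bag-of-words and the toy augmenter a stopword filter, so the example runs without any model. In practice `embed` would wrap a real sentence encoder and each augmenter would call a generative model to paraphrase, summarise, or extract keywords.

```python
import zlib
from typing import Callable, Sequence

import numpy as np


def gase_embed(
    text: str,
    embed: Callable[[str], np.ndarray],
    augmenters: Sequence[Callable[[str], str]],
) -> np.ndarray:
    """Pool the embedding of `text` with embeddings of its synthetic variants.

    Assumption: mean pooling over L2-normalised vectors. The paper's exact
    pooling scheme may differ.
    """
    # The original text is always included next to its augmented variants.
    variants = [text] + [augment(text) for augment in augmenters]
    vectors = np.stack([np.asarray(embed(v), dtype=float) for v in variants])

    # Normalise each vector so that no single variant dominates the mean.
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    vectors = vectors / np.maximum(norms, 1e-12)

    pooled = vectors.mean(axis=0)
    return pooled / max(float(np.linalg.norm(pooled)), 1e-12)


# Hypothetical stand-ins so the sketch runs without any model.
STOPWORDS = {"a", "the", "is", "on", "of"}


def toy_embed(text: str, dim: int = 64) -> np.ndarray:
    """Hashed bag-of-words vector (placeholder for a sentence encoder)."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def toy_keywords(text: str) -> str:
    """Drop stopwords (placeholder for generative keyword extraction)."""
    return " ".join(t for t in text.split() if t.lower() not in STOPWORDS)


vector = gase_embed("The cat is on the mat", toy_embed, [toy_keywords])
```

Because the function only calls `embed` as a black box, it needs no access to the embedding model's parameters, which matches the training-free setting described above.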

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems