GenEOL: Harnessing the Generative Power of LLMs for Training-Free Sentence Embeddings

Raghuveer Thirukovalluru; Bhuwan Dhingra

arXiv:2410.14635·cs.CL·February 11, 2025

TL;DR

GenEOL uses a pretrained LLM to generate diverse, meaning-preserving transformations of a sentence and aggregates the embeddings of those transformations into a single training-free sentence embedding. It outperforms existing training-free methods on the STS benchmark and improves results on several MTEB task categories.

Contribution

The paper introduces GenEOL, a training-free embedding method that exploits the generative abilities of LLMs, which earlier training-free methods overlooked in favor of optimizing embedding prompts.

Findings

01 Outperforms existing training-free methods by an average of 2.85 points across several LLMs on the STS benchmark

02 Achieves notable gains on clustering, reranking, and pair-classification tasks from the MTEB benchmark

03 Provides stable and robust sentence embeddings across LLM layers

Abstract

Training-free embedding methods directly leverage pretrained large language models (LLMs) to embed text, bypassing the costly and complex procedure of contrastive learning. Previous training-free embedding methods have mainly focused on optimizing embedding prompts and have overlooked the benefits of utilizing the generative abilities of LLMs. We propose a novel method, GenEOL, which uses LLMs to generate diverse transformations of a sentence that preserve its meaning, and aggregates the resulting embeddings of these transformations to enhance the overall sentence embedding. GenEOL significantly outperforms the existing training-free embedding methods by an average of 2.85 points across several LLMs on the sentence semantic text similarity (STS) benchmark. GenEOL also achieves notable gains in clustering, reranking, and pair-classification tasks from the MTEB benchmark. Additionally,…
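The generate-then-aggregate idea described in the abstract can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: `toy_embed` is a hypothetical hash-based stand-in for an LLM embedder, `toy_transforms` are hand-written rewrites standing in for LLM-generated transformations, and mean pooling is assumed as the aggregation step because the abstract does not name one. The `include_original` flag is likewise an assumption added for illustration.

```python
import hashlib
import math
from typing import Callable, List, Sequence

Vector = List[float]


def toy_embed(text: str, dim: int = 16) -> Vector:
    """Hypothetical stand-in for an LLM embedder: hashed bag-of-words counts."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    return vec


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors (assumed aggregation)."""
    if not vectors:
        raise ValueError("need at least one vector to aggregate")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def aggregate_embedding(
    sentence: str,
    transforms: Sequence[Callable[[str], str]],
    embed: Callable[[str], Vector] = toy_embed,
    include_original: bool = False,
) -> Vector:
    """Embed each transformed variant of the sentence and average the results."""
    variants = [transform(sentence) for transform in transforms]
    if include_original:
        variants.insert(0, sentence)
    return mean_pool([embed(variant) for variant in variants])


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, the usual score for comparing sentence embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical meaning-preserving rewrites; a real pipeline would prompt an LLM.
toy_transforms = [
    lambda s: s.replace("quickly", "fast"),
    lambda s: "In other words, " + s,
]

emb = aggregate_embedding("The cat runs quickly", toy_transforms)
```

To try the idea with a real model, one would pass a function that returns an LLM-derived vector as `embed` and replace `toy_transforms` with calls that prompt an LLM for rewrites; both replacements are left to the reader because the page does not specify them.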

Code & Models

Repositories

raghavlite/GenEOL (PyTorch, official)

Videos

GenEOL: Harnessing the Generative Power of LLMs for Training-Free Sentence Embeddings

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification