Extracting Sentence Embeddings from Pretrained Transformer Models

Lukas Stankevičius; Mantas Lukoševičius

arXiv:2408.08073·cs.CL·February 21, 2025

Open Access

TL;DR

This paper systematically evaluates and enhances methods for extracting sentence embeddings from pretrained transformer models, demonstrating significant performance improvements across various NLP tasks, especially for static token-based models, including those with random embeddings.

Contribution

It introduces novel techniques for extracting and refining sentence embeddings from pretrained transformers, improving their effectiveness across multiple NLP benchmarks.

Findings

01. Representation-shaping techniques significantly improve embeddings.

02. Static token-based models, including random embeddings, reach near-BERT performance.

03. Enhanced methods outperform existing approaches on STS and clustering tasks.

Abstract

Pretrained transformer models shine in many natural language processing tasks and are therefore expected to carry a representation of the meaning of the input sentence or text. These sentence-level embeddings are also important in retrieval-augmented generation. But do commonly used plain averaging or prompt templates sufficiently capture and represent the underlying meaning? After providing a comprehensive review of existing sentence embedding extraction and refinement methods, we thoroughly test different combinations, and our original extensions, of the most promising ones on pretrained models. Namely, given the hidden representations of the 110 M-parameter BERT from multiple layers and many tokens, we try diverse ways to extract optimal sentence embeddings. We test various token aggregation and representation post-processing techniques. We also test multiple ways of using a general Wikitext…
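The "plain averaging" baseline that the abstract questions can be illustrated with a minimal sketch. It is not the authors' code: the function names `mean_pool` and `mean_pool_layers`, the toy two-dimensional vectors, and the choice to combine layers by a simple average are all assumptions made for illustration. Each layer is taken to be a list of per-token vectors, and a 0/1 attention mask marks which tokens are real rather than padding.

```python
from typing import List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> List[float]:
    """Average the vectors of the tokens whose mask value is non-zero."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def mean_pool_layers(layers: Sequence[Sequence[Sequence[float]]],
                     attention_mask: Sequence[int]) -> List[float]:
    """Mean-pool each layer over tokens, then average the layer results.

    Averaging across layers is an illustrative assumption, not a method
    confirmed by the source text.
    """
    pooled = [mean_pool(layer, attention_mask) for layer in layers]
    dim = len(pooled[0])
    return [sum(vec[i] for vec in pooled) / len(pooled) for i in range(dim)]


# Toy hidden states: two layers, three token positions, two dimensions.
# The last position is padding and is excluded by the mask.
layer_a = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
layer_b = [[3.0, 2.0], [5.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]

sentence_embedding = mean_pool_layers([layer_a, layer_b], mask)
print(sentence_embedding)
```

With a real model, the per-layer token vectors would come from the encoder's hidden states; the pooling step itself stays the same.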

Taxonomy

Topics: Topic Modeling