Extracting Sentence Embeddings from Pretrained Transformer Models
Lukas Stankevičius, Mantas Lukoševičius

TL;DR
This paper systematically evaluates and enhances sentence embedding extraction methods from pretrained transformer models, demonstrating significant performance improvements across various NLP tasks, especially for static and random token-based models.
Contribution
It introduces novel techniques for extracting and refining sentence embeddings from pretrained transformers, improving their effectiveness across multiple NLP benchmarks.
Findings
Representation-shaping techniques significantly improve embeddings.
Static token-based models, including those with random embeddings, reach near-BERT performance.
Enhanced methods outperform existing approaches on STS and clustering tasks.
Abstract
Pre-trained transformer models shine in many natural language processing tasks and therefore are expected to bear the representation of the input sentence or text meaning. These sentence-level embeddings are also important in retrieval-augmented generation. But do commonly used plain averaging or prompt templates sufficiently capture and represent the underlying meaning? After providing a comprehensive review of existing sentence embedding extraction and refinement methods, we thoroughly test different combinations and our original extensions of the most promising ones on pretrained models. Namely, given 110 M parameters, BERT's hidden representations from multiple layers, and many tokens, we try diverse ways to extract optimal sentence embeddings. We test various token aggregation and representation post-processing techniques. We also test multiple ways of using a general Wikitext…
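The "plain averaging" baseline that the abstract questions can be illustrated with a minimal sketch. This is not the authors' code: the function name `mean_pool`, the toy vectors, and the mask convention (1 for a real token, 0 for padding) are assumptions made for illustration only.

```python
from typing import List, Sequence


def mean_pool(
    token_vectors: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Average the vectors of non-padding tokens into one sentence vector."""
    if len(token_vectors) != len(attention_mask):
        raise ValueError("one mask entry is required per token")
    # Keep only tokens whose mask entry is non-zero (assumed: 0 marks padding).
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("at least one token must be unmasked")
    dim = len(kept[0])
    # Component-wise mean over the kept tokens.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy 2-dimensional hidden states for three tokens; the last one is padding.
hidden = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
sentence_embedding = mean_pool(hidden, mask)
print(sentence_embedding)
```

In practice the token vectors would come from one or more hidden layers of a model such as BERT. The aggregation and post-processing variants the abstract mentions would replace or follow this averaging step.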
Taxonomy
Topics: Topic Modeling
