LLM-based Embeddings: Attention Values Encode Sentence Semantics Better Than Hidden States
Yeqin Zhang, Yunfei Wang, Jiaxuan Chen, Ke Qin, Yizheng Zhao, Cam-Tu Nguyen

TL;DR
This paper shows that attention value vectors in LLMs capture sentence semantics better than hidden states, and introduces Value Aggregation, a simple training-free method that outperforms other LLM-based embeddings.
Contribution
It introduces Value Aggregation (VA) and AlignedWVA, two methods that build sentence embeddings from attention values without training. In the training-free setting, VA outperforms other LLM-based embeddings and AlignedWVA achieves state-of-the-art results.
Findings
VA outperforms other LLM-based embeddings in a training-free setting
AlignedWVA achieves state-of-the-art results among training-free embeddings
Fine-tuning VA can further enhance LLM embedding quality
Abstract
Sentence representations are foundational to many Natural Language Processing (NLP) applications. While recent methods leverage Large Language Models (LLMs) to derive sentence representations, most rely on final-layer hidden states, which are optimized for next-token prediction and thus often fail to capture global, sentence-level semantics. This paper introduces a novel perspective, demonstrating that attention value vectors capture sentence semantics more effectively than hidden states. We propose Value Aggregation (VA), a simple method that pools token values across multiple layers and token indices. In a training-free setting, VA outperforms other LLM-based embeddings and even matches or surpasses the ensemble-based MetaEOL. Furthermore, we demonstrate that when paired with suitable prompts, the layer attention outputs can be interpreted as aligned weighted value vectors. Specifically,…
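The pooling step that the abstract attributes to VA (pooling token values across multiple layers and token indices) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the per-layer value vectors have already been extracted from a model, that pooling is a plain mean, and that the caller chooses which layers and tokens to include. The names `value_aggregation` and `cosine_similarity`, and the toy numbers, are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def value_aggregation(
    values: Sequence[Sequence[Vector]],
    layers: Sequence[int],
    token_indices: Sequence[int],
) -> Vector:
    """Mean-pool value vectors over the chosen layers and token positions.

    values[l][t] is assumed to be the attention value vector of token t
    at layer l (heads already concatenated). Mean pooling is an assumption
    of this sketch; the paper may pool differently.
    """
    selected = [values[l][t] for l in layers for t in token_indices]
    if not selected:
        raise ValueError("select at least one layer and one token")
    dim = len(selected[0])
    return [sum(v[d] for v in selected) / len(selected) for d in range(dim)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity, the usual way to compare sentence embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy input: 2 layers, 2 tokens, 2-dimensional value vectors.
toy_values = [
    [[1.0, 0.0], [3.0, 2.0]],  # layer 0
    [[5.0, 4.0], [7.0, 6.0]],  # layer 1
]
embedding = value_aggregation(toy_values, layers=[0, 1], token_indices=[0, 1])
```

With a real model, `values` would come from the value projections inside each attention layer. How to extract them depends on the model library and is outside this sketch.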
Taxonomy
Topics: Topic Modeling · Sentiment Analysis and Opinion Mining · Computational and Text Analysis Methods
