Vec2Summ: Text Summarization via Probabilistic Sentence Embeddings

Mao Li; Fred Conrad; Johann Gagnon-Bartsch

arXiv:2508.07017·cs.CL·August 12, 2025

Open Access

TL;DR

Vec2Summ introduces a semantic compression-based abstractive summarization method that encodes a document collection into a single vector, enabling scalable, controllable, and interpretable summaries without context-length constraints.

Contribution

The paper presents Vec2Summ, a novel summarization approach using probabilistic sentence embeddings and embedding inversion, addressing limitations of LLM-based methods in scalability and interpretability.

Findings

01. Produces coherent, topically focused summaries
02. Scales efficiently with corpus size
03. Achieves performance comparable to LLM summarization

Abstract

We propose Vec2Summ, a novel method for abstractive summarization that frames the task as semantic compression. Vec2Summ represents a document collection using a single mean vector in the semantic embedding space, capturing the central meaning of the corpus. To reconstruct fluent summaries, we perform embedding inversion -- decoding this mean vector into natural language using a generative language model. To improve reconstruction quality and capture some degree of topical variability, we introduce stochasticity by sampling from a Gaussian distribution centered on the mean. This approach is loosely analogous to bagging in ensemble learning, where controlled randomness encourages more robust and varied outputs. Vec2Summ addresses key limitations of LLM-based summarization methods. It avoids context-length constraints, enables interpretable and controllable generation via semantic…
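The compression and sampling steps described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the covariance of the Gaussian, so a per-dimension (diagonal) standard deviation estimated from the data is assumed here, and the function names `fit_gaussian` and `sample_around_mean` and the `scale` parameter are hypothetical. Random toy vectors stand in for real sentence embeddings, and the embedding-inversion decoder is not shown.

```python
import numpy as np


def fit_gaussian(embeddings):
    """Summarize an (n_sentences, dim) embedding matrix by its mean vector
    and per-dimension standard deviation (diagonal covariance: an assumption)."""
    X = np.asarray(embeddings, dtype=float)
    return X.mean(axis=0), X.std(axis=0)


def sample_around_mean(mean, std, n_samples, scale=1.0, seed=0):
    """Draw n_samples vectors from a Gaussian centered on the mean.
    `scale` (hypothetical knob) shrinks or widens the spread; 0 returns the mean."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((n_samples, mean.shape[0]))
    return mean + scale * std * noise


# Toy stand-in for sentence embeddings of a corpus: 200 sentences, 8 dimensions.
toy_embeddings = np.random.default_rng(42).standard_normal((200, 8))

mean, std = fit_gaussian(toy_embeddings)
samples = sample_around_mean(mean, std, n_samples=5)

# Each row of `samples` would then be decoded into text by an
# embedding-inversion model, which is outside the scope of this sketch.
print(samples.shape)
```

Because the corpus is reduced to a fixed-size mean and spread before any decoding happens, the cost of the generation step in this sketch does not depend on how many sentences were embedded, which is the intuition behind the claim about avoiding context-length constraints.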

Taxonomy

Topics: Topic Modeling · Machine Learning in Healthcare · Natural Language Processing Techniques