Vec2Summ: Text Summarization via Probabilistic Sentence Embeddings
Mao Li, Fred Conrad, Johann Gagnon-Bartsch

TL;DR
Vec2Summ introduces a semantic compression-based abstractive summarization method that encodes a document collection into a single vector, enabling scalable, controllable, and interpretable summaries without context-length constraints.
Contribution
The paper presents Vec2Summ, a novel summarization approach using probabilistic sentence embeddings and embedding inversion, addressing limitations of LLM-based methods in scalability and interpretability.
Findings
Produces coherent, topically focused summaries
Scales efficiently with corpus size
Achieves performance comparable to LLM summarization
Abstract
We propose Vec2Summ, a novel method for abstractive summarization that frames the task as semantic compression. Vec2Summ represents a document collection using a single mean vector in the semantic embedding space, capturing the central meaning of the corpus. To reconstruct fluent summaries, we perform embedding inversion -- decoding this mean vector into natural language using a generative language model. To improve reconstruction quality and capture some degree of topical variability, we introduce stochasticity by sampling from a Gaussian distribution centered on the mean. This approach is loosely analogous to bagging in ensemble learning, where controlled randomness encourages more robust and varied outputs. Vec2Summ addresses key limitations of LLM-based summarization methods. It avoids context-length constraints, enables interpretable and controllable generation via semantic…
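The compression and sampling steps described in the abstract can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: the function names `fit_gaussian` and `sample_vectors`, the diagonal (per-dimension) covariance, the `scale` parameter, and the random toy embeddings are all assumptions made for the example. The embedding-inversion step, which decodes each sampled vector into text with a generative language model, is not shown.

```python
import numpy as np

def fit_gaussian(embeddings):
    """Summarize a corpus of sentence embeddings, shape (n, d), by its mean
    vector and per-dimension variance. The diagonal covariance is an
    assumption of this sketch; the source only says the Gaussian is
    centered on the mean."""
    mean = embeddings.mean(axis=0)
    var = embeddings.var(axis=0)
    return mean, var

def sample_vectors(mean, var, n_samples, scale=1.0, seed=0):
    """Draw vectors from N(mean, scale^2 * diag(var)).
    `scale` is a hypothetical knob for how far samples stray from the mean."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((n_samples, mean.shape[0]))
    return mean + scale * np.sqrt(var) * noise

# Toy stand-in for real sentence embeddings: 200 sentences, 8 dimensions.
toy_embeddings = np.random.default_rng(42).standard_normal((200, 8))
mean, var = fit_gaussian(toy_embeddings)
samples = sample_vectors(mean, var, n_samples=5)
# Each row of `samples` would then be passed to an embedding-inversion
# model to produce one candidate summary sentence (not implemented here).
```

Setting `scale=0.0` collapses every sample onto the mean vector, which corresponds to the single-vector variant described before stochasticity is introduced.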
Taxonomy
Topics: Topic Modeling · Machine Learning in Healthcare · Natural Language Processing Techniques
