Unsupervised Extractive Dialogue Summarization in Hyperdimensional Space

Seongmin Park; Kyungho Kim; Jaejin Seo; Jihwa Lee

arXiv:2405.09765·cs.CL·May 17, 2024

TL;DR

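HyperSum is an unsupervised extractive dialogue summarization method that builds sentence embeddings from randomly initialized, extremely high-dimensional vectors, clusters them, and extracts the cluster medoids as the summary. It often outperforms state-of-the-art summarizers in accuracy and faithfulness while being 10 to 100 times faster.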

Contribution

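The paper introduces a sentence-embedding approach for unsupervised extractive summarization that exploits the pseudo-orthogonality of randomly initialized vectors at extremely high dimensions, combining the efficiency of traditional lexical summarization with the accuracy of contemporary neural approaches.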
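
The pseudo-orthogonality mentioned above can be checked numerically. The following is a minimal sketch, not code from the paper: it assumes bipolar (+1/-1) random vectors, and the helper names `random_hypervectors` and `max_abs_offdiag_cosine` are hypothetical. It compares the largest pairwise cosine similarity among 50 random vectors at 16 dimensions and at 10,000 dimensions.

```python
import numpy as np


def random_hypervectors(n, dim, seed=0):
    """Draw n random bipolar (+1/-1) vectors of the given dimension.

    The bipolar choice is an assumption made for this illustration.
    """
    rng = np.random.default_rng(seed)
    return rng.choice([-1.0, 1.0], size=(n, dim))


def max_abs_offdiag_cosine(vectors):
    """Largest absolute cosine similarity between two distinct vectors."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit.T
    np.fill_diagonal(sims, 0.0)  # ignore each vector's similarity to itself
    return float(np.abs(sims).max())


# Same number of vectors, very different dimensionality.
low = max_abs_offdiag_cosine(random_hypervectors(50, 16))
high = max_abs_offdiag_cosine(random_hypervectors(50, 10_000))
print(f"max |cos| at dim=16:     {low:.3f}")
print(f"max |cos| at dim=10000:  {high:.3f}")
```

At low dimension some pairs of random vectors end up strongly correlated by chance, while at very high dimension every pair stays close to orthogonal. This is why randomly initialized vectors can serve as near-independent symbols without any training.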

Findings

01

HyperSum often outperforms state-of-the-art summarizers in both summary accuracy and faithfulness.

02

HyperSum is 10 to 100 times faster than the state-of-the-art summarizers it is compared against.

03

HyperSum is open-sourced as a strong baseline for unsupervised extractive summarization.

Abstract

We present HyperSum, an extractive summarization framework that captures both the efficiency of traditional lexical summarization and the accuracy of contemporary neural approaches. HyperSum exploits the pseudo-orthogonality that emerges when randomly initializing vectors at extremely high dimensions ("blessing of dimensionality") to construct representative and efficient sentence embeddings. Simply clustering the obtained embeddings and extracting their medoids yields competitive summaries. HyperSum often outperforms state-of-the-art summarizers -- in terms of both summary accuracy and faithfulness -- while being 10 to 100 times faster. We open-source HyperSum as a strong baseline for unsupervised extractive summarization.
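
The cluster-then-extract-medoids idea from the abstract can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: the sentence encoding here is a plain sum of random bipolar word vectors (the paper's actual encoding may differ), the clustering is a simple alternating k-medoids loop with deterministic farthest-first initialization, and every function name (`embed_sentences`, `cosine_distances`, `k_medoids`, `summarize`) is hypothetical.

```python
import numpy as np


def embed_sentences(sentences, dim=10_000, seed=0):
    """Embed each sentence as the sum of random bipolar word vectors.

    Assumption: bag-of-words bundling; not necessarily the paper's encoding.
    """
    rng = np.random.default_rng(seed)
    word_vectors = {}
    embeddings = np.zeros((len(sentences), dim))
    for i, sentence in enumerate(sentences):
        for word in sentence.lower().split():
            if word not in word_vectors:
                word_vectors[word] = rng.choice([-1.0, 1.0], size=dim)
            embeddings[i] += word_vectors[word]
    return embeddings


def cosine_distances(x):
    """Pairwise cosine distance matrix (1 - cosine similarity)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.maximum(norms, 1e-12)
    return 1.0 - unit @ unit.T


def k_medoids(dist, k, n_iter=50):
    """Alternating k-medoids on a precomputed distance matrix."""
    # Deterministic init: most central point first, then farthest-first.
    medoids = [int(dist.sum(axis=1).argmin())]
    while len(medoids) < k:
        medoids.append(int(dist[:, medoids].min(axis=1).argmax()))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        labels = dist[:, medoids].argmin(axis=1)  # assign to nearest medoid
        updated = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # New medoid: member with the smallest total in-cluster distance.
                within = dist[np.ix_(members, members)].sum(axis=1)
                updated[c] = members[within.argmin()]
        if np.array_equal(updated, medoids):
            break
        medoids = updated
    return np.sort(medoids)  # keep original dialogue order


def summarize(sentences, k=2):
    """Return the k medoid sentences as an extractive summary."""
    k = min(k, len(sentences))
    medoids = k_medoids(cosine_distances(embed_sentences(sentences)), k)
    return [sentences[i] for i in medoids]


dialogue = [
    "alice: shall we meet for lunch tomorrow",
    "bob: lunch tomorrow works for me",
    "alice: great lunch at noon tomorrow then",
    "bob: did you finish the quarterly report",
    "alice: the quarterly report is almost finished",
    "bob: send the report when finished",
]
for line in summarize(dialogue, k=2):
    print(line)
```

Because the word vectors are nearly orthogonal, two sentences are close only to the extent that they share words, so the medoid of each cluster is the utterance that overlaps most with the rest of its topic. No model is trained or loaded, which is consistent with the efficiency the abstract emphasizes.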
