A Rate-Distortion Framework for Summarization

Enes Arda; Aylin Yener

arXiv:2501.13100 · cs.IT · May 5, 2025

Open Access

TL;DR

This paper presents an information-theoretic framework for text summarization, defining a fundamental performance bound and proposing practical algorithms to compute it, validated through empirical comparisons with real summarizers.

Contribution

It introduces a novel rate-distortion framework for summarization, providing a theoretical lower bound on summarizer performance and practical methods for computing that bound.

Findings

01. The rate-distortion function sets a lower bound on summarizer performance.
02. Practical algorithms can estimate the rate-distortion function from limited data.
03. Empirical results align with the theoretical predictions of the framework.
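For orientation, the textbook (Shannon) rate-distortion function for a source X with reproduction X̂ and distortion measure d is shown below. This is the classical definition, included under the assumption that the summarizer version follows the same template, with a document playing the role of the source and a summary the role of the reproduction. The paper's exact definition, including how summary rate and distortion are measured, may differ.

```latex
R(D) \;=\; \min_{p(\hat{x} \mid x)\,:\; \mathbb{E}[d(X,\hat{X})] \,\le\, D} \; I(X;\hat{X})
```

In the classical setting this is read as a bound: no scheme whose expected distortion is at most D can operate at a rate below R(D). That is the sense in which such a function lower-bounds performance.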

Abstract

This paper introduces an information-theoretic framework for text summarization. We define the summarizer rate-distortion function and show that it provides a fundamental lower bound on summarizer performance. We describe an iterative procedure, similar to the Blahut-Arimoto algorithm, for computing this function. To handle real-world text datasets, we also propose a practical method that can calculate the summarizer rate-distortion function with limited data. Finally, we empirically confirm our theoretical results by comparing the summarizer rate-distortion function with the performances of different summarizers used in practice.
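The abstract describes an iterative procedure similar to the Blahut-Arimoto algorithm. As a point of reference, the sketch below implements the classical Blahut-Arimoto iteration for one point on an ordinary rate-distortion curve over a finite source alphabet. It is not the paper's summarizer-specific procedure. The function name `blahut_arimoto_rd`, the trade-off parameter `beta`, and the stopping rule are illustrative assumptions.

```python
import math


def blahut_arimoto_rd(p_x, dist, beta, iters=500, tol=1e-12):
    """Classical Blahut-Arimoto iteration for one point of R(D).

    Hypothetical helper, not the paper's summarizer algorithm.
    p_x:  source distribution p(x), a list of length n
    dist: distortion matrix with dist[x][y] >= 0, n rows and m columns
    beta: non-negative multiplier trading rate against distortion
    Returns (rate in bits, expected distortion).
    """
    n, m = len(p_x), len(dist[0])
    q_y = [1.0 / m] * m  # reproduction marginal q(y), initialized uniform
    Q = []
    for _ in range(iters):
        # Conditional update: Q(y|x) is proportional to q(y) * exp(-beta * d(x, y))
        Q = []
        for x in range(n):
            w = [q_y[y] * math.exp(-beta * dist[x][y]) for y in range(m)]
            z = sum(w)
            Q.append([wy / z for wy in w])
        # Marginal update: q(y) = sum over x of p(x) * Q(y|x)
        new_q = [sum(p_x[x] * Q[x][y] for x in range(n)) for y in range(m)]
        converged = max(abs(a - b) for a, b in zip(new_q, q_y)) < tol
        q_y = new_q
        if converged:
            break

    # q_y is now the marginal induced by Q, so the rate below is I(X;Y)
    # under the joint distribution p(x) * Q(y|x).
    rate, distortion = 0.0, 0.0
    for x in range(n):
        for y in range(m):
            joint = p_x[x] * Q[x][y]
            if joint == 0.0:
                continue  # zero-probability terms contribute nothing
            rate += joint * math.log2(Q[x][y] / q_y[y])
            distortion += joint * dist[x][y]
    return rate, distortion


if __name__ == "__main__":
    # Uniform binary source with Hamming distortion: expect R = 1 - h(D).
    r, d = blahut_arimoto_rd([0.5, 0.5], [[0, 1], [1, 0]], beta=math.log(3.0))
    print(f"R = {r:.6f} bits at D = {d:.6f}")
```

Sweeping `beta` upward from 0 traces the curve from zero rate toward lower distortion. For the uniform binary source with Hamming distortion used in the usage example, each computed point should satisfy R = 1 - h(D), where h is the binary entropy function. The alternating structure (update the conditional, then the marginal) is the defining feature of Blahut-Arimoto.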

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Algorithms and Data Compression