A Rate-Distortion Framework for Summarization
Enes Arda, Aylin Yener

TL;DR
This paper presents an information-theoretic framework for text summarization, defining a fundamental performance bound and proposing practical algorithms to compute it, validated through empirical comparisons with real summarizers.
Contribution
It introduces a novel rate-distortion framework for summarization, providing a theoretical performance bound and practical computation methods.
Findings
The rate-distortion function sets a lower bound on summarizer performance.
Practical algorithms can estimate the rate-distortion function from limited data.
Empirical results align with theoretical predictions of the framework.
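For reference, the classical Shannon rate-distortion function for a source X reproduced as X̂ under a distortion measure d is shown below. This is the textbook definition, not the paper's own. The page does not state how the "summarizer rate-distortion function" specializes it. Reading X as a source document and X̂ as its summary is an assumption made here for illustration.

```latex
R(D) \;=\; \min_{p(\hat{x}\mid x)\,:\ \mathbb{E}[d(X,\hat{X})] \le D} \; I(X;\hat{X})
```

The "lower bound" finding mirrors the standard converse: any scheme that achieves expected distortion at most D must operate at a rate of at least R(D). No summarizer can therefore sit strictly below the curve in the rate-distortion plane.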
Abstract
This paper introduces an information-theoretic framework for text summarization. We define the summarizer rate-distortion function and show that it provides a fundamental lower bound on summarizer performance. We describe an iterative procedure, similar to the Blahut-Arimoto algorithm, for computing this function. To handle real-world text datasets, we also propose a practical method that can calculate the summarizer rate-distortion function with limited data. Finally, we empirically confirm our theoretical results by comparing the summarizer rate-distortion function with the performance of different summarizers used in practice.
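The abstract describes the computation only as "similar to" Blahut-Arimoto. The sketch below is therefore the classical Blahut-Arimoto iteration for a discrete memoryless source with a known distribution and distortion matrix, not the authors' summarizer-specific procedure. The function name `blahut_arimoto_rd` and its parameters (`p_x`, `dist`, `beta`) are hypothetical. For a fixed Lagrange multiplier `beta`, the iteration alternates between the optimal test channel p(y|x) and the reproduction marginal q(y), and it returns one (D, R) point on the curve.

```python
import numpy as np


def blahut_arimoto_rd(p_x, dist, beta, n_iter=500, tol=1e-12):
    """Classical Blahut-Arimoto for one point on R(D) (hypothetical helper).

    p_x  : source distribution over n symbols, shape (n,)
    dist : distortion matrix d(x, y), shape (n, m)
    beta : Lagrange multiplier trading rate against distortion (beta > 0)
    Returns (D, R), with the rate R measured in bits.
    """
    p_x = np.asarray(p_x, dtype=float)
    dist = np.asarray(dist, dtype=float)
    m = dist.shape[1]
    q = np.full(m, 1.0 / m)  # reproduction marginal q(y), start uniform
    # Subtract each row's minimum so exp() cannot overflow; the shift is a
    # per-row constant and cancels when each row is normalized below.
    kernel = np.exp(-beta * (dist - dist.min(axis=1, keepdims=True)))
    for _ in range(n_iter):
        # Step 1: optimal test channel p(y|x) proportional to q(y) * exp(-beta * d)
        unnorm = q[None, :] * kernel
        cond = unnorm / unnorm.sum(axis=1, keepdims=True)
        # Step 2: reproduction marginal induced by that channel
        q_new = p_x @ cond
        converged = np.max(np.abs(q_new - q)) < tol
        q = q_new
        if converged:
            break
    # Final channel consistent with the final marginal
    unnorm = q[None, :] * kernel
    cond = unnorm / unnorm.sum(axis=1, keepdims=True)
    joint = p_x[:, None] * cond
    distortion = float(np.sum(joint * dist))
    # Mutual information I(X; Y); skip zero-probability cells to avoid log(0)
    mask = joint > 0
    q_full = np.broadcast_to(q, cond.shape)
    rate = float(np.sum(joint[mask] * np.log2(cond[mask] / q_full[mask])))
    return distortion, rate
```

As a sanity check, consider a uniform binary source with Hamming distortion. Here the known closed form is R(D) = 1 - h(D) bits, and the point returned for any `beta` should lie on that curve. Sweeping `beta` traces out the whole curve. Estimating it from limited text data, as the abstract mentions, is a separate problem that this sketch does not address.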
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Algorithms and Data Compression
