# An information theoretic model for summarization, and some basic results

**Authors:** Eric Graves, Qiang Ning, Prithwish Basu

arXiv: 1901.06376 · 2019-01-21

## TL;DR

This paper introduces an information theoretic framework for summarization. A report is modeled as $v$ binary objects, and a summary as a $j$-element subset chosen to minimize semantic loss. Results cover both known and unknown report distributions.

## Contribution

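The paper formulates an information theoretic model for summarization and shows how to construct summarizers that minimize semantic loss when the report distribution is known, and summarizers whose average semantic loss converges to the minimum when it is unknown.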

## Key findings

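- When the distribution generating the reports is known, the paper shows how to construct summarizers that minimize the semantic loss.
- When the distribution is unknown, the paper shows how to construct summarizers whose semantic loss, averaged uniformly over all possible distributions, converges to the minimum.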
- The model provides a theoretical basis for evaluating summarization quality.

## Abstract

A basic information theoretic model for summarization is formulated. Here summarization is considered as the process of taking a report of $v$ binary objects, and producing from it a $j$ element subset that captures most of the important features of the original report, with importance being defined via an arbitrary set function endemic to the model. The loss of information is then measured by a weighted average of variational distances, which we term the semantic loss. Our results include both cases where the probability distribution generating the $v$-length reports is known and unknown. In the case where it is known, our results demonstrate how to construct summarizers which minimize the semantic loss. For the case where the probability distribution is unknown, we show how to construct summarizers whose semantic loss, when averaged uniformly over all possible distributions, converges to the minimum.
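To make the setup in the abstract concrete, the following is a minimal sketch for the known-distribution case, not the paper's construction. The abstract does not give the exact form of the semantic loss, so the version below is an assumption: for each weighted group of objects, it takes the variational distance between the true values and the posterior over those values given the summary, then averages over reports. It also assumes the summary always keeps the same $j$ positions and searches them by brute force. All function names (`variational_distance`, `semantic_loss`, `best_summary`) and the toy distribution are hypothetical.

```python
from itertools import combinations


def variational_distance(p, q):
    """Total variation distance between two distributions given as dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def project(report, idx):
    """Restrict a report (tuple of bits) to the positions in idx."""
    return tuple(report[i] for i in idx)


def posterior(dist, kept, observed, target):
    """P(X_target | X_kept = observed) under dist (dict: report -> probability)."""
    mass, total = {}, 0.0
    for report, prob in dist.items():
        if project(report, kept) == observed:
            key = project(report, target)
            mass[key] = mass.get(key, 0.0) + prob
            total += prob
    return {k: m / total for k, m in mass.items()}


def semantic_loss(dist, kept, weights):
    """Assumed loss: weighted average of variational distances between the
    true values of each target group and their posterior given the summary."""
    loss = 0.0
    for report, prob in dist.items():
        if prob == 0.0:
            continue
        for target, w in weights.items():
            truth = {project(report, target): 1.0}
            guess = posterior(dist, kept, project(report, kept), target)
            loss += prob * w * variational_distance(truth, guess)
    return loss


def best_summary(dist, v, j, weights):
    """Brute-force search over all j-element subsets of the v positions."""
    return min(combinations(range(v), j),
               key=lambda kept: semantic_loss(dist, kept, weights))


# Toy report distribution with v = 3: objects 0 and 1 are always equal,
# object 2 is an independent fair coin.
dist = {(a, a, c): 0.25 for a in (0, 1) for c in (0, 1)}
# Importance weights on single objects (an assumed, normalized set function).
weights = {(0,): 1 / 3, (1,): 1 / 3, (2,): 1 / 3}

best = best_summary(dist, v=3, j=2, weights=weights)
loss_redundant = semantic_loss(dist, (0, 1), weights)
loss_best = semantic_loss(dist, best, weights)
```

In this toy case, keeping both correlated objects wastes a slot, while keeping one of them together with the independent object lets the reader recover the whole report. The brute-force search is exponential in $v$ and is only meant to illustrate the objective.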

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1901.06376/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1901.06376/full.md

## References

8 references — full list in the complete paper: https://tomesphere.com/paper/1901.06376/full.md

---
Source: https://tomesphere.com/paper/1901.06376