A Noisy-Channel Model for Document Compression
Hal Daumé III, Daniel Marcu

TL;DR
This paper introduces a hierarchical noisy-channel model for document compression that leverages syntactic and discourse structures to produce coherent, grammatical summaries, outperforming simpler baseline methods.
Contribution
It presents a novel hierarchical model that incorporates discourse and syntactic structures for improved document compression.
Findings
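The system outperforms both a baseline and a sentence-based compression system that simplifies every sentence of a text in sequence.
The results support the claim that discourse knowledge plays an important role in document summarization.
The hierarchical model drops non-important syntactic and discourse constituents while keeping the compressed document coherent and grammatical.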
Abstract
We present a document compression system that uses a hierarchical noisy-channel model of text production. Our compression system first automatically derives the syntactic structure of each sentence and the overall discourse structure of the text given as input. The system then uses a statistical hierarchical model of text production in order to drop non-important syntactic and discourse constituents so as to generate coherent, grammatical document compressions of arbitrary length. The system outperforms both a baseline and a sentence-based compression system that operates by simplifying sequentially all sentences in a text. Our results support the claim that discourse knowledge plays an important role in document summarization.
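The noisy-channel idea in the abstract can be illustrated with a minimal sketch: treat the input document d as a noisy expansion of a shorter text c, and choose the c that maximizes P(c) · P(d | c). The code below is not the authors' system. It assumes a flat list of units instead of syntactic and discourse trees, uses exhaustive search instead of a real decoder, and every name (`Unit`, `score`, `compress`, `UNITS`) and every log-probability in it is hypothetical and made up for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Unit:
    text: str
    source_logp: float  # toy log-probability of the unit under the source model P(c)
    expand_logp: float  # toy log-probability that the channel added this unit, part of P(d | c)


def score(units, kept):
    """Return log P(c) + log P(d | c) for the compression keeping the indices in `kept`."""
    source = sum(units[i].source_logp for i in kept)
    channel = sum(u.expand_logp for i, u in enumerate(units) if i not in kept)
    return source + channel


def compress(units, n_keep):
    """Exhaustively search for the best compression that keeps exactly n_keep units."""
    best = max(
        combinations(range(len(units)), n_keep),
        key=lambda kept: score(units, set(kept)),
    )
    # combinations() yields indices in increasing order, so the original order is preserved.
    return " ".join(units[i].text for i in best)


# Hypothetical example data; the numbers are invented, not estimated from any corpus.
UNITS = [
    Unit("The mayor resigned", source_logp=-1.0, expand_logp=-5.0),
    Unit("after a long meeting", source_logp=-3.0, expand_logp=-1.0),
    Unit("citing health reasons", source_logp=-2.0, expand_logp=-2.0),
]

print(compress(UNITS, 2))  # The mayor resigned citing health reasons
print(compress(UNITS, 1))  # The mayor resigned
```

Fixing `n_keep` stands in for the abstract's "arbitrary length" requirement: the same scoring function is reused, and only the size of the search space changes with the requested length.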
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Algorithms and Data Compression
