Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization

Regina Barzilay; Lillian Lee

arXiv:cs/0405039 · cs.CL · May 23, 2007 · 299 cites

Open Access

TL;DR

This paper introduces a probabilistic content modeling approach for texts that captures topic sequences, improving tasks like information ordering and summarization through a novel Hidden Markov Model adaptation.

Contribution

It presents a new knowledge-lean method for learning content models from unannotated texts, enhancing text ordering and summarization performance.

Findings

1. Significant improvement over previous methods in information ordering.

2. Effective content modeling from unannotated documents.

3. Enhanced extractive summarization results.

Abstract

We consider the problem of modeling the content structure of texts within a specific domain, in terms of the topics the texts address and the order in which these topics appear. We first present an effective knowledge-lean method for learning content models from unannotated documents, utilizing a novel adaptation of algorithms for Hidden Markov Models. We then apply our method to two complementary tasks: information ordering and extractive summarization. Our experiments show that incorporating content models in these applications yields substantial improvement over previously proposed methods.
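To make the idea of using a content model for information ordering concrete, here is a minimal sketch. It is not the authors' implementation: the two topic states, all probabilities, the unigram emission model, and the function names (`ordering_logprob`, `best_ordering`) are hypothetical choices made for illustration, and the parameters are hand-set rather than learned from unannotated text. Each hidden state stands for a topic, each sentence is treated as one emission, and a candidate ordering is scored with the standard HMM forward algorithm.

```python
import math
from itertools import permutations

# Hypothetical toy parameters (assumptions, not taken from the paper).
STATES = ["location", "damage"]
START = {"location": 0.8, "damage": 0.2}
TRANS = {
    "location": {"location": 0.3, "damage": 0.7},
    "damage": {"location": 0.2, "damage": 0.8},
}
# Per-state unigram word probabilities; a simplifying assumption.
EMIT = {
    "location": {"quake": 0.4, "struck": 0.3, "city": 0.3},
    "damage": {"buildings": 0.4, "collapsed": 0.3, "injured": 0.3},
}
FLOOR = 1e-3  # probability assigned to words a state has not seen


def sentence_logprob(state, sentence):
    """Log-probability that `state` emits every word of `sentence`."""
    return sum(math.log(EMIT[state].get(w, FLOOR)) for w in sentence.split())


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def ordering_logprob(sentences):
    """Forward algorithm: log-probability of the sentence sequence."""
    alpha = {
        s: math.log(START[s]) + sentence_logprob(s, sentences[0])
        for s in STATES
    }
    for sent in sentences[1:]:
        alpha = {
            s: logsumexp([alpha[p] + math.log(TRANS[p][s]) for p in STATES])
            + sentence_logprob(s, sent)
            for s in STATES
        }
    return logsumexp(list(alpha.values()))


def best_ordering(sentences):
    """Exhaustively pick the highest-scoring permutation (toy sizes only)."""
    return max(permutations(sentences), key=ordering_logprob)


if __name__ == "__main__":
    print(best_ordering(["buildings collapsed", "quake struck city"]))
```

With these toy parameters, the ordering that opens with the "location" sentence scores higher than the reverse, because the start and transition probabilities favor location followed by damage. Exhaustive search over permutations is only workable for a handful of sentences; it is used here purely to keep the sketch short.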

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques