Catching the Drift: Probabilistic Content Models, with Applications to Generation and Summarization
Regina Barzilay, Lillian Lee

TL;DR
This paper introduces probabilistic content models, which capture the topics a domain's texts address and the order in which those topics appear. The models are learned with a novel adaptation of Hidden Markov Model algorithms and improve information ordering and extractive summarization.
Contribution
It presents a knowledge-lean method for learning content models from unannotated texts and applies the learned models to information ordering and extractive summarization.
Findings
Significant improvement over previous methods in information ordering.
Effective content modeling from unannotated documents.
Enhanced extractive summarization results.
Abstract
We consider the problem of modeling the content structure of texts within a specific domain, in terms of the topics the texts address and the order in which these topics appear. We first present an effective knowledge-lean method for learning content models from un-annotated documents, utilizing a novel adaptation of algorithms for Hidden Markov Models. We then apply our method to two complementary tasks: information ordering and extractive summarization. Our experiments show that incorporating content models in these applications yields substantial improvement over previously-proposed methods.
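To make the idea of a content model concrete, the sketch below treats topics as HMM states, topic order as state transitions, and scores candidate sentence orderings with the forward algorithm. It is a minimal illustration under stated assumptions, not the paper's implementation: the topic names, the hand-set probabilities, the unigram emissions, the `UNK` floor, and the brute-force search over permutations are all hypothetical choices made for this example. The abstract only states that content models are learned from unannotated documents with an adaptation of HMM algorithms.

```python
import itertools
import math

# Hypothetical toy parameters, hand-set for illustration only.
# In the source, content models are learned from unannotated documents.
STATES = ["location", "magnitude", "casualties"]
START = {"location": 0.8, "magnitude": 0.1, "casualties": 0.1}
TRANS = {
    "location": {"location": 0.1, "magnitude": 0.7, "casualties": 0.2},
    "magnitude": {"location": 0.1, "magnitude": 0.2, "casualties": 0.7},
    "casualties": {"location": 0.2, "magnitude": 0.2, "casualties": 0.6},
}
# Assumed unigram emission probabilities per topic state.
EMIT = {
    "location": {"quake": 0.3, "struck": 0.3, "near": 0.3},
    "magnitude": {"quake": 0.2, "measured": 0.4, "richter": 0.3},
    "casualties": {"people": 0.4, "injured": 0.4, "quake": 0.1},
}
UNK = 0.01  # assumed probability floor for words a state has not seen


def sentence_logprob(state, sentence):
    """Log-probability that `state` emits every word of `sentence`."""
    return sum(math.log(EMIT[state].get(w, UNK)) for w in sentence.lower().split())


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sequence_logprob(sentences):
    """Forward algorithm: log P(sentence sequence), summed over all topic paths."""
    alpha = {s: math.log(START[s]) + sentence_logprob(s, sentences[0]) for s in STATES}
    for sent in sentences[1:]:
        alpha = {
            s: logsumexp([alpha[p] + math.log(TRANS[p][s]) for p in STATES])
            + sentence_logprob(s, sent)
            for s in STATES
        }
    return logsumexp(list(alpha.values()))


def best_order(sentences):
    """Brute-force search for the ordering the toy model finds most likely."""
    return max(itertools.permutations(sentences), key=sequence_logprob)


doc = ["people injured", "quake struck near", "quake measured richter"]
ordered = best_order(doc)
print(ordered)
```

With these toy parameters the model prefers the ordering that moves from location to magnitude to casualties, because that path has the highest start and transition probabilities. Exhaustive permutation search is only feasible for a handful of sentences; it is used here to keep the example short.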
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
