Entropy and Graph Based Modelling of Document Coherence using Discourse Entities: An Application
Casper Petersen, Christina Lioma, Jakob Grue Simonsen, Birger Larsen

TL;DR
This paper introduces two models that assess document coherence from discourse entities, one based on the entropy of entity n-grams and one on graph topology metrics, and shows that reranking retrieval results by coherence score improves information retrieval performance.
Contribution
The paper presents two new models of document coherence using discourse entities, applying entropy and graph topology metrics, and shows their benefit in IR reranking.
Findings
Models perform comparably to existing coherence models without tuning
Reranking by coherence scores improves IR results
Coherence correlates with document relevance
Abstract
We present two novel models of document coherence and their application to information retrieval (IR). Both models approximate document coherence using discourse entities, e.g. the subject or object of a sentence. Our first model views text as a Markov process generating sequences of discourse entities (entity n-grams); we use the entropy of these entity n-grams to approximate the rate at which new information appears in text, reasoning that as more new words appear, the topic increasingly drifts and text coherence decreases. Our second model extends the work of Guinaudeau & Strube [28] that represents text as a graph of discourse entities, linked by different relations, such as their distance or adjacency in text. We use several graph topology metrics to approximate different aspects of the discourse flow that can indicate coherence, such as the average clustering or betweenness of…
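The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The entity lists are made up, the n-gram order `n=2` is an arbitrary choice, and the graph construction (sentences as nodes, linked when they share an entity) is an assumption chosen for simplicity; the function names `entity_ngram_entropy`, `shared_entity_graph`, and `average_clustering` are hypothetical. The first function computes the Shannon entropy of the empirical entity n-gram distribution, and the last computes the mean local clustering coefficient of the graph.

```python
import math
from collections import Counter
from itertools import combinations


def entity_ngram_entropy(entities, n=2):
    """Shannon entropy (in bits) of the empirical entity n-gram distribution.

    `entities` is the document's discourse entities in reading order.
    How entities are extracted is outside this sketch (assumption).
    """
    ngrams = [tuple(entities[i:i + n]) for i in range(len(entities) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    total = len(ngrams)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def shared_entity_graph(sentences):
    """Undirected graph as adjacency sets.

    Assumption for illustration: nodes are sentence indices, and two
    sentences are linked when they mention at least one common entity.
    """
    adj = {i: set() for i in range(len(sentences))}
    for i, j in combinations(range(len(sentences)), 2):
        if set(sentences[i]) & set(sentences[j]):
            adj[i].add(j)
            adj[j].add(i)
    return adj


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    if not adj:
        return 0.0
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges that exist between pairs of this node's neighbours.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


# Toy inputs (made up): one entity sequence that keeps returning to the
# same entities, and one that keeps introducing new ones.
repetitive = ["cat", "mat", "cat", "mat", "cat", "mat"]
drifting = ["cat", "mat", "dog", "park", "rain", "sun"]

# Toy document: each inner list holds the entities of one sentence.
sentences = [["cat", "mat"], ["cat", "dog"], ["dog", "mat"], ["rain"]]
graph = shared_entity_graph(sentences)
```

Under these assumptions, `entity_ngram_entropy(drifting)` is higher than `entity_ngram_entropy(repetitive)`, which matches the abstract's intuition that a steady stream of new entities signals topic drift. `average_clustering(graph)` summarises how tightly the sentences of the toy document are interlinked.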
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
