An Automatic Method of Finding Topic Boundaries
Jeffrey C. Reynar (University of Pennsylvania)

TL;DR
This paper introduces an automatic method for locating discourse boundaries using lexical cohesion and a graphical technique called dotplotting, and reports two experiments on finding the boundaries between concatenated documents.
Contribution
It presents a novel combination of lexical cohesion analysis and graphical dotplotting for automatic discourse boundary detection, including an optimization algorithm.
Findings
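Two experiments automatically locate the boundaries between a series of concatenated documents
Dotplots can be segmented either manually, by examining the graph, or automatically, using an optimization algorithm
Areas of application and future directions for the method are outlined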
Abstract
This article outlines a new method of locating discourse boundaries based on lexical cohesion and a graphical technique called dotplotting. The application of dotplotting to discourse segmentation can be performed either manually, by examining a graph, or automatically, using an optimization algorithm. The results of two experiments involving automatically locating boundaries between a series of concatenated documents are presented. Areas of application and future directions for this work are also outlined.
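
The idea in the abstract can be illustrated with a small sketch. It is not the paper's algorithm, whose details are not given on this page. It assumes whitespace-separated tokens, exact string matching as the measure of lexical cohesion, and a single boundary chosen by exhaustive search. The function names `dotplot`, `cross_density`, and `best_boundary` are hypothetical. A dot is placed at (i, j) whenever the same word occurs at positions i and j, and the boundary is placed where the density of dots linking the text before it to the text after it is lowest.

```python
from collections import defaultdict
from itertools import combinations


def dotplot(tokens):
    """Return the set of (i, j) pairs, i < j, where tokens i and j are the same word."""
    positions = defaultdict(list)
    for idx, tok in enumerate(tokens):
        positions[tok].append(idx)
    dots = set()
    for occurrences in positions.values():
        # occurrences is in ascending order, so every pair has i < j
        dots.update(combinations(occurrences, 2))
    return dots


def cross_density(dots, n, boundary):
    """Density of dots that link text before `boundary` to text at or after it."""
    crossing = sum(1 for i, j in dots if i < boundary <= j)
    area = boundary * (n - boundary)  # size of the off-diagonal rectangle
    return crossing / area


def best_boundary(tokens):
    """Assumed objective: pick the single boundary with the lowest cross density."""
    n = len(tokens)
    dots = dotplot(tokens)
    return min(range(1, n), key=lambda b: cross_density(dots, n, b))


# Two toy "documents" with disjoint vocabularies, concatenated.
tokens = "cat dog cat dog cat dog stock bond stock bond stock bond".split()
print(best_boundary(tokens))
```

In this toy input no word is shared across the two halves, so the cross density is zero only at the true join. Real text would need stopword removal or stemming, and more than one boundary, none of which this sketch attempts.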
Taxonomy
Topics: Data Mining Algorithms and Applications
