Segmentation of Expository Texts by Hierarchical Agglomerative Clustering
Yaakov Yaari (Bar Ilan University)

TL;DR
This paper introduces a hierarchical agglomerative clustering method for segmenting expository texts, leveraging lexical similarity to identify discourse structure and enabling both linear and hierarchical exploration.
Contribution
It presents a hierarchical clustering approach to text segmentation that captures hierarchical discourse structure while performing comparably to an accepted linear segmentation method.
Findings
Results comparable to an accepted linear segmentation method
Effective hierarchical discourse structure identification
Supports both linear and hierarchical text exploration
Abstract
We propose a method for segmentation of expository texts based on hierarchical agglomerative clustering. The method uses paragraphs as the basic segments for identifying hierarchical discourse structure in the text, applying lexical similarity between them as the proximity test. Linear segmentation can be induced from the identified structure through application of two simple rules. However, the hierarchy can also be used for intelligent exploration of the text. The proposed segmentation algorithm is evaluated against an accepted linear segmentation method and shows comparable results.
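The idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not specify the similarity measure or the two rules for inducing a linear segmentation, so the sketch assumes cosine similarity over raw word counts, merges only adjacent segments, and obtains a linear segmentation simply by stopping when a requested number of segments remains. The names `bag_of_words`, `cosine`, `agglomerate`, and `stop_at` are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; a stand-in for the paper's lexical features."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def agglomerate(paragraphs, stop_at=1):
    """Repeatedly merge the most similar pair of ADJACENT segments.

    Assumption: only neighbouring segments may merge, so every cluster is a
    contiguous run of paragraphs. Stops when `stop_at` segments remain.
    Each segment records its paragraph span and its merge tree.
    """
    clusters = [
        {"span": (i, i), "words": bag_of_words(p), "tree": i}
        for i, p in enumerate(paragraphs)
    ]
    while len(clusters) > max(stop_at, 1):
        sims = [
            cosine(clusters[i]["words"], clusters[i + 1]["words"])
            for i in range(len(clusters) - 1)
        ]
        best = max(range(len(sims)), key=sims.__getitem__)
        left, right = clusters[best], clusters[best + 1]
        merged = {
            "span": (left["span"][0], right["span"][1]),
            "words": left["words"] + right["words"],
            "tree": (left["tree"], right["tree"]),
        }
        clusters[best:best + 2] = [merged]
    return clusters


paragraphs = [
    "cats purr and cats sleep",
    "cats sleep all day",
    "stocks fell as markets closed",
    "markets and stocks recovered",
]

# Hierarchical view: the full merge tree over paragraph indices.
print(agglomerate(paragraphs)[0]["tree"])  # ((0, 1), (2, 3))

# Linear view: stop early and read off contiguous paragraph spans.
print([c["span"] for c in agglomerate(paragraphs, stop_at=2)])  # [(0, 1), (2, 3)]
```

Restricting merges to neighbours keeps each node of the tree a contiguous stretch of text, which is what allows the same structure to be read either as a hierarchy or as a flat sequence of segments.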
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
