OntoSeg: a Novel Approach to Text Segmentation using Ontological Similarity
Mostafa Bayomi, Killian Levacher, M. Rami Ghorab, Séamus Lawless

TL;DR
OntoSeg introduces an ontological similarity-based method for text segmentation, leveraging conceptual relations and hierarchical clustering to improve segmentation quality over traditional lexical approaches.
Contribution
The paper presents a novel ontological similarity measure for text segmentation, integrating it with hierarchical clustering to capture conceptual relations and enable multi-granularity segmentation.
Findings
OntoSeg outperforms traditional lexical cohesion methods on standard datasets.
Combining ontological and lexical similarities enhances segmentation accuracy.
The hierarchical structure allows flexible segmentation at different levels of detail.
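To make the multi-granularity idea concrete, here is a minimal sketch of how hierarchical clustering over adjacent text blocks can yield segmentations at several levels of detail. It is an illustration under stated assumptions, not the authors' implementation. Each block is assumed to be already mapped to a set of ontology concept labels, Jaccard overlap stands in for the paper's ontological similarity measure (which this summary does not specify), and the names `jaccard` and `segment_hierarchy` are hypothetical.

```python
from typing import Callable, List, Set, Tuple

Span = Tuple[int, int]  # (start, end) block indices, end exclusive


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Stand-in similarity (assumption): overlap between two concept sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def segment_hierarchy(
    blocks: List[Set[str]],
    sim: Callable[[Set[str], Set[str]], float] = jaccard,
) -> List[List[Span]]:
    """Repeatedly merge the most similar pair of ADJACENT segments.

    Returns every intermediate segmentation, from finest (one block per
    segment) to coarsest (a single segment covering the whole text).
    """
    spans: List[Span] = [(i, i + 1) for i in range(len(blocks))]
    concepts: List[Set[str]] = [set(b) for b in blocks]
    levels: List[List[Span]] = [list(spans)]

    while len(spans) > 1:
        # Score each boundary between neighbouring segments.
        scores = [sim(concepts[i], concepts[i + 1]) for i in range(len(spans) - 1)]
        # Pick the boundary whose two sides are most similar (first on ties).
        best = max(range(len(scores)), key=scores.__getitem__)
        # Merge the two segments and pool their concepts.
        spans[best:best + 2] = [(spans[best][0], spans[best + 1][1])]
        concepts[best:best + 2] = [concepts[best] | concepts[best + 1]]
        levels.append(list(spans))

    return levels


# Toy input: four blocks with hand-assigned (hypothetical) concept labels.
blocks = [
    {"Dog", "Mammal"},
    {"Cat", "Mammal"},
    {"Car", "Vehicle"},
    {"Truck", "Vehicle"},
]
levels = segment_hierarchy(blocks)
for level in levels:
    print(level)
```

Each entry in `levels` is one cut of the hierarchy, so a caller can choose the granularity that suits the task. A different similarity function can be passed as `sim` without changing the merging loop.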
Abstract
Text segmentation (TS) aims at dividing long text into coherent segments which reflect the subtopic structure of the text. It is beneficial to many natural language processing tasks, such as Information Retrieval (IR) and document summarisation. Current approaches to text segmentation are similar in that they all use word-frequency metrics to measure the similarity between two regions of text, so that a document is segmented based on the lexical cohesion between its words. Various NLP tasks are now moving towards the semantic web and ontologies, such as ontology-based IR systems, to capture the conceptualizations associated with user needs and contents. Text segmentation based on lexical cohesion between words is hence not sufficient anymore for such tasks. This paper proposes OntoSeg, a novel approach to text segmentation based on the ontological similarity between text blocks. The…
