Text Segmentation using Named Entity Recognition and Co-reference Resolution in English and Greek Texts
Pavlina Fragkou

TL;DR
This study investigates how combining named entity recognition and co-reference resolution can improve text segmentation in English and Greek texts, revealing that effectiveness varies with topic, entity frequency, and segment length.
Contribution
It demonstrates the impact of integrating NER and co-reference resolution on text segmentation performance across two languages, with detailed analysis of influencing factors.
Findings
Performance depends on segment topic and length
Higher entity frequency improves segmentation accuracy
Language-specific differences affect results
Abstract
In this paper we examine the benefit of performing named entity recognition (NER) and co-reference resolution on an English and a Greek corpus used for text segmentation. The aim is to examine whether combining text segmentation with information extraction can help identify the various topics that appear in a document. NER was performed manually on the English corpus and compared with the output of publicly available annotation tools, while an already existing tool was used for the Greek corpus. The annotations produced for both corpora were manually corrected and enriched to cover four types of named entities. Co-reference resolution, i.e., substitution of every reference to the same instance with the same named entity identifier, was subsequently performed. The evaluation, using five text segmentation algorithms for the English corpus and…
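The substitution step described in the abstract can be illustrated with a minimal sketch. It is not the paper's pipeline: the function name `substitute_mentions`, the mention table, and the identifiers `PERSON_1` and `LOCATION_1` are all hypothetical, and the mention-to-identifier mapping is assumed to be already available from manual or tool-based annotation.

```python
import re


def substitute_mentions(text, mention_to_id):
    """Replace every listed mention with its entity identifier.

    Assumption: mention_to_id is a hand-built table standing in for the
    output of NER plus co-reference annotation.
    """
    # Try longer mentions first so "Barack Obama" wins over "Obama".
    mentions = sorted(mention_to_id, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(m) for m in mentions) + r")\b"
    )
    return pattern.sub(lambda m: mention_to_id[m.group(1)], text)


# Hypothetical annotations: three surface forms refer to one person.
mention_to_id = {
    "Barack Obama": "PERSON_1",
    "Obama": "PERSON_1",
    "the president": "PERSON_1",
    "Athens": "LOCATION_1",
}

text = "Barack Obama visited Athens. Obama said the president enjoyed Athens."
result = substitute_mentions(text, mention_to_id)
print(result)
```

After substitution, every reference to the same instance shares one token, so a segmentation algorithm that relies on word repetition sees the repeated identifier rather than several different surface forms.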
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies
