Clustering Prominent People and Organizations in Topic-Specific Text Corpora
Abdulkareem Alsudais, Hovig Tchalian

TL;DR
This paper presents a novel method for clustering prominent people and organizations in topic-specific texts using named entity recognition and word embeddings, demonstrating effective semantic clustering through human and quantitative evaluation.
Contribution
The paper introduces a new clustering approach that combines named entity recognition with word embeddings to improve semantic grouping of entities in text corpora.
Findings
The method effectively clusters semantically similar entities.
Human judges rated the clustering quality positively.
Quantitative metrics confirmed the method's effectiveness.
Abstract
Named entities in text documents are the names of people, organizations, locations, or other types of real-world objects mentioned in the documents. A persistent research challenge is to use computational techniques to identify such entities in text documents. Once identified, these named entities can be leveraged by several text mining tools and algorithms to improve NLP applications. This paper proposes a method that clusters prominent names of people and organizations in a text corpus based on their semantic similarity. The method relies on common named entity recognition techniques and on recent word embedding models. The semantic similarity scores that the word embedding models generate for the named entities are used to cluster similar entities of the people and organization types. Two human judges evaluated ten variations of the method after…
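The pipeline the abstract describes (recognize entities, keep the prominent people and organizations, score pairwise similarity with embeddings, group similar entities) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the pre-tagged NER output, the three-dimensional toy vectors, the mention-count definition of "prominent", the cosine threshold of 0.9, pooling both entity types into one clustering, and the threshold-based single-linkage grouping are all hypothetical choices made for the example.

```python
import math
from collections import Counter

# Hypothetical NER output: (entity text, entity type) pairs as a tagger might emit.
MENTIONS = [
    ("Alice Smith", "PERSON"), ("Alice Smith", "PERSON"), ("Bob Jones", "PERSON"),
    ("Acme Corp", "ORG"), ("Acme Corp", "ORG"), ("Globex", "ORG"),
    ("Paris", "LOC"), ("Carol White", "PERSON"),
]

# Hypothetical toy vectors; a real run would look these up in a trained embedding model.
EMBEDDINGS = {
    "Alice Smith": [0.9, 0.1, 0.0],
    "Bob Jones": [0.85, 0.15, 0.05],
    "Carol White": [0.0, 0.2, 0.95],
    "Acme Corp": [0.1, 0.9, 0.1],
    "Globex": [0.15, 0.88, 0.05],
}


def prominent_entities(mentions, types=("PERSON", "ORG"), min_count=1):
    """Keep people/organizations mentioned at least min_count times (assumed notion of prominence)."""
    counts = Counter(text for text, label in mentions if label in types)
    return [entity for entity, count in counts.most_common() if count >= min_count]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster(entities, embeddings, threshold=0.9):
    """Single-linkage grouping: entities whose cosine similarity >= threshold share a cluster."""
    entities = [e for e in entities if e in embeddings]  # skip entities with no vector
    parent = {e: e for e in entities}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(entities):
        for b in entities[i + 1:]:
            if cosine(embeddings[a], embeddings[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for e in entities:
        groups.setdefault(find(e), []).append(e)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    people_and_orgs = prominent_entities(MENTIONS)
    print(cluster(people_and_orgs, EMBEDDINGS))
```

With these toy vectors, the two people with near-parallel vectors form one cluster, the two organizations form another, and Carol White stays on her own. A real run would substitute vectors from a trained embedding model and tune the threshold, which is the kind of variation a human evaluation could compare.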
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
