Clustering Prominent People and Organizations in Topic-Specific Text Corpora

Abdulkareem Alsudais; Hovig Tchalian

arXiv:1807.10800·cs.CL·July 9, 2019

Open Access

TL;DR

This paper presents a novel method for clustering prominent people and organizations in topic-specific texts using named entity recognition and word embeddings, demonstrating effective semantic clustering through human and quantitative evaluation.

Contribution

The paper introduces a new clustering approach that combines named entity recognition with word embeddings to improve semantic grouping of entities in text corpora.

Findings

01

The method effectively clusters semantically similar entities.

02

Human judges rated the clustering quality positively.

03

Quantitative metrics confirmed the method's effectiveness.

Abstract

Named entities in text documents are the names of people, organizations, locations, or other types of objects in the documents that exist in the real world. A persistent research challenge is to use computational techniques to identify such entities in text documents. Once identified, these named entities can be leveraged by several text mining tools and algorithms to improve NLP applications. In this paper, a method is proposed that clusters prominent names of people and organizations in a text corpus based on their semantic similarity. The method relies on common named entity recognition techniques and on recent word embedding models. The semantic similarity scores that the word embedding models generate for the named entities are used to cluster similar entities of the person and organization types. Two human judges evaluated ten variations of the method after…
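The clustering step described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the paper's implementation: the entity names are placeholders, the three-dimensional vectors are hand-written stand-ins for embeddings that a trained model would supply, the 0.8 threshold is arbitrary, and the single-link merge rule is one plausible choice, since the excerpt does not say which clustering algorithm the authors use. The named entity recognition step is assumed to have already produced the entity list.

```python
import math

# Hypothetical toy embeddings for entities already extracted by an NER step.
# Real vectors would come from a word embedding model trained on the corpus.
EMBEDDINGS = {
    "Acme Corp": [0.9, 0.1, 0.0],
    "Globex Inc": [0.8, 0.2, 0.1],
    "Jane Doe": [0.1, 0.9, 0.1],
    "John Roe": [0.2, 0.8, 0.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_entities(embeddings, threshold=0.8):
    """Single-link clustering: merge entities whose similarity >= threshold.

    The threshold and the merge rule are illustrative assumptions.
    """
    names = list(embeddings)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the cluster representative (union-find).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(embeddings[a], embeddings[b]) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    # Sort members and clusters so the result is deterministic.
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    for members in cluster_entities(EMBEDDINGS):
        print(members)
```

With these toy vectors the two organization-like placeholders end up in one cluster and the two person-like placeholders in another. Raising the threshold yields more, smaller clusters; lowering it merges them.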

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques