Unsupervised Named Entity Disambiguation for Low Resource Domains

Debarghya Datta; Soumajit Pramanik

arXiv:2412.10054 · cs.CL · December 16, 2024

TL;DR

This paper introduces an unsupervised method for named entity disambiguation tailored to low-resource, domain-specific texts, reporting an improvement of over 40% in Precision@1 over state-of-the-art methods.

Contribution

The authors propose a novel unsupervised approach using Group Steiner Trees to improve entity linking in low-resource, domain-specific scenarios without relying on training data.

Findings

01 Achieved over 40% improvement in Precision@1 compared to state-of-the-art methods.

02 Effectively handles noisy texts and domain-specific knowledge bases.

03 Applicable across various specialized domains with limited resources.

Abstract

In the ever-evolving landscape of natural language processing and information retrieval, the need for robust, domain-specific entity linking algorithms has become increasingly apparent. Entity linking is crucial in many fields, such as the humanities, technical writing, and the biomedical sciences, for enriching texts with semantics and discovering more knowledge. Using Named Entity Disambiguation (NED) in such domains requires handling noisy texts, low-resource settings, and domain-specific knowledge bases (KBs). Existing approaches are mostly unsuitable for such scenarios, as they either depend on training data or are not flexible enough to work with domain-specific KBs. In this work, we therefore present an unsupervised approach that leverages Group Steiner Trees (GST) to identify the most relevant candidates for entity disambiguation using the contextual similarities across candidate…
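To make the Group Steiner Tree idea concrete, here is a minimal sketch, not the authors' implementation. It assumes each mention contributes one group of candidate entities and that edge costs are something like 1 minus a contextual similarity; the graph, the entity names, and the costs below are all made up for illustration. It brute-forces one candidate per mention and scores each choice by the cost of a spanning tree over shortest-path distances, which is only feasible for tiny inputs and only approximates a true GST solver.

```python
from itertools import product


def shortest_paths(nodes, edges):
    """All-pairs shortest-path costs (Floyd-Warshall) on an undirected graph."""
    inf = float("inf")
    dist = {u: {v: (0.0 if u == v else inf) for v in nodes} for u in nodes}
    for (u, v), cost in edges.items():
        dist[u][v] = min(dist[u][v], cost)
        dist[v][u] = min(dist[v][u], cost)
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def mst_cost(terminals, dist):
    """Cost of a minimum spanning tree over the terminals (Prim's algorithm),
    using shortest-path distances as edge weights."""
    terminals = list(dict.fromkeys(terminals))  # drop duplicates, keep order
    in_tree = {terminals[0]}
    total = 0.0
    while len(in_tree) < len(terminals):
        cost, nxt = min(
            (dist[u][v], v)
            for u in in_tree
            for v in terminals
            if v not in in_tree
        )
        total += cost
        in_tree.add(nxt)
    return total


def disambiguate(groups, edges):
    """Pick one candidate per mention so that the connecting tree is cheapest.

    groups: mention -> list of candidate entities (hypothetical input format)
    edges:  (entity_a, entity_b) -> cost, assumed to be 1 - similarity
    """
    nodes = sorted(
        {n for pair in edges for n in pair}
        | {c for cands in groups.values() for c in cands}
    )
    dist = shortest_paths(nodes, edges)
    mentions = list(groups)
    best = min(
        product(*(groups[m] for m in mentions)),
        key=lambda choice: mst_cost(choice, dist),
    )
    return dict(zip(mentions, best)), mst_cost(best, dist)


# Toy example: three mentions, two of them ambiguous.
groups = {
    "Java": ["Java_(language)", "Java_(island)"],
    "Python": ["Python_(language)", "Python_(snake)"],
    "compiler": ["Compiler"],
}
edges = {
    ("Java_(language)", "Python_(language)"): 0.2,
    ("Java_(language)", "Compiler"): 0.1,
    ("Python_(language)", "Compiler"): 0.3,
    ("Java_(island)", "Python_(snake)"): 0.6,
    ("Java_(island)", "Compiler"): 0.9,
    ("Python_(snake)", "Compiler"): 0.9,
}
chosen, cost = disambiguate(groups, edges)
print(chosen, round(cost, 3))
```

In this toy graph the programming-language senses are chosen because they form the cheapest tree touching every mention's group. How the paper builds its candidate graph and which GST solver it uses are not visible in the truncated abstract above, so those parts are left as assumptions here.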

Code & Models

Repositories

deba-iitbh/gst-ned (Official)

Videos

Unsupervised Named Entity Disambiguation for Low Resource Domains · underline