# Word Sense Disambiguation with Wikipedia Entities: A Survey of Entity Linking Approaches

**Authors:** Michael Angelos Simos, Christos Makris

PMC · DOI: 10.3390/e28020236 · Entropy · 2026-02-18

## TL;DR

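This survey reviews methods that resolve ambiguous terms in text by linking them to Wikipedia entities. It spans early hyperlink-statistics systems, graph-based inference, neural and transformer-based models, and multimodal, cross-lingual, and cross-domain settings, and it also covers datasets, evaluation measures, and open challenges.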

## Contribution

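The paper surveys Wikipedia-centric entity linking end to end: the main families of methods, the semantic annotation workflows that make large-scale use practical, the available datasets and evaluation measures, and the open challenges, including the complementary use of Wikidata.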

## Key findings

- Wikipedia is one of the most popular knowledge sources for resolving word ambiguity in NLP because of its completeness, high link density, and multi-language support.
- The survey covers a range of methodologies from early hyperlink-based systems to modern neural and contextual approaches.
- Open challenges include partial coverage, NIL concepts (mentions with no matching entry in the knowledge source), the level of sense definition, and combining WSD with large language models.
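
To make the "hyperlink statistics" idea concrete, here is a minimal sketch of a commonness prior: the fraction of times an anchor text links to each Wikipedia article, with a NIL fallback when no candidate is confident enough. The anchor counts, the `min_prior` threshold, and the function names are illustrative assumptions, not taken from the paper.

```python
from collections import Counter, defaultdict

# Hypothetical (anchor text, target article) pairs, standing in for links
# harvested from Wikipedia. The counts are invented for illustration.
ANCHOR_LINKS = (
    [("apple", "Apple Inc.")] * 6
    + [("apple", "Apple")] * 3
    + [("apple", "Apple Records")] * 1
)


def build_commonness(pairs):
    """Return {anchor: {target: P(target | anchor)}} from link pairs."""
    counts = defaultdict(Counter)
    for anchor, target in pairs:
        counts[anchor.lower()][target] += 1
    return {
        anchor: {t: n / sum(c.values()) for t, n in c.items()}
        for anchor, c in counts.items()
    }


def link(mention, commonness, min_prior=0.5):
    """Pick the most common target, or None (NIL) if unknown or too uncertain."""
    candidates = commonness.get(mention.lower())
    if not candidates:
        return None  # mention never seen as an anchor: treat as NIL
    target, prior = max(candidates.items(), key=lambda kv: kv[1])
    return target if prior >= min_prior else None


table = build_commonness(ANCHOR_LINKS)
print(link("Apple", table))                 # most frequent link target
print(link("pear", table))                  # unseen anchor, so NIL
print(link("Apple", table, min_prior=0.7))  # stricter threshold, so NIL
```

A prior like this ignores context entirely, which is why such a baseline is usually combined with relatedness or contextual signals.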

## Abstract

The inference of unstructured text semantics is a crucial preprocessing task for NLP and AI applications. Word sense disambiguation and entity linking tasks resolve ambiguous terms within unstructured text corpora to senses from a predefined knowledge source. Wikipedia has been one of the most popular sources due to its completeness, high link density, and multi-language support. In the context of chatbot-mediated consumption of information in recent years through implicit disambiguation and semantic representations in LLMs, Wikipedia remains an invaluable source and reference point. This survey covers methodologies for entity linking with Wikipedia, including early systems based on hyperlink statistics and semantic relatedness, methods using graph inference problem formalizations and graph label propagation algorithms, neural and contextual methods based on sense embeddings and transformers, and multimodal, cross-lingual, and cross-domain settings. Moreover, we cover semantic annotation workflows that facilitate the scaled-up use of Wikipedia-centric entity linking. We also provide an overview of the available datasets and evaluation measures. We discuss challenges such as partial coverage, NIL concepts, the level of sense definition, combining WSD and large-scale language models, as well as the complementary use of Wikidata.
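
As an illustration of the "semantic relatedness" signal mentioned in the abstract, the sketch below implements a widely used link-based measure commonly attributed to Milne and Witten, which scores two articles by the overlap of the pages that link to them. Whether the survey discusses this exact formulation is an assumption; the in-link sets and the article count are made-up toy values.

```python
import math


def link_relatedness(inlinks_a, inlinks_b, total_articles):
    """Link-based relatedness in [0, 1] from two sets of in-linking page ids.

    Computes 1 - (log max(|A|,|B|) - log |A n B|) / (log |W| - log min(|A|,|B|)),
    clipped at 0, where W is the set of all articles.
    """
    a, b = set(inlinks_a), set(inlinks_b)
    common = a & b
    if not common:
        return 0.0  # no shared in-links: treat as unrelated
    numerator = math.log(max(len(a), len(b))) - math.log(len(common))
    denominator = math.log(total_articles) - math.log(min(len(a), len(b)))
    if denominator <= 0:
        return 0.0  # degenerate case: a set as large as the whole collection
    return max(0.0, 1.0 - numerator / denominator)


# Toy in-link sets (hypothetical page ids) in a collection of 100 articles.
fruit = {1, 2, 3, 4}
orchard = {3, 4, 5}
company = {7, 8, 9}

print(link_relatedness(fruit, orchard, 100))  # shared in-links give a positive score
print(link_relatedness(fruit, company, 100))  # disjoint in-links give 0.0
```

In a linker, scores like this are typically aggregated over the candidate entities of neighboring mentions so that mutually related candidates reinforce each other.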

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12939010/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12939010/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/PMC12939010/full.md

---
Source: https://tomesphere.com/paper/PMC12939010