Evaluation of LLMs on Long-tail Entity Linking in Historical Documents

Marta Boscariol; Luana Bulla; Lia Draetta; Beatrice Fiumanò; Emanuele Lenzi; Leonardo Piano

arXiv:2505.03473 · cs.CL · May 7, 2025

TL;DR

This paper evaluates two LLMs, GPT and LLama3, on linking rare, long-tail entities in historical texts. The preliminary results suggest that LLMs can improve long-tail entity linking.

Contribution

It provides the first systematic assessment of LLMs on long-tail entity linking in historical documents, comparing their performance with a state-of-the-art EL framework.

Findings

01

LLMs perform well in long-tail EL tasks

02

LLMs can complement traditional EL methods

03

Preliminary results indicate that LLMs hold promise for long-tail entity linking

Abstract

Entity Linking (EL) plays a crucial role in Natural Language Processing (NLP) applications, enabling the disambiguation of entity mentions by linking them to their corresponding entries in a reference knowledge base (KB). Thanks to their deep contextual understanding capabilities, LLMs offer a new perspective to tackle EL, promising better results than traditional methods. Despite the impressive generalization capabilities of LLMs, linking less popular, long-tail entities remains challenging as these entities are often underrepresented in training data and knowledge bases. Furthermore, the long-tail EL task is an understudied problem, and limited studies address it with LLMs. In the present work, we assess the performance of two popular LLMs, GPT and LLama3, in a long-tail entity linking scenario. Using MHERCL v0.1, a manually annotated benchmark of sentences from domain-specific…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Data Quality and Management

Methods: Byte Pair Encoding · Attention Dropout · Softmax · Residual Connection · Linear Layer · Multi-Head Attention · Dense Connections · Discriminative Fine-Tuning · Adam