LEAD: LLM-enhanced Engine for Author Disambiguation
Giusy Giulia Tuccari, Lorenzo Giammei, Andrea Giovanni Nuzzolese, Misael Mongiovì, Antonio Zinilli, Francesco Poggi

TL;DR
This paper introduces LEAD, a hybrid framework that combines semantic features from Large Language Models with structural evidence from co-authorship and citation networks to disambiguate authors across bibliographic databases, reaching an F1 score of 96.7% and an accuracy of 95.7% while remaining efficient.
Contribution
The study presents LEAD, a novel hybrid approach that effectively integrates LLM-derived semantic features with network-based structural evidence for author disambiguation.
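This summary does not say how LEAD combines its two kinds of evidence. Purely as an illustration of what "hybrid" can mean, the sketch below assumes that each candidate Scopus profile already has a semantic score (for example from an LLM) and a structural score (for example from co-authorship or citation evidence), and merges them by weighted late fusion. The function names, the max-normalization, the default weight, and the abstention threshold are all hypothetical, not LEAD's actual design.

```python
from typing import Dict, Optional


def _max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    # Rescale non-negative scores to [0, 1] by dividing by the maximum.
    top = max(scores.values(), default=0.0)
    return {k: (v / top if top > 0 else 0.0) for k, v in scores.items()}


def fuse_scores(semantic: Dict[str, float],
                structural: Dict[str, float],
                w_semantic: float = 0.5) -> Dict[str, float]:
    """Hypothetical weighted late fusion of two per-candidate score maps."""
    sem = _max_normalize(semantic)
    struct = _max_normalize(structural)
    candidates = set(sem) | set(struct)
    # A candidate missing from one source contributes 0 from that source.
    return {c: w_semantic * sem.get(c, 0.0) + (1.0 - w_semantic) * struct.get(c, 0.0)
            for c in candidates}


def pick_best(fused: Dict[str, float], threshold: float = 0.5) -> Optional[str]:
    """Return the top candidate, or None (abstain) if its score is below threshold."""
    if not fused:
        return None
    best = max(fused, key=fused.get)
    return best if fused[best] >= threshold else None


# Hypothetical scores for two candidate Scopus profiles of one academic.
semantic = {"scopus:A": 0.8, "scopus:B": 0.6}     # e.g. an LLM similarity score
structural = {"scopus:A": 2.0, "scopus:B": 10.0}  # e.g. a count of shared references
fused = fuse_scores(semantic, structural)
print(pick_best(fused))  # prints scopus:B
```

Normalizing each source before mixing keeps a raw count (here up to 10) from swamping a score in [0, 1]; the weight `w_semantic` is the single knob that trades semantic against structural evidence.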
Findings
LEAD achieves F1 = 96.7% and accuracy = 95.7%.
Bibliographic Coupling is the fastest and strongest single-source method.
Hybrid strategies outperform individual methods in author disambiguation.
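Bibliographic coupling, in its classic form, links two publications by the references they both cite: the more shared references, the stronger the coupling. The sketch below shows how such a score could rank candidate Scopus profiles for one academic. It assumes that a set of references cited by publications already attributed to that academic is available (how LEAD obtains it is not described here), and it uses a cosine-normalized coupling so that profiles with very long reference lists are not automatically favored. The normalization, the identifiers, and the function names are assumptions for illustration.

```python
import math
from typing import Dict, List, Set, Tuple


def coupling_strength(refs_a: Set[str], refs_b: Set[str]) -> int:
    """Classic bibliographic coupling strength: number of shared cited references."""
    return len(refs_a & refs_b)


def coupling_cosine(refs_a: Set[str], refs_b: Set[str]) -> float:
    """Shared references divided by the geometric mean of the two set sizes."""
    if not refs_a or not refs_b:
        return 0.0
    return coupling_strength(refs_a, refs_b) / math.sqrt(len(refs_a) * len(refs_b))


def rank_candidates(target_refs: Set[str],
                    candidates: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Rank candidate profiles by their coupling with the target reference set."""
    scored = [(cid, coupling_cosine(target_refs, refs)) for cid, refs in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical reference identifiers (e.g. DOIs) pooled per profile.
target = {"ref1", "ref2", "ref3", "ref4"}
candidates = {
    "scopus:S1": {"ref1", "ref2", "ref9"},
    "scopus:S2": {"ref7", "ref8"},
}
ranking = rank_candidates(target, candidates)
print(ranking[0][0])  # prints scopus:S1
```

Because the score needs only set intersections over reference identifiers, it is cheap to compute, which fits the finding that Bibliographic Coupling is the fastest single-source method.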
Abstract
Author Name Disambiguation (AND) is a long-standing challenge in bibliometrics and scientometrics, as name ambiguity undermines the accuracy of bibliographic databases and the reliability of research evaluation. This study addresses the problem of cross-source disambiguation by linking academic career records from CercaUniversità, the official registry of Italian academics, with author profiles in Scopus. We introduce LEAD (LLM-enhanced Engine for Author Disambiguation), a novel hybrid framework that combines semantic features extracted through Large Language Models (LLMs) with structural evidence derived from co-authorship and citation networks. Using a gold standard of 606 ambiguous cases, we compare five methods: (i) Label Spreading on co-authorship networks; (ii) Bibliographic Coupling on citation networks; (iii) a standalone LLM-based approach; (iv) an LLM-enriched configuration;…
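Label spreading, in the standard formulation by Zhou et al. (2004), propagates seed labels over a graph through the symmetrically normalized adjacency matrix S = D^{-1/2} W D^{-1/2}, iterating F ← αSF + (1 − α)Y until it converges. The abstract does not say which variant LEAD uses or how its graph and seeds are built; the sketch below assumes nodes are authors in a co-authorship graph, edge weights count co-authored papers, and a few nodes carry a known identity as seeds. The graph, the weights, and the value of alpha are hypothetical.

```python
import numpy as np


def label_spreading(W: np.ndarray, Y: np.ndarray, alpha: float = 0.8,
                    max_iter: int = 1000, tol: float = 1e-9) -> np.ndarray:
    """Label spreading (Zhou et al., 2004) on a symmetric, non-negative weight matrix W.

    Y has one row per node and one column per class; seed rows are one-hot,
    unlabeled rows are all zeros. Returns the propagated score matrix F.
    """
    deg = W.sum(axis=1)
    inv_sqrt = np.zeros_like(deg, dtype=float)
    nonzero = deg > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(deg[nonzero])
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}.
    S = inv_sqrt[:, None] * W * inv_sqrt[None, :]
    Y = Y.astype(float)
    F = Y.copy()
    for _ in range(max_iter):
        # Each node takes a share alpha from its neighbours and keeps 1 - alpha of its seed.
        F_next = alpha * (S @ F) + (1.0 - alpha) * Y
        if np.abs(F_next - F).max() < tol:
            return F_next
        F = F_next
    return F


# Hypothetical co-authorship graph: edge weight = number of co-authored papers.
W = np.zeros((5, 5))
for i, j, w in [(0, 1, 1.0), (1, 2, 3.0), (2, 3, 1.0), (3, 4, 1.0)]:
    W[i, j] = W[j, i] = w
# Node 0 is a seed for identity 0, node 4 a seed for identity 1.
Y = np.zeros((5, 2))
Y[0, 0] = 1.0
Y[4, 1] = 1.0
F = label_spreading(W, Y)
print(F.argmax(axis=1))  # prints [0 0 0 1 1]
```

Node 2 sits between the two seeds but ends up with identity 0 because its tie to node 1 (three joint papers) outweighs its tie to node 3 (one paper); since alpha < 1, the iteration converges to the closed form (1 − α)(I − αS)^{-1}Y.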
Taxonomy
Topics: Data Quality and Management · Authorship Attribution and Profiling · Topic Modeling
