# Web-Scale Academic Name Disambiguation: the WhoIsWho Benchmark, Leaderboard, and Toolkit

**Authors:** Bo Chen, Jing Zhang, Fanjin Zhang, Tianyi Han, Yuqing Cheng, Xiaoyan Li, Yuxiao Dong, and Jie Tang

arXiv: 2302.11848 · 2023-06-07

## TL;DR

This paper introduces WhoIsWho, a large-scale, high-quality benchmark and toolkit for academic name disambiguation, addressing real-world challenges with extensive data, comprehensive tasks, and strong baseline models.

## Contribution

It provides the first large-scale, real-world benchmark dataset, a comprehensive evaluation leaderboard, and an integrated toolkit for academic name disambiguation tasks.

## Key findings

- Developed a benchmark with over 1 million papers.
- Deployed a strong baseline in the AMiner system.
- Enabled daily arXiv paper assignments online.

## Abstract

Name disambiguation, a fundamental problem in online academic systems, faces growing challenges as the number of research papers increases. For example, on AMiner, an online academic search platform, about 10% of names are shared by more than 100 authors. Such challenging real-world cases have not been effectively addressed by existing research because of the small-scale or low-quality datasets used. The development of effective algorithms is further hampered by the variety of tasks and evaluation protocols designed on top of diverse datasets. To this end, we present WhoIsWho, which comprises a large-scale benchmark with over 1,000,000 papers built using an interactive annotation process, a regular leaderboard with comprehensive tasks, and an easy-to-use toolkit encapsulating the entire pipeline as well as the most powerful features and baseline models for tackling the tasks. Our strong baseline has already been deployed online in the AMiner system to enable daily arXiv paper assignments. The public leaderboard is available at http://whoiswho.biendata.xyz/. The toolkit is at https://github.com/THUDM/WhoIsWho. The online demo of daily arXiv paper assignments is at https://na-demo.aminer.cn/arxivpaper.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2302.11848/full.md

## Figures

24 figures with captions in the complete paper: https://tomesphere.com/paper/2302.11848/full.md

## References

50 references — full list in the complete paper: https://tomesphere.com/paper/2302.11848/full.md

---
Source: https://tomesphere.com/paper/2302.11848