Remembering Unequally: Global and Disciplinary Bias in LLM Reconstruction of Scholarly Coauthor Lists
Ghazal Kalhor, Afra Mashhadi

TL;DR
This paper examines how large language models tend to favor well-known researchers and disciplines when reconstructing scholarly coauthor lists, revealing biases linked to citation counts and regional disparities.
Contribution
It provides an empirical analysis of memorization biases in LLMs regarding scholarly coauthorship, highlighting disparities across disciplines and regions.
Findings
Highly cited researchers are disproportionately favored in LLM reconstructions.
Discipline-specific differences affect the fairness of coauthor list reconstruction.
Some regions, like parts of Africa, show more balanced outcomes.
Abstract
Ongoing breakthroughs in large language models (LLMs) are reshaping scholarly search and discovery interfaces. While these systems offer new possibilities for navigating scientific knowledge, they also raise concerns about fairness and representational bias rooted in the models' memorized training data. As LLMs are increasingly used to answer queries about researchers and research communities, their ability to accurately reconstruct scholarly coauthor lists becomes an important but underexamined issue. In this study, we investigate how memorization in LLMs affects the reconstruction of coauthor lists and whether this process reflects existing inequalities across academic disciplines and world regions. We evaluate three prominent models, DeepSeek R1, Llama 4 Scout, and Mixtral 8x7B, by comparing their generated coauthor lists against bibliographic reference data. Our analysis reveals a…
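The abstract describes scoring each model by comparing its generated coauthor list with bibliographic reference data, but this excerpt does not say which metric is used. The following is a minimal sketch that assumes a set-based comparison of normalized names. It computes precision, recall, and F1 for a single researcher. The functions `normalize_name` and `coauthor_overlap`, the normalization rule, and the example names are all hypothetical illustrations, not the authors' method.

```python
def normalize_name(name: str) -> str:
    # Hypothetical normalization: case-fold and collapse runs of whitespace.
    return " ".join(name.casefold().split())


def coauthor_overlap(generated: list[str], reference: list[str]) -> dict[str, float]:
    # Compare an LLM-generated coauthor list with a reference list as sets
    # of normalized names; blank entries are ignored.
    gen = {normalize_name(n) for n in generated if n.strip()}
    ref = {normalize_name(n) for n in reference if n.strip()}
    hits = len(gen & ref)  # names present in both lists
    precision = hits / len(gen) if gen else 0.0  # share of generated names that are real coauthors
    recall = hits / len(ref) if ref else 0.0     # share of real coauthors the model recalled
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical example: placeholder names, not data from the paper.
    scores = coauthor_overlap(
        ["Author A", "author  b", "Author C"],
        ["author a", "Author B", "Author D", "Author E"],
    )
    print({k: round(v, 3) for k, v in scores.items()})
    # prints {'precision': 0.667, 'recall': 0.5, 'f1': 0.571}
```

Matching real author names usually needs more than case and whitespace folding, because initials, transliteration, and name-order variants all occur. An exact set overlap like this one will therefore undercount correct matches.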
