A Machine Learning Approach to Quantitative Prosopography
Aayushee Gupta, Haimonti Dutta, Srikanta Bedathur, Lipika Dey

TL;DR
This paper introduces a machine learning framework that automatically constructs a people gazetteer from noisy newspaper texts, enabling quantitative prosopographical analysis of influential individuals in historical research.
Contribution
It presents a novel machine learning approach for automatically creating a people gazetteer from noisy textual data, facilitating quantitative prosopography.
Findings
Successfully identified influential historical figures from newspaper data
Developed a custom Influential Person Index (IPI) for ranking individuals
Analyzed 14,020 articles from an 1896 New York newspaper, "The Sun"
Abstract
Prosopography is an investigation of the common characteristics of a group of people in history, by a collective study of their lives. It involves a study of biographies to solve historical problems. If such biographies are unavailable, surviving documents and secondary biographical data are used. Quantitative prosopography involves analysis of information from a wide variety of sources about "ordinary people". In this paper, we present a machine learning framework for automatically constructing a people gazetteer, which forms the basis of quantitative prosopographical research. The gazetteer is learnt from the noisy text of newspapers using a Named Entity Recognizer (NER). The framework then identifies influential people in the gazetteer using a custom-designed Influential Person Index (IPI). Our corpus comprises 14,020 articles from a local newspaper, "The Sun", published from New York…
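The pipeline the abstract describes (NER output, then a people gazetteer, then a ranking) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the name normalization, and the made-up sample mentions are hypothetical, and the ranking is a simple stand-in rather than the paper's Influential Person Index, whose definition is not given here. It also assumes person mentions have already been extracted by an upstream NER step.

```python
from collections import defaultdict


def normalize(name):
    # Collapse whitespace and case so "john  smith" and "JOHN SMITH" share one key.
    return " ".join(name.split()).title()


def build_gazetteer(tagged_articles):
    """Map each person name to the articles mentioning it and a mention count.

    tagged_articles: iterable of (article_id, [person mentions]) pairs, where
    the mentions are assumed to come from an upstream NER step.
    """
    gazetteer = defaultdict(lambda: {"articles": set(), "mentions": 0})
    for article_id, mentions in tagged_articles:
        for mention in mentions:
            entry = gazetteer[normalize(mention)]
            entry["articles"].add(article_id)
            entry["mentions"] += 1
    return dict(gazetteer)


def rank_people(gazetteer):
    # Toy ranking: distinct articles first, total mentions as tie-breaker.
    # This is a stand-in, not the paper's Influential Person Index.
    return sorted(
        gazetteer,
        key=lambda name: (
            -len(gazetteer[name]["articles"]),
            -gazetteer[name]["mentions"],
            name,
        ),
    )


# Made-up sample data, not drawn from the paper's corpus.
sample = [
    ("a1", ["John Smith", "Mary Jones"]),
    ("a2", ["john  smith"]),
    ("a3", ["Mary Jones", "Mary Jones", "JOHN SMITH"]),
]
gazetteer = build_gazetteer(sample)
ranking = rank_people(gazetteer)
```

Keying the gazetteer on a normalized name is only a crude response to noisy text. Real OCR noise would likely need fuzzy matching, which this sketch does not attempt.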
Taxonomy
Topics: Computational and Text Analysis Methods · Topic Modeling · Digital Humanities and Scholarship
