Quantitative Analysis of Genealogy Using Digitised Family Trees
Michael Fire, Thomas Chesney, and Yuval Elovici

TL;DR
This paper analyzes digitized family trees from online platforms to study population dynamics and genealogy, leveraging large-scale data and machine learning to enhance understanding of human ancestry and social trends.
Contribution
It introduces a comprehensive methodology for mining and verifying large genealogical datasets from online family trees, enabling detailed population analysis.
Findings
Insights into population sex ratios and marriage trends
Analysis of fertility, lifespan, and twin/triplet frequencies
Validation of genealogical data against census sources
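As a rough illustration of the kinds of statistics listed above, the following sketch computes a sex ratio, a mean lifespan, and a count of apparent multiple births from a handful of family-tree records. The `Person` record, its field names, and the sample data are all hypothetical and are not taken from the paper or from WikiTree; grouping siblings by mother and birth year is only a crude stand-in for detecting twins and triplets, which would need full birth dates in practice.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    # Hypothetical record layout; real family-tree exports differ.
    pid: int
    sex: str                    # "M" or "F"
    birth_year: int
    death_year: Optional[int]   # None if unknown or still living
    mother_id: Optional[int]    # None if the mother is not in the tree


def sex_ratio(people):
    """Return the number of males per female in the sample."""
    counts = Counter(p.sex for p in people)
    if counts["F"] == 0:
        raise ValueError("no females in sample")
    return counts["M"] / counts["F"]


def mean_lifespan(people):
    """Return the mean lifespan in years over people with a known death year."""
    spans = [p.death_year - p.birth_year for p in people if p.death_year is not None]
    if not spans:
        raise ValueError("no complete lifespans in sample")
    return sum(spans) / len(spans)


def multiple_birth_counts(people):
    """Count sibling groups sharing a mother and birth year, keyed by group size.

    Assumption: same mother and same birth year approximates one delivery.
    """
    groups = Counter(
        (p.mother_id, p.birth_year) for p in people if p.mother_id is not None
    )
    return Counter(size for size in groups.values() if size > 1)


# Made-up sample data for illustration only.
people = [
    Person(1, "F", 1850, 1920, None),
    Person(2, "M", 1875, 1940, 1),
    Person(3, "M", 1875, 1880, 1),
    Person(4, "F", 1878, None, 1),
    Person(5, "M", 1880, 1950, 1),
]

print(sex_ratio(people))
print(mean_lifespan(people))
print(dict(multiple_birth_counts(people)))
```

On real data, each of these figures would need to be computed per birth cohort and checked against an external source such as census records, in line with the validation step listed above.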
Abstract
Driven by the popularity of television shows such as Who Do You Think You Are?, many millions of users have uploaded their family trees to web projects such as WikiTree. Analysis of this corpus enables us to investigate genealogy computationally. The study of heritage in the social sciences has led to an increased understanding of ancestry and descent, but such efforts are hampered by difficult-to-access data. Genealogical research is typically a tedious process involving trawling through sources such as birth and death certificates, wills, letters and land deeds. Decades of research have developed and examined hypotheses on population sex ratios, marriage trends, fertility, lifespan, and the frequency of twins and triplets. These can now be tested on vast datasets containing many billions of entries using machine learning tools. Here we survey the use of genealogy data mining using family…
Taxonomy
Topics: Algorithms and Data Compression
