Paths of A Million People: Extracting Life Trajectories from Wikipedia
Ying Zhang, Xiaofeng Li, Zhaoyang Liu, Haipeng Zhang

TL;DR
This paper presents a method for extracting the life trajectories of notable individuals from Wikipedia at scale, builds a large dataset from the results, and checks the validity of the extracted trajectories through empirical analysis, in support of research on human dynamics.
Contribution
The authors develop COSMOS, a semi-supervised and contrastive learning ensemble model, to extract life trajectories from Wikipedia at scale, and provide a new dataset and empirical validation.
Findings
Achieved an F1 score of 85.95% in trajectory extraction.
Created WikiLifeTrajectory, a hand-curated dataset of 8,852 triplets.
Performed an empirical analysis of historians' trajectories to check the validity of the extracted results.
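The F1 score reported above combines precision and recall over extracted triplets. This page does not describe the paper's exact evaluation protocol, so the following is only a minimal sketch that assumes exact-match comparison between predicted and hand-labelled triplets. The `(person, time, location)` tuple layout, the function name `triplet_f1`, and the sample data are all hypothetical and chosen for illustration.

```python
from typing import Iterable, Tuple

# Assumed triplet layout for illustration: (person, time, location).
Triplet = Tuple[str, str, str]


def triplet_f1(predicted: Iterable[Triplet],
               gold: Iterable[Triplet]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact-match set comparison."""
    pred_set, gold_set = set(predicted), set(gold)
    true_pos = len(pred_set & gold_set)  # triplets found in both sets

    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0

    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up placeholder data, not taken from the dataset.
gold = [
    ("Person A", "1900", "City X"),
    ("Person A", "1925", "City Y"),
    ("Person B", "1850", "City Z"),
    ("Person B", "1870", "City X"),
]
predicted = [
    ("Person A", "1900", "City X"),  # correct
    ("Person B", "1850", "City Z"),  # correct
    ("Person B", "1871", "City X"),  # wrong year, counts as a miss
]

precision, recall, f1 = triplet_f1(predicted, gold)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

Exact matching is the strictest choice: a triplet with a nearly correct year or a differently spelled place name counts as an error. An evaluation that normalizes dates or locations before comparison would yield different numbers.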
Abstract
The life trajectories of notable people have been studied to pinpoint the times and places of significant events such as birth, death, education, marriage, competition, work, speeches, scientific discoveries, artistic achievements, and battles. Understanding how these individuals interact with others provides valuable insights for broader research into human dynamics. However, the scarcity of trajectory data in terms of volume, density, and inter-person interactions keeps relevant studies from being comprehensive and interactive. We mine millions of biography pages from Wikipedia and tackle the generalization problem stemming from the variety and heterogeneity of the trajectory descriptions. Our ensemble model COSMOS, which combines ideas from semi-supervised learning and contrastive learning, achieves an F1 score of 85.95%. For this task, we also create a hand-curated dataset,…
Taxonomy
Topics: Wikis in Education and Collaboration · Multimodal Machine Learning Applications · Natural Language Processing Techniques
