Fine-grained Classification of A Million Life Trajectories from Wikipedia
Zhaoyang Liu, Xiaocong Du, Yixi Zhou, Ye Shi, Haipeng Zhang

TL;DR
This paper introduces a large-scale method for classifying detailed life activities of notable individuals from Wikipedia, using syntactic graphs and large language models to improve accuracy and create a comprehensive dataset for human dynamics research.
Contribution
The study presents a novel approach combining syntactic graphs and LLMs to classify fine-grained life activities from Wikipedia, resulting in the largest dataset of its kind.
Findings
Achieved 84.5% classification accuracy
Constructed a dataset with 3.8 million labeled activities
Surpassed baseline methods in activity classification
Abstract
Life trajectories of notable people convey essential messages for human dynamics research. These trajectories consist of (person, time, location, activity type) tuples recording when and where a person was born, went to school, started a job, or fought in a war. However, current studies only cover limited activity types such as births and deaths, lacking large-scale fine-grained trajectories. Using a tool that extracts (person, time, location) triples from Wikipedia, we formulate the problem of classifying these triples into 24 carefully-defined types using textual context as complementary information. The challenge is that triple entities are often scattered in noisy contexts. We use syntactic graphs to bring triple entities and relevant information closer, fusing them with text embeddings to classify life trajectory activities. Since Wikipedia text quality varies, we…
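To make the data shape in the abstract concrete, here is a minimal sketch of how an extracted (person, time, location) triple might be turned into a labeled trajectory tuple. Everything in it is an assumption for illustration: the class names `Triple` and `TrajectoryPoint`, the helper `label_triple`, the four example labels (the paper defines 24 types, which are not listed on this page), and the placeholder person and place. It does not reproduce the authors' classifier; the label is supplied by hand where the paper would use a model.

```python
from dataclasses import dataclass

# Illustrative labels only, loosely based on the abstract's examples
# (born, went to school, started a job, fought in a war).
# The paper's actual 24 activity types are NOT listed here.
EXAMPLE_ACTIVITY_TYPES = {"birth", "education", "career", "military"}


@dataclass(frozen=True)
class Triple:
    """Hypothetical container for an extracted (person, time, location) triple."""
    person: str
    time: str
    location: str


@dataclass(frozen=True)
class TrajectoryPoint:
    """Hypothetical (person, time, location, activity type) tuple."""
    triple: Triple
    activity_type: str


def label_triple(triple: Triple, activity_type: str) -> TrajectoryPoint:
    """Attach an activity label to a triple, rejecting unknown labels."""
    if activity_type not in EXAMPLE_ACTIVITY_TYPES:
        raise ValueError(f"unknown activity type: {activity_type!r}")
    return TrajectoryPoint(triple=triple, activity_type=activity_type)


# Placeholder values, not taken from the dataset.
triple = Triple(person="Jane Doe", time="1901", location="Springfield")
point = label_triple(triple, "birth")
```

Keeping the unlabeled triple separate from the labeled tuple mirrors the abstract's framing, in which triples come from an existing extraction tool and the activity type is what classification adds.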
Taxonomy
Topics: Human Mobility and Location-Based Analysis · Topic Modeling · Complex Network Analysis Techniques
