Measuring Social Integration Through Participation: Categorizing Organizations and Leisure Activities in the Displaced Karelians Interview Archive using LLMs
Joonatan Laato, Veera Schroderus, Jenna Kanerva, Jenni Kauppi, Virpi Lummaa, Filip Ginter

TL;DR
This study develops a categorization framework and uses large language models to analyze historical interview data, enabling large-scale, structured insights into social participation among Finnish WWII evacuees.
Contribution
It introduces a novel schema for classifying social activities and organizations, and demonstrates that LLMs can reliably apply this schema at scale to historical texts.
Findings
LLMs closely match expert judgments in categorization accuracy
The method scales to the 350K activity and organization mentions (71K unique names) extracted from the archive
The result is a structured dataset for social integration research
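A minimal sketch of how agreement between model output and expert labels could be checked per schema dimension. The metric (plain accuracy), the dimension names, the label values, and the toy entries are all assumptions for illustration; the summary does not state which metric or labels the authors use.

```python
def per_dimension_accuracy(gold, predicted):
    """Fraction of names on which predicted labels equal gold labels, per dimension.

    Both arguments map a name to a dict of {dimension: label}.
    Only names present in both mappings are compared.
    """
    dimensions = sorted({d for labels in gold.values() for d in labels})
    shared = [name for name in gold if name in predicted]
    accuracy = {}
    for dim in dimensions:
        matches = sum(gold[n][dim] == predicted[n][dim] for n in shared)
        accuracy[dim] = matches / len(shared) if shared else 0.0
    return accuracy


# Hypothetical gold-standard labels (toy data, not from the paper).
gold = {
    "choir": {"kind": "culture", "sociality": "group"},
    "skiing": {"kind": "sport", "sociality": "solo"},
    "reading": {"kind": "culture", "sociality": "solo"},
    "hunting club": {"kind": "sport", "sociality": "group"},
}

# Hypothetical model output; it disagrees with gold on one sociality label.
predicted = {
    "choir": {"kind": "culture", "sociality": "group"},
    "skiing": {"kind": "sport", "sociality": "group"},
    "reading": {"kind": "culture", "sociality": "solo"},
    "hunting club": {"kind": "sport", "sociality": "group"},
}

accuracy = per_dimension_accuracy(gold, predicted)
print(accuracy)
```

Reporting each dimension separately, rather than one pooled score, shows which aspects of the schema a model handles less reliably.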
Abstract
Digitized historical archives make it possible to study everyday social life on a large scale, but information extracted from text often does not directly support quantitative answers to the research questions posed by historians or sociologists. We address this problem in a large collection of Finnish World War II Karelian evacuee family interviews. Prior work extracted more than 350K mentions of leisure time activities and organizational memberships from these interviews, yielding 71K unique activity and organization names -- far too many to analyze directly. We develop a categorization framework that captures key aspects of participation (the kind of activity/organization, how social it typically is, how regularly it happens, and how physically demanding it is). We annotate a gold-standard set to allow for a reliable evaluation, and then test whether large…
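The four aspects of participation named in the abstract can be pictured as one record per unique name, which then lets raw mentions be aggregated into a handful of categories. The sketch below assumes this layout; the field names, the label values, and the toy data are hypothetical and are not the paper's actual schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Participation:
    """One record per unique activity or organization name (hypothetical fields)."""
    kind: str        # kind of activity/organization
    sociality: str   # how social it typically is
    regularity: str  # how regularly it happens
    physical: str    # how physically demanding it is


# Hypothetical labels for three unique names (placeholder values).
labels = {
    "choir": Participation("culture", "group", "regular", "low"),
    "skiing": Participation("sport", "solo", "seasonal", "high"),
    "reading": Participation("culture", "solo", "irregular", "low"),
}

# Toy list of extracted mentions; each unique name can occur many times.
mentions = ["choir", "skiing", "choir", "reading", "skiing", "choir"]


def count_by(dimension, mentions, labels):
    """Count mentions per label value along one schema dimension."""
    return Counter(
        getattr(labels[name], dimension) for name in mentions if name in labels
    )


kind_counts = count_by("kind", mentions, labels)
physical_counts = count_by("physical", mentions, labels)
print(kind_counts)
print(physical_counts)
```

Labeling each unique name once and looking it up per mention keeps the number of labeling decisions tied to the count of unique names rather than the count of mentions.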
Taxonomy
Topics: Computational and Text Analysis Methods · Social and Cultural Dynamics · Data Analysis and Archiving
