Statistical Challenges in Analyzing Migrant Backgrounds Among University Students: a Case Study from Italy
Lorenzo Giammei, Laura Terzera, Fulvia Mecatti

TL;DR
This study addresses statistical challenges in analyzing university students with migrant backgrounds in Italy, proposing a methodology to accurately identify and analyze this population using administrative data and surveys.
Contribution
It introduces an expanded administrative dataset with migrant indicators and compares predictive models, advancing methods for studying migrant student populations.
Findings
Created an enriched dataset with migrant background indicators
Identified selection bias in survey data
Compared logistic regression and random forest models
Abstract
The methodological issues and statistical complexities of analyzing university students with migrant backgrounds are explored, focusing on Italian data from the University of Milano-Bicocca. With the increasing size of migrant populations and the growth of the second and middle generations, the need has arisen for deeper knowledge of the various strata of this population, including university students with migrant backgrounds. Studying this group is challenging because migrant background is recorded inconsistently in university datasets. By leveraging both administrative records and an original targeted survey, we propose a methodology to fully identify the study population of students with migrant histories and to distinguish relevant subpopulations within it, such as second-generation students born in Italy. Traditional logistic regression and machine learning random forest models are used and compared to predict migrant status.…
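To make the identification step in the abstract more concrete, the following is a minimal sketch of how records might be sorted into subpopulations once administrative fields and survey answers are combined. It is not the authors' procedure: the field names (`born_in_italy`, `italian_citizen`, `parent_born_abroad`), the category labels, and the decision rule are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class StudentRecord:
    # Hypothetical fields; real administrative records will differ.
    born_in_italy: bool
    italian_citizen: bool
    # Assumed to come from a survey, so it may be missing (None).
    parent_born_abroad: Optional[bool] = None


def classify_background(rec: StudentRecord) -> str:
    """Toy rule assigning an illustrative migrant-background category."""
    if rec.parent_born_abroad is None:
        # Without parental information, citizenship is the only signal left.
        return "foreign_citizen" if not rec.italian_citizen else "undetermined"
    if rec.parent_born_abroad:
        # Parental origin abroad: split by the student's own birthplace.
        return "second_generation" if rec.born_in_italy else "born_abroad"
    return "no_migrant_background"


# Small made-up sample, only to show how the rule would be applied.
records = [
    StudentRecord(born_in_italy=True, italian_citizen=True, parent_born_abroad=True),
    StudentRecord(born_in_italy=False, italian_citizen=False, parent_born_abroad=True),
    StudentRecord(born_in_italy=True, italian_citizen=True),
    StudentRecord(born_in_italy=True, italian_citizen=False),
]
counts = Counter(classify_background(r) for r in records)
```

The "undetermined" category in this sketch reflects the problem the abstract raises: a student with Italian citizenship and no survey answer cannot be classified from administrative fields alone.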
Taxonomy
Topics: Urban, Neighborhood, and Segregation Studies
