TWICE: Tree-based Wage Inference with Clustering and Estimation
Aslan Bakirov, Francesco Del Prato, Paolo Zacchia

TL;DR
TWICE is a tree-based framework for wage inference that models the conditional wage function directly from observables. It improves robustness and interpretability over traditional fixed-effects methods and reveals that sorting and non-additive interactions drive wage inequality.
Contribution
The paper introduces TWICE, a gradient-boosted tree approach to wage analysis that replaces latent fixed effects with interpretable partitions anchored in observables, improving interpretability and robustness to sampling noise.
Findings
TWICE outperforms linear benchmarks in out-of-sample wage prediction.
Sorting and non-additive interactions significantly contribute to wage dispersion.
TWICE reveals more complex wage determinants than standard AKM (Abowd, Kramarz, and Margolis) fixed-effects estimates.
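For context, the additive two-way fixed-effects (AKM) specification that these findings are set against is usually written as below. The notation is the conventional one from the literature and is an assumption here, not taken from the paper: \alpha_i is a worker effect, \psi_{J(i,t)} is the effect of the firm employing worker i at time t, and x_{it} collects time-varying controls.

```latex
\log w_{it} = \alpha_i + \psi_{J(i,t)} + x_{it}'\beta + \varepsilon_{it}

\operatorname{Var}\bigl(\alpha_i + \psi_{J(i,t)}\bigr)
  = \operatorname{Var}(\alpha_i)
  + \operatorname{Var}\bigl(\psi_{J(i,t)}\bigr)
  + 2\,\operatorname{Cov}\bigl(\alpha_i, \psi_{J(i,t)}\bigr)
```

In this decomposition, sorting corresponds to the covariance term. Because the worker and firm effects enter additively, the specification has no term through which worker-firm complementarities can affect wages.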
Abstract
How much do worker skills, firm pay policies, and their interaction contribute to wage inequality? Standard approaches rely on latent fixed effects identified through worker mobility, but sparse networks inflate variance estimates, additivity assumptions rule out complementarities, and the resulting decompositions lack interpretability. We propose TWICE (Tree-based Wage Inference with Clustering and Estimation), a framework that models the conditional wage function directly from observables using gradient-boosted trees, replacing latent effects with interpretable, observable-anchored partitions. This trades off the ability to capture idiosyncratic unobservables for robustness to sampling noise and out-of-sample portability. Applied to Portuguese administrative data, TWICE outperforms linear benchmarks out of sample and reveals that sorting and non-additive interactions explain…
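The idea of fitting the conditional wage function with gradient-boosted trees, and of what an additivity restriction rules out, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are simulated, the two binary observables (`skilled`, `large_firm`) and all coefficients are hypothetical, and the boosting routine is a bare-bones standard-library implementation rather than the authors' code or any library's API. Depth-1 trees (stumps) yield a model that is additive in the observables, while depth-2 trees can split on one observable inside a split on the other, so their leaves form partitions of the observable space that can capture a complementarity.

```python
import random

def fit_tree(X, r, depth):
    """Greedy squared-error regression tree; returns a leaf mean or (feature, threshold, left, right)."""
    mean = sum(r) / len(r)
    if depth == 0:
        return mean
    best = None
    for j in range(len(X[0])):
        # Candidate thresholds: every observed value except the largest.
        for t in sorted({x[j] for x in X})[:-1]:
            left = [i for i, x in enumerate(X) if x[j] <= t]
            right = [i for i, x in enumerate(X) if x[j] > t]
            ml = sum(r[i] for i in left) / len(left)
            mr = sum(r[i] for i in right) / len(right)
            sse = (sum((r[i] - ml) ** 2 for i in left)
                   + sum((r[i] - mr) ** 2 for i in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, left, right)
    if best is None:  # no feature varies within this node
        return mean
    _, j, t, left, right = best
    return (j, t,
            fit_tree([X[i] for i in left], [r[i] for i in left], depth - 1),
            fit_tree([X[i] for i in right], [r[i] for i in right], depth - 1))

def predict_tree(tree, x):
    """Walk down the tree until a leaf (a float) is reached."""
    while isinstance(tree, tuple):
        j, t, lo, hi = tree
        tree = lo if x[j] <= t else hi
    return tree

def boost(X, y, depth, rounds=40, lr=0.3):
    """Gradient boosting for squared loss: each tree is fit to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    trees = []
    for _ in range(rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        tree = fit_tree(X, resid, depth)
        trees.append(tree)
        pred = [pi + lr * predict_tree(tree, x) for pi, x in zip(pred, X)]
    return lambda x: base + lr * sum(predict_tree(t, x) for t in trees)

def simulate(n, rng):
    """Hypothetical log wages with a worker-firm complementarity (the 0.8 interaction term)."""
    X, y = [], []
    for _ in range(n):
        skilled, large_firm = rng.randint(0, 1), rng.randint(0, 1)
        log_wage = (1.0 + 0.3 * skilled + 0.2 * large_firm
                    + 0.8 * skilled * large_firm + rng.gauss(0, 0.1))
        X.append((skilled, large_firm))
        y.append(log_wage)
    return X, y

def mse(model, X, y):
    return sum((yi - model(x)) ** 2 for x, yi in zip(X, y)) / len(y)

rng = random.Random(0)
X_train, y_train = simulate(400, rng)
X_test, y_test = simulate(400, rng)  # held-out sample for out-of-sample comparison

additive = boost(X_train, y_train, depth=1)     # stumps: additive in observables
interactive = boost(X_train, y_train, depth=2)  # depth 2: allows interactions

mse_additive = mse(additive, X_test, y_test)
mse_interactive = mse(interactive, X_test, y_test)
print(f"out-of-sample MSE, additive:    {mse_additive:.4f}")
print(f"out-of-sample MSE, interactive: {mse_interactive:.4f}")
```

The additive benchmark here is boosted stumps rather than the linear benchmarks used in the paper; it is chosen only because it isolates the additivity restriction while holding the rest of the procedure fixed. In applied work a maintained gradient-boosting library would normally replace the hand-written routine.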
Taxonomy
Topics: Labor market dynamics and wage inequality · Economic and Technological Innovation · Advanced Causal Inference Techniques
