Characterizing Measurement Error in the German Socio-Economic Panel Using Linked Survey and Administrative Data
Nico Thurow

TL;DR
This study links German survey and administrative data to analyze measurement error in labor earnings, revealing non-random sample selection, non-classical error patterns, and implications for bias in earnings regressions.
Contribution
It provides a detailed characterization of measurement error in German earnings data using linked survey and administrative sources, highlighting non-random selection and non-classical error.
Findings
Consent to linking survey and administrative records is non-random and depends on observable characteristics.
Measurement error shows underreporting, autocorrelation, and correlation with true earnings.
In levels, reliability ratios above 0.94 suggest relatively small attenuation bias in simple univariate linear regressions with earnings as the explanatory variable.
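To make the link between a reliability ratio and attenuation bias concrete, here is a minimal simulation sketch. It assumes the textbook classical-error model (error independent of true earnings and of the outcome), which the paper reports does not hold in the linked data, so it illustrates the mechanism only and does not reproduce the paper's estimates. All names and parameter values (`simulate`, `beta`, `sd_error`, the sample size) are hypothetical and chosen so that the theoretical reliability ratio is 1 / (1 + 0.25^2), roughly 0.941.

```python
import random


def variance(xs):
    """Sample variance with the n - 1 denominator."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def covariance(xs, ys):
    """Sample covariance with the n - 1 denominator."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)


def simulate(n=20000, beta=0.5, sd_true=1.0, sd_error=0.25, seed=0):
    """Simulate classical measurement error in an explanatory variable.

    Returns the sample reliability ratio Var(true) / Var(observed) and the
    OLS slope from regressing the outcome on the mismeasured regressor.
    """
    rng = random.Random(seed)
    # Hypothetical "true" earnings (standardized) and a noisy survey report.
    true_x = [rng.gauss(0.0, sd_true) for _ in range(n)]
    observed_x = [x + rng.gauss(0.0, sd_error) for x in true_x]
    # The outcome depends on true earnings, not on the reported value.
    y = [beta * x + rng.gauss(0.0, 0.5) for x in true_x]

    reliability = variance(true_x) / variance(observed_x)
    slope = covariance(observed_x, y) / variance(observed_x)
    return reliability, slope


reliability, slope = simulate()
print(f"reliability ratio: {reliability:.3f}")
print(f"OLS slope on observed regressor: {slope:.3f}")
print(f"true slope times reliability: {0.5 * reliability:.3f}")
```

Under these assumptions the estimated slope shrinks toward zero by approximately the reliability ratio, so a ratio near 0.94 corresponds to a downward bias of about 6 percent. With non-classical error, such as error correlated with true earnings, this simple proportionality need not hold.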
Abstract
This paper exploits the linkage of German administrative social security data (German: Integrierte Erwerbsbiografien) and survey data from the Socio-Economic Panel (Sozio-ökonomisches Panel, SOEP) to characterize measurement error in measures of individual labor earnings in Germany. We find that survey participants' decision whether to consent to linkage is non-random and depends on observables. In that sense, the studied sample does not constitute a random sample of the SOEP. Further, measurement error is non-classical and differential: we observe underreporting of income on average, autocorrelation, and non-zero correlation with the true signal and other observable characteristics. In levels, calculated reliability ratios above 0.94 hint at a relatively small attenuation bias in simple univariate linear regressions with earnings as the explanatory variable. For…
Taxonomy
Topics: Survey Methodology and Nonresponse · Data Analysis and Archiving
Methods: Hierarchical Information Threading
