Covid-19 risk factors: Statistical learning from German healthcare claims data
Roland Jucknewitz, Oliver Weidinger, Anja Schramm

TL;DR
This study applies statistical learning to German healthcare claims data to identify risk factors for severe Covid-19 outcomes. It avoids prior grouping of variables and predicts better than well-specified morbidity groups.
Contribution
The paper uses fine-grained hierarchical information from medical classification systems for diagnoses, pharmaceuticals and procedures, more than 33,000 covariates in total, so that risk prediction does not require prior subject-matter knowledge.
Findings
Better predictive ability than morbidity groups
Utilizes over 33,000 covariates from medical classifications
Provides coefficients for risk factor prioritization
Abstract
We analyse prior risk factors for severe, critical or fatal courses of Covid-19 based on a retrospective cohort using claims data of the AOK Bayern. As our main methodological contribution, we avoid prior grouping and pre-selection of candidate risk factors. Instead, fine-grained hierarchical information from medical classification systems for diagnoses, pharmaceuticals and procedures is used, resulting in more than 33,000 covariates. Our approach has better predictive ability than well-specified morbidity groups but does not need prior subject-matter knowledge. The methodology and estimated coefficients are made available to decision makers to prioritize protective measures towards vulnerable subpopulations and to researchers who would like to adjust for a large set of confounders in studies of individual risk factors.
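To illustrate how hierarchical classification codes can become covariates without prior grouping, the following minimal sketch expands each code into indicator columns for every level it belongs to. It is not the authors' implementation: the function names, the toy claims records and the simplified level rules are assumptions made for illustration. ATC drug codes nest by prefix at lengths 1, 3, 4, 5 and 7. For ICD-10 only the three-character category and the full code are used here, since chapters and blocks are code ranges rather than prefixes.

```python
# ATC codes nest by prefix: lengths 1, 3, 4, 5 and 7 mark the five ATC levels.
ATC_LEVEL_LENGTHS = (1, 3, 4, 5, 7)


def atc_ancestors(code):
    """Return the prefixes of an ATC code at every level it reaches, coarse to fine."""
    code = code.strip().upper()
    return [code[:n] for n in ATC_LEVEL_LENGTHS if len(code) >= n]


def icd_ancestors(code):
    """Return the three-character ICD-10 category and, if present, the full code.

    Simplification (assumption): chapters and blocks are omitted.
    """
    code = code.strip().upper()
    levels = [code[:3]]
    if len(code) > 3:
        levels.append(code)
    return levels


def build_design_matrix(claims):
    """Turn {person: [(system, code), ...]} into 0/1 indicator rows.

    Every hierarchy level of every observed code becomes its own column,
    so no manual grouping of codes is needed beforehand.
    """
    features = {}
    for person, entries in claims.items():
        active = set()
        for system, code in entries:
            expand = atc_ancestors if system == "ATC" else icd_ancestors
            active.update(f"{system}:{level}" for level in expand(code))
        features[person] = active
    columns = sorted(set().union(*features.values())) if features else []
    rows = {
        person: [1 if col in active else 0 for col in columns]
        for person, active in features.items()
    }
    return columns, rows


# Hypothetical toy records; real claims data would hold far more codes per person.
claims = {
    "p1": [("ICD", "E11.9"), ("ATC", "A10BA02")],
    "p2": [("ICD", "E11.2"), ("ICD", "I10")],
}
columns, rows = build_design_matrix(claims)
```

In this toy example both persons share the coarse column `ICD:E11` even though their detailed codes differ, so a sparse penalized model fitted on such a matrix could place weight on whichever level of the hierarchy carries the signal. The choice of estimator is left open here because the summary above does not specify it.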
