A Log-Linear Analytics Approach to Cost Model Regularization for Inpatient Stays through Diagnostic Code Merging
Chi-Ken Lu, David Alonge, Nicole Richardson, Bruno Richard

TL;DR
This paper proposes reducing ICD-10 diagnostic code granularity to improve the stability and interpretability of cost models in healthcare, demonstrating that merging codes enhances regularization and model consistency.
Contribution
It introduces a log-linear regularization approach that merges ICD-10 codes to improve OLS model stability without losing interpretability.
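A minimal sketch of a log-linear cost model follows. It assumes that "log-linear" means ordinary least squares on log-transformed cost, with one 0/1 indicator column per diagnostic code. The helper names one_hot and fit_log_linear, the toy codes and the costs are hypothetical and do not come from the paper. When each stay carries exactly one code and there is no intercept, each OLS coefficient equals the mean log cost of that code's stays. That means exp(beta) reads as a geometric-mean cost per code, which is one way such a model stays interpretable.

```python
import numpy as np

def one_hot(codes, vocab):
    """n x p 0/1 design matrix; column j flags stays coded vocab[j] (hypothetical helper)."""
    index = {c: j for j, c in enumerate(vocab)}
    X = np.zeros((len(codes), len(vocab)))
    for i, c in enumerate(codes):
        X[i, index[c]] = 1.0
    return X

def fit_log_linear(codes, costs):
    """OLS of log(cost) on code indicators; returns (vocab, coefficients)."""
    vocab = sorted(set(codes))
    X = one_hot(codes, vocab)
    y = np.log(np.asarray(costs, dtype=float))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return vocab, beta

# Toy stays (illustrative only): one diagnostic code and one cost per stay.
codes = ["I10", "I10", "E119", "E119"]
costs = [500.0, 2000.0, 1000.0, 4000.0]
vocab, beta = fit_log_linear(codes, costs)
for code, b in zip(vocab, beta):
    # exp(beta) is the geometric-mean cost of the stays carrying this code
    print(code, round(float(np.exp(b)), 2))
```

Running it reports a fitted cost of about 2000 for E119 and 1000 for I10. These are the geometric means of each code's two toy stays.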
Findings
Reducing ICD-10 code granularity improves coefficient stability.
Merging codes increases the Hessian trace, reducing coefficient variance.
Broader diagnostic groupings, such as diagnosis-related groups (DRGs), are preferred for cost models.
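The variance finding can be illustrated with a small sketch. It assumes homoscedastic errors and a simplified design in which every stay carries exactly one code and there is no intercept. Under those assumptions, X^T X (the Hessian of the squared-error loss) is diagonal with the code counts on its diagonal, and Var(beta_j) = sigma^2 / n_j. The counts and helper names are hypothetical.

Merging two rare sibling codes places their combined count on a single diagonal entry. As a result, the merged coefficient's variance (1/8 here) is lower than either original variance (1/3 and 1/5), and the summed variance also falls. Because every stay here has one code, the trace of X^T X itself does not change under this merge. How the full trace moves on real data depends on how codes co-occur within stays, which this toy does not model.

```python
import numpy as np

def design_from_counts(counts):
    """Stack counts[j] copies of the j-th unit row: one code per stay, no intercept."""
    eye = np.eye(len(counts))
    return np.vstack([np.tile(eye[j], (n, 1)) for j, n in enumerate(counts)])

def coefficient_variances(counts, sigma2=1.0):
    """Var(beta_hat) = sigma2 * diag((X^T X)^{-1}) under homoscedastic OLS."""
    X = design_from_counts(counts)
    H = X.T @ X  # Hessian of 0.5 * ||y - X beta||^2; here diagonal with the counts
    return sigma2 * np.diag(np.linalg.inv(H))

# Hypothetical counts: two rare sibling codes and one common code.
before = coefficient_variances([3, 5, 400])
after = coefficient_variances([3 + 5, 400])  # rare siblings merged into their parent
print("before:", before, "sum:", before.sum())
print("after: ", after, "sum:", after.sum())
```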
Abstract
Cost models in healthcare research must balance interpretability, accuracy, and parameter consistency. However, interpretable models often struggle to achieve both accuracy and consistency. Ordinary least squares (OLS) models for high-dimensional regression can be accurate but fail to produce stable regression coefficients over time when using highly granular ICD-10 diagnostic codes as predictors. This instability arises because many ICD-10 codes are infrequent in healthcare datasets. While regularization methods such as Ridge can address this issue, they risk discarding important predictors. Here, we demonstrate that reducing the granularity of ICD-10 codes is an effective regularization strategy within OLS while preserving the representation of all diagnostic code categories. By truncating ICD-10 codes from seven characters to six or fewer, we reduce the dimensionality of the…
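Truncation itself is a plain string operation. The sketch below assumes codes arrive in the dotted ICD-10-CM display form, where the dot after the third character is formatting rather than part of the code. The function names and sample codes are illustrative and do not come from the paper. Truncating to k characters maps every code to its prefix, and the distinct prefixes become the new indicator columns.

```python
def truncate_icd10(code: str, k: int) -> str:
    """Keep the first k characters of an ICD-10 code, ignoring the display dot."""
    return code.replace(".", "")[:k]

def distinct_after_truncation(codes, k):
    """Sorted distinct prefixes, i.e. the indicator columns left after truncation."""
    return sorted({truncate_icd10(c, k) for c in codes})

# Illustrative codes: two encounter variants of one injury code, a sibling, and two short codes.
codes = ["S72.001A", "S72.001D", "S72.002A", "I10", "E11.9"]
for k in (7, 6, 3):
    print(k, distinct_after_truncation(codes, k))
```

For these five sample codes, keeping seven characters leaves five columns. Keeping six characters merges the two S72.001 encounter variants into one, leaving four columns. Keeping three characters leaves only the categories E11, I10 and S72. Codes already shorter than k, such as I10, pass through unchanged.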
