A Log-Linear Analytics Approach to Cost Model Regularization for Inpatient Stays through Diagnostic Code Merging

Chi-Ken Lu; David Alonge; Nicole Richardson; Bruno Richard

arXiv:2507.03843 · cs.LG · November 10, 2025

TL;DR

This paper proposes reducing the granularity of ICD-10 diagnostic codes to improve the stability and interpretability of healthcare cost models, and shows that merging codes acts as a form of regularization that yields more consistent coefficients.

Contribution

It introduces a log-linear regularization approach that merges ICD-10 codes to improve OLS model stability without losing interpretability.

Findings

01. Reducing ICD-10 code granularity improves coefficient stability.

02. Merging codes increases the Hessian trace, reducing coefficient variance.

03. Broader diagnostic groupings like DRGs are preferred for cost models.
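
The variance side of the second finding can be illustrated with a toy calculation. This is a minimal sketch, not the paper's derivation: it assumes mutually exclusive indicator predictors, no intercept, and homoscedastic noise, so the textbook OLS result Var(beta) = sigma^2 (X^T X)^{-1} applies. The counts and the helper names `indicator_design` and `coef_variance_factors` are hypothetical.

```python
import numpy as np

# Hypothetical counts: number of stays carrying each of three
# mutually exclusive codes (one common code, two rare ones).
counts = np.array([500, 3, 2])

def indicator_design(counts):
    """One-hot design matrix with counts[k] rows for code k."""
    return np.repeat(np.eye(len(counts)), counts, axis=0)

def coef_variance_factors(X):
    """Diagonal of (X^T X)^{-1}; OLS coefficient variance is sigma^2 times this."""
    return np.diag(np.linalg.inv(X.T @ X))

X_fine = indicator_design(counts)

# Merge the two rare codes (columns 1 and 2) into a single predictor.
X_merged = np.column_stack([X_fine[:, 0], X_fine[:, 1] + X_fine[:, 2]])

var_fine = coef_variance_factors(X_fine)      # one factor per original code
var_merged = coef_variance_factors(X_merged)  # one factor per merged group

print("fine-grained:", var_fine)
print("merged:      ", var_merged)
```

In this setting each variance factor is simply the reciprocal of the number of stays carrying the code, so a merged group is estimated more precisely than either rare code alone. The price is that the merged codes now share a single coefficient.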

Abstract

Cost models in healthcare research must balance interpretability, accuracy, and parameter consistency. However, interpretable models often struggle to achieve both accuracy and consistency. Ordinary least squares (OLS) models for high-dimensional regression can be accurate but fail to produce stable regression coefficients over time when using highly granular ICD-10 diagnostic codes as predictors. This instability arises because many ICD-10 codes are infrequent in healthcare datasets. While regularization methods such as Ridge can address this issue, they risk discarding important predictors. Here, we demonstrate that reducing the granularity of ICD-10 codes is an effective regularization strategy within OLS while preserving the representation of all diagnostic code categories. By truncating ICD-10 codes from seven characters to six or fewer, we reduce the dimensionality of the…
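
The truncation step described in the abstract can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the helper `truncate_code` is hypothetical, the decimal point is dropped before counting characters, and the example codes were chosen for illustration and are not taken from the paper's data.

```python
from collections import Counter

def truncate_code(code: str, max_chars: int) -> str:
    """Keep the first `max_chars` characters of an ICD-10 code, ignoring the dot."""
    return code.replace(".", "")[:max_chars]

# Illustrative codes only (assumption, not the paper's dataset).
codes = ["S52.521A", "S52.521D", "S52.522A", "E11.9", "E11.65"]

# Count how many distinct predictors remain at each truncation length.
groups_by_length = {}
for max_chars in (7, 6, 3):
    groups_by_length[max_chars] = Counter(truncate_code(c, max_chars) for c in codes)

for max_chars, groups in groups_by_length.items():
    print(max_chars, len(groups), dict(groups))
```

Shorter truncation lengths collapse more codes into the same group, so each remaining predictor is backed by more observations while every original code still maps to some category.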
