Improving the Hosmer-Lemeshow Goodness-of-Fit Test in Large Models with Replicated Trials
Nikola Surjanovic, Thomas M. Loughin

TL;DR
This paper uses simulations to show that the Hosmer-Lemeshow test's type 1 error rate, and hence its power, decreases as model complexity grows when the sample size is fixed and the data contain binary replicates. A generalized version of the test can mitigate this loss, and the authors give guidance on choosing between the two tests.
Contribution
It documents the limitations of the standard HL test in large models, demonstrates that the generalized HL test of Surjanovic et al. (2020) can offer some protection against the resulting power loss, and offers practical advice for model fit evaluation.
Findings
HL test's type 1 error rate (and hence power) decreases as model complexity grows at a fixed sample size
Generalized HL test offers better protection against power loss
Guidance provided for choosing between HL and generalized HL tests
Abstract
The Hosmer-Lemeshow (HL) test is a commonly used global goodness-of-fit (GOF) test that assesses the quality of the overall fit of a logistic regression model. In this paper, we give results from simulations showing that the type 1 error rate (and hence power) of the HL test decreases as model complexity grows, provided that the sample size remains fixed and binary replicates are present in the data. We demonstrate that the generalized version of the HL test by Surjanovic et al. (2020) can offer some protection against this power loss. We conclude with a brief discussion explaining the behaviour of the HL test, along with some guidance on how to choose between the two tests.
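For readers unfamiliar with the test discussed in the abstract, the following is a minimal sketch of the standard HL statistic, not the generalized version of Surjanovic et al. (2020). It assumes observations are sorted by fitted probability and split into roughly equal-sized groups, and that the statistic is compared with a chi-square distribution on (groups - 2) degrees of freedom, which is the usual convention. The function names `hosmer_lemeshow` and `chi2_sf_even_df` are hypothetical, and the p-value helper only handles an even number of degrees of freedom (the default of 10 groups gives 8).

```python
import math


def hosmer_lemeshow(y, p, groups=10):
    """Return (statistic, df) for the standard HL test.

    y: observed binary outcomes (0 or 1).
    p: fitted probabilities, assumed strictly between 0 and 1.
    """
    if len(y) != len(p):
        raise ValueError("y and p must have the same length")
    n = len(y)
    # Sort observation indices by fitted probability.
    order = sorted(range(n), key=lambda i: p[i])
    stat = 0.0
    for g in range(groups):
        # Slice the sorted indices into roughly equal-sized groups.
        idx = order[g * n // groups:(g + 1) * n // groups]
        n_g = len(idx)
        if n_g == 0:
            continue
        observed = sum(y[i] for i in idx)
        expected = sum(p[i] for i in idx)
        p_bar = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * p_bar * (1.0 - p_bar))
    return stat, groups - 2


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability, valid only for even df > 0."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    # exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Toy data: four groups of five observations each.
p = [0.2] * 5 + [0.4] * 5 + [0.6] * 5 + [0.8] * 5
y = ([1, 1, 0, 0, 0] + [1, 1, 0, 0, 0]
     + [1, 1, 1, 0, 0] + [1, 1, 1, 1, 0])
stat, df = hosmer_lemeshow(y, p, groups=4)
p_value = chi2_sf_even_df(stat, df)
```

In this toy data only the first group deviates from its expected count (2 observed successes against 1 expected), so it alone contributes to the statistic. The paper's point is that the chi-square reference used here becomes unreliable as the model grows with the sample size held fixed, so a p-value computed this way should be read with that caveat in mind.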
Taxonomy
Topics: Forecasting Techniques and Applications · Advanced Causal Inference Techniques · Economic and Environmental Valuation
