A Plot is Worth a Thousand Tests: Assessing Residual Diagnostics with the Lineup Protocol
Weihao Li, Dianne Cook, Emi Tanaka, Susan VanderPlas

TL;DR
This paper demonstrates that visual residual diagnostics using the lineup protocol provide more reliable and comprehensive model assessment than traditional numerical tests, which can be overly sensitive or insensitive.
Contribution
It applies the lineup protocol, a visual inference method, to residual diagnostics and presents experimental evidence that reading residual plots this way assesses model fit more reliably than conventional tests.
Findings
Lineup protocol yields more reliable residual diagnostics.
Conventional tests are often too sensitive, rejecting adequate model fits, and can be insensitive to real problems when the data are contaminated.
Visual inference detects multiple residual issues simultaneously.
Abstract
Regression experts consistently recommend plotting residuals for model diagnosis, despite the availability of many numerical hypothesis test procedures designed to use residuals to assess problems with a model fit. Here we provide evidence for why this is good advice using data from a visual inference experiment. We show how conventional tests are too sensitive, which means that too often the conclusion would be that the model fit is inadequate. The experiment uses the lineup protocol which puts a residual plot in the context of null plots. This helps generate reliable and consistent reading of residual plots for better model diagnosis. It can also help in an obverse situation where a conventional test would fail to detect a problem with a model due to contaminated data. The lineup protocol also detects a range of departures from good residuals simultaneously. Supplemental materials for…
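The idea of placing a residual plot "in the context of null plots" can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' experimental code: it assumes a simple linear regression with one predictor, generates null data by simulating normal errors around the fitted values, uses a lineup of m = 20 panels, and computes a basic binomial p-value that treats observers as independent and each as picking one panel at random under the null. The function names (`fit_line`, `residuals`, `make_lineup`, `lineup_p_value`) are hypothetical, and the paper may use a different null-generation scheme and p-value calculation.

```python
import random
from math import comb


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def residuals(x, y):
    """Return (fitted values, residuals) from the least-squares line."""
    a, b = fit_line(x, y)
    fitted = [a + b * xi for xi in x]
    return fitted, [yi - fi for yi, fi in zip(y, fitted)]


def make_lineup(x, y, m=20, seed=0):
    """Hide the data residual plot among m - 1 null plots.

    Returns (panels, true_pos). Each panel is a list of
    (fitted, residual) pairs; panels[true_pos] is the real data.
    """
    rng = random.Random(seed)
    n = len(x)
    fitted, res = residuals(x, y)
    # Residual standard error with n - 2 degrees of freedom.
    sigma = (sum(r * r for r in res) / (n - 2)) ** 0.5
    panels = []
    for _ in range(m - 1):
        # Assumption: null data come from the fitted model with normal errors.
        y_null = [f + rng.gauss(0.0, sigma) for f in fitted]
        null_fitted, null_res = residuals(x, y_null)
        panels.append(list(zip(null_fitted, null_res)))
    true_pos = rng.randrange(m)
    panels.insert(true_pos, list(zip(fitted, res)))
    return panels, true_pos


def lineup_p_value(detections, observers, m=20):
    """P(X >= detections) for X ~ Binomial(observers, 1/m)."""
    p = 1.0 / m
    return sum(
        comb(observers, k) * p**k * (1 - p) ** (observers - k)
        for k in range(detections, observers + 1)
    )
```

Each panel would then be drawn as a scatter plot of residuals against fitted values and shown to observers who do not know `true_pos`. If the data plot is picked more often than chance allows, that is evidence against the model. Under these assumptions a single observer who picks the data plot out of 20 panels gives `lineup_p_value(1, 1)` equal to 0.05.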
Taxonomy
Topics: Risk and Safety Analysis
