# Pseudo-$R^2$ statistics under complex sampling

**Authors:** Thomas Lumley

arXiv: 1701.07745 · 2017-01-27

## TL;DR

This paper extends the Cox--Snell and Nagelkerke pseudo-$R^2$ statistics to complex sampling designs. It gives design-consistent estimators and shows that the usual unweighted versions are not design-consistent under case-control sampling.

## Contribution

It shows how to define the Cox--Snell and Nagelkerke pseudo-$R^2$ under arbitrary probability sampling designs. It also gives estimators that are design-consistent for the corresponding population model summaries.
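
A minimal sketch of the quantities involved follows. It assumes the usual likelihood-ratio definitions. It also assumes that the design-based version replaces each population total (the log-likelihoods and the population size $N$) by its Horvitz--Thompson estimate, using inclusion probabilities $\pi_i$. The notation is ours, and the paper's exact construction may differ in detail.

$$
R^2_{\mathrm{CS}} = 1 - \left(\frac{L_0}{L_1}\right)^{2/N}
= 1 - \exp\left\{-\frac{2}{N}\left(\ell_1 - \ell_0\right)\right\},
\qquad
R^2_{\mathrm{N}} = \frac{R^2_{\mathrm{CS}}}{1 - \exp\left(2\ell_0/N\right)}
$$

Here $\ell_k = \log L_k = \sum_{i=1}^{N} \log f_k(y_i \mid x_i)$ is the log-likelihood of the null ($k=0$) or fitted ($k=1$) model, fitted to all $N$ population units. For a sample $S$:

$$
\hat\ell_k = \sum_{i \in S} \frac{1}{\pi_i} \log f_k(y_i \mid x_i; \hat\beta_k),
\qquad
\hat N = \sum_{i \in S} \frac{1}{\pi_i},
\qquad
\hat R^2_{\mathrm{CS}} = 1 - \exp\left\{-\frac{2}{\hat N}\left(\hat\ell_1 - \hat\ell_0\right)\right\}
$$

The Nagelkerke estimate uses $\hat\ell_0$ and $\hat N$ in the same way. Its denominator $1 - \exp(2\ell_0/N) = 1 - L_0^{2/N}$ is the largest value the Cox--Snell statistic can reach for a discrete outcome, so $R^2_{\mathrm{N}}$ rescales it to have a maximum of 1.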

## Key findings

- Cox--Snell and Nagelkerke $R^2$ are not design-consistent under case-control sampling.
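- The usual (unweighted) Cox--Snell and Nagelkerke $R^2$ from a case-control sample are systematically larger than those from a cross-sectional or cohort sample. This holds even when the weighted and unweighted logistic regression estimates are similar or identical.
- The proposed estimators are design-consistent for the population model summaries.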
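
The case-control effect described in these findings can be reproduced with a small simulation. This is a hypothetical sketch, not the paper's code. It assumes:

- a single covariate and a rare outcome with $\operatorname{logit} P(y=1) = -4 + x$;
- a case-control sample of all cases plus an equal number of randomly chosen controls;
- design weights of 1 for cases and (total controls / sampled controls) for controls.

The weighted statistic replaces $n$ by the sum of the weights, which is an assumption about how the design-based version is formed. Helper names such as `pseudo_r2` are illustrative.

```python
import math
import random


def expit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def fit_logistic(x, y, w, iters=50):
    """Weighted logistic regression (intercept + slope) by Newton-Raphson."""
    wsum = sum(w)
    pbar = sum(wi * yi for wi, yi in zip(w, y)) / wsum
    b0, b1 = math.log(pbar / (1 - pbar)), 0.0  # start at the null-model fit
    for _ in range(iters):
        g0 = g1 = i00 = i01 = i11 = 0.0
        for xi, yi, wi in zip(x, y, w):
            p = expit(b0 + b1 * xi)
            r = wi * (yi - p)      # weighted score contribution
            v = wi * p * (1 - p)   # weighted information contribution
            g0 += r
            g1 += r * xi
            i00 += v
            i01 += v * xi
            i11 += v * xi * xi
        det = i00 * i11 - i01 * i01
        d0 = (i11 * g0 - i01 * g1) / det  # solve the 2x2 Newton system
        d1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return b0, b1


def weighted_loglik(x, y, w, b0, b1):
    """sum_i w_i * [y_i * eta_i - log(1 + exp(eta_i))] for the logistic model."""
    total = 0.0
    for xi, yi, wi in zip(x, y, w):
        eta = b0 + b1 * xi
        total += wi * (yi * eta - math.log1p(math.exp(eta)))
    return total


def pseudo_r2(x, y, w):
    """Cox-Snell and Nagelkerke R^2, with n replaced by the sum of the weights."""
    wsum = sum(w)
    b0, b1 = fit_logistic(x, y, w)
    ll1 = weighted_loglik(x, y, w, b0, b1)
    pbar = sum(wi * yi for wi, yi in zip(w, y)) / wsum
    ll0 = wsum * (pbar * math.log(pbar) + (1 - pbar) * math.log(1 - pbar))
    cox_snell = 1 - math.exp(-2 * (ll1 - ll0) / wsum)
    nagelkerke = cox_snell / (1 - math.exp(2 * ll0 / wsum))
    return cox_snell, nagelkerke


random.seed(1)
N = 20000  # full cohort, treated as the finite population
x = [random.gauss(0, 1) for _ in range(N)]
y = [1 if random.random() < expit(-4 + xi) else 0 for xi in x]

# Case-control sample: every case plus an equal number of random controls
cases = [i for i in range(N) if y[i] == 1]
controls = [i for i in range(N) if y[i] == 0]
sampled = cases + random.sample(controls, len(cases))
control_weight = len(controls) / len(cases)  # inverse sampling fraction
xs = [x[i] for i in sampled]
ys = [y[i] for i in sampled]
w_design = [1.0 if y[i] == 1 else control_weight for i in sampled]

cohort = pseudo_r2(x, y, [1.0] * N)
cc_unweighted = pseudo_r2(xs, ys, [1.0] * len(sampled))
cc_weighted = pseudo_r2(xs, ys, w_design)

for label, (cs, nk) in [("cohort", cohort),
                        ("case-control, unweighted", cc_unweighted),
                        ("case-control, weighted", cc_weighted)]:
    print(f"{label:26s} Cox-Snell={cs:.3f}  Nagelkerke={nk:.3f}")
```

Exact values depend on the seed. The pattern to look for is that the unweighted case-control values sit well above the cohort values, while the weighted estimate stays close to the cohort value.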

## Abstract

Model summaries based on the ratio of fitted and null likelihoods have been proposed for generalised linear models, reducing to the familiar $R^2$ coefficient of determination in the Gaussian model with identity link. In this note I show how to define the Cox--Snell and Nagelkerke summaries under arbitrary probability sampling designs, giving a design-consistent estimator of the population model summary. I also show that for logistic regression models under case--control sampling the usual Cox--Snell and Nagelkerke $R^2$ are not design-consistent, but are systematically larger than would be obtained with a cross-sectional or cohort sample, even in settings where the weighted and unweighted logistic regression estimators are similar or identical.
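
In the Gaussian identity-link model, the likelihood-ratio summary reduces to the ordinary $R^2$, and this can be checked numerically. With the maximum-likelihood variance $\hat\sigma^2 = \mathrm{RSS}/n$, the maximised log-likelihood is $-\tfrac{n}{2}\{\log(2\pi\hat\sigma^2) + 1\}$. It follows that $1 - (L_0/L_1)^{2/n} = 1 - \hat\sigma_1^2/\hat\sigma_0^2 = 1 - \mathrm{RSS}/\mathrm{TSS}$. The sketch below uses simulated data, and its helper names are hypothetical.

```python
import math
import random


def ols_simple(x, y):
    """Least-squares intercept and slope for one covariate."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def gaussian_max_loglik(resid):
    """Gaussian log-likelihood maximised over sigma^2 (MLE sigma^2 = RSS / n)."""
    n = len(resid)
    s2 = sum(r * r for r in resid) / n
    return -0.5 * n * (math.log(2 * math.pi * s2) + 1)


random.seed(0)
x = [random.gauss(0, 1) for _ in range(200)]
y = [2 + 0.5 * xi + random.gauss(0, 1) for xi in x]
n = len(y)

b0, b1 = ols_simple(x, y)
resid_full = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
ybar = sum(y) / n
resid_null = [yi - ybar for yi in y]  # intercept-only (null) model

# Classical coefficient of determination
rss = sum(r * r for r in resid_full)
tss = sum(r * r for r in resid_null)
r2_classic = 1 - rss / tss

# Cox-Snell form: 1 - (L0 / L1)^(2/n)
ll1 = gaussian_max_loglik(resid_full)
ll0 = gaussian_max_loglik(resid_null)
r2_cox_snell = 1 - math.exp(-2 * (ll1 - ll0) / n)

print(f"classical R^2 = {r2_classic:.6f}, Cox-Snell R^2 = {r2_cox_snell:.6f}")
```

The two printed values agree up to floating-point rounding.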



---
Source: https://tomesphere.com/paper/1701.07745