Nonparametric Identification and Estimation of Ratios of Multi-Category Means under Preferential Sampling

Grant Hopkins; Sarah Teichman; Ellen Graham; Amy D Willis

arXiv:2510.23920 · stat.ME · October 29, 2025

TL;DR

This paper develops a nonparametric framework for identifying and estimating ratios of category-specific means in multi-category data. The framework allows both observational units and categories to be preferentially sampled, and the paper proposes a doubly-robust estimator with strong finite-sample performance.

Contribution

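It shows that these ratios can be identified nonparametrically through either an independence assumption or a category constraint (such as a reference category or a centering function), rather than through parametric assumptions. It also proposes an efficient, doubly-robust targeted minimum loss based estimator that applies to high-dimensional settings with infrequently observed categories.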

Findings

01. The proposed estimator performs well in simulations.

02. The method successfully identifies differentially abundant bacteria.

03. The framework extends to compositional data without parametric assumptions.

Abstract

Multi-category data arise in diverse fields including marketing, chemistry, public policy, genomics, political science, and ecology. We consider the problem of estimating ratios of category-specific means in a fully nonparametric setting, allowing for both observational units and categories to be preferentially sampled. We consider covariate-adjusted and unadjusted estimands that are non-parametrically defined and straightforward to interpret. While identifiability for related models has been established through parametric distributions or restrictions on the conditional mean (e.g., log-linearity), we show that identifiability can be obtained through an independence assumption or a category constraint, such as a reference category or a centering function. We develop an efficient, doubly-robust targeted minimum loss based estimator with excellent finite-sample performance, including in…
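The role of a category constraint can be illustrated with a toy sketch. Assume, purely for illustration and not as the paper's model, that the observed mean of category j in group x equals s(x) * e_j * mu_j(x). Here s(x) is a hypothetical unit-level sampling intensity that differs between groups, and e_j is a hypothetical category-level efficiency. The ratio of observed means for category j is then (s(1)/s(0)) * mu_j(1)/mu_j(0). The category efficiency cancels, but the unit-level factor multiplies every category's ratio by the same unknown constant. A reference category or a centering function pins that constant down. The function names below are hypothetical. The sketch works with population means and does not implement the paper's doubly-robust estimator.

```python
import math

# Toy multiplicative model, assumed for illustration only (not the paper's model):
# observed mean of category j in group x = unit_intensity[x] * efficiency[j] * mu[x][j]


def observed_group_means(mu, efficiency, unit_intensity):
    """Return observed category-specific means for groups 0 and 1 under the toy model."""
    return {
        x: [unit_intensity[x] * e * m for e, m in zip(efficiency, mu[x])]
        for x in (0, 1)
    }


def naive_ratios(obs):
    """Ratio of observed means, group 1 over group 0, for each category.

    Category efficiencies cancel, but the unit-intensity ratio multiplies
    every category's ratio by the same unknown constant.
    """
    return [m1 / m0 for m1, m0 in zip(obs[1], obs[0])]


def reference_constrained(ratios, ref):
    """Reference-category constraint: rescale so the ratio for category `ref` is 1."""
    return [r / ratios[ref] for r in ratios]


def centered(ratios):
    """Centering constraint: rescale so the geometric mean of the ratios is 1."""
    mean_log = sum(math.log(r) for r in ratios) / len(ratios)
    return [r / math.exp(mean_log) for r in ratios]


if __name__ == "__main__":
    mu = {0: [10.0, 5.0, 2.0, 8.0], 1: [20.0, 5.0, 1.0, 8.0]}  # true ratios: 2, 1, 0.5, 1
    efficiency = [0.9, 0.1, 0.5, 2.0]  # hypothetical category-level efficiencies
    unit_intensity = {0: 1.0, 1: 3.0}  # hypothetical group-dependent unit sampling
    r = naive_ratios(observed_group_means(mu, efficiency, unit_intensity))
    print("naive:", r)
    print("reference (category 1):", reference_constrained(r, ref=1))
    print("centered:", centered(r))
```

In this example the naive ratios are three times the true ratios, up to floating-point rounding. Both constrained versions return the true ratios 2, 1, 0.5, 1. This happens because these particular true ratios already equal 1 at category 1 and have geometric mean 1. In general, each constraint recovers the true ratios rescaled to satisfy that constraint, and changing the unit intensities leaves the constrained results unchanged.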
