Nonparametric Identification and Estimation of Ratios of Multi-Category Means under Preferential Sampling
Grant Hopkins, Sarah Teichman, Ellen Graham, Amy D Willis

TL;DR
This paper develops a nonparametric framework for identifying and estimating ratios of category-specific means in multi-category data, accounting for preferential sampling and providing robust estimators with strong finite-sample performance.
Contribution
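It shows that ratios of category-specific means can be identified nonparametrically through an independence assumption or a category constraint (such as a reference category or a centering function), and proposes an efficient, doubly-robust targeted minimum loss based estimator applicable to high-dimensional, infrequent categories.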
Findings
Estimator shows excellent finite-sample performance in simulations.
Method successfully identifies differentially abundant bacteria.
Framework extends to compositional data without parametric assumptions.
Abstract
Multi-category data arise in diverse fields including marketing, chemistry, public policy, genomics, political science, and ecology. We consider the problem of estimating ratios of category-specific means in a fully nonparametric setting, allowing for both observational units and categories to be preferentially sampled. We consider covariate-adjusted and unadjusted estimands that are non-parametrically defined and straightforward to interpret. While identifiability for related models has been established through parametric distributions or restrictions on the conditional mean (e.g., log-linearity), we show that identifiability can be obtained through an independence assumption or a category constraint, such as a reference category or a centering function. We develop an efficient, doubly-robust targeted minimum loss based estimator with excellent finite-sample performance, including in…
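The role of a category constraint can be illustrated with a small numerical sketch. This is not the paper's targeted minimum loss based estimator; it is a naive plug-in computation on made-up toy numbers. It assumes, purely for illustration, that preferential sampling of units acts as an unknown multiplicative factor shared by all categories within a group. The function names `log_mean_ratios` and `constrain` are hypothetical. Under that assumption the unconstrained log-ratios of category means shift by an unknown constant, while the constrained versions (relative to a reference category, or centered by the median) are unchanged.

```python
import numpy as np

def log_mean_ratios(counts_a, counts_b):
    """Log ratio of per-category sample means, group B over group A.

    Rows are observational units, columns are categories.
    """
    return np.log(counts_b.mean(axis=0)) - np.log(counts_a.mean(axis=0))

def constrain(log_ratios, reference=None):
    """Apply a category constraint to a vector of log-ratios.

    If `reference` is given, the log-ratio of that category is set to zero
    (reference-category constraint). Otherwise the vector is centered by
    its median (one possible centering function).
    """
    if reference is not None:
        return log_ratios - log_ratios[reference]
    return log_ratios - np.median(log_ratios)

# Toy data (invented for illustration): 2 units per group, 3 categories.
group_a = np.array([[10.0, 20.0, 40.0],
                    [30.0, 20.0, 40.0]])
group_b = np.array([[20.0, 10.0, 40.0],
                    [60.0, 30.0, 40.0]])

raw = log_mean_ratios(group_a, group_b)

# Assumption: group B is observed with an unknown 5x sampling intensity
# that multiplies every category equally.
distorted = log_mean_ratios(group_a, 5.0 * group_b)

# The unconstrained log-ratios differ by the unknown constant log(5) ...
shift = distorted - raw

# ... but both constraints remove it.
same_with_reference = np.allclose(constrain(raw, reference=2),
                                  constrain(distorted, reference=2))
same_with_median = np.allclose(constrain(raw), constrain(distorted))
print(shift, same_with_reference, same_with_median)
```

In this sketch the constraint only removes a shift that is common to all categories; which constraint is appropriate in practice, and how covariate adjustment enters, is determined by the paper's framework rather than by this toy calculation.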
