Cost-optimal Sequential Testing via Doubly Robust Q-learning
Doudou Zhou, Yiran Zhang, Dian Jin, Yingye Zheng, Lu Tian, Tianxi Cai

TL;DR
This paper introduces a doubly robust Q-learning method for learning cost-effective sequential testing policies from retrospective data, accounting for informative missingness and heterogeneity in test trajectories.
Contribution
It develops a novel doubly robust framework with path-specific weights and orthogonal pseudo-outcomes for unbiased policy estimation under complex missing data mechanisms.
Findings
The method achieves improved cost-adjusted performance in simulations.
It provides theoretical guarantees including oracle inequalities and convergence rates.
Application to prostate cancer data shows reduced testing costs without losing accuracy.
Abstract
Clinical decision-making often involves selecting tests that are costly, invasive, or time-consuming, motivating individualized, sequential strategies for what to measure and when to stop ascertaining. We study the problem of learning cost-optimal sequential decision policies from retrospective data, where test availability depends on prior results, inducing informative missingness. Under a sequential missing-at-random mechanism, we develop a doubly robust Q-learning framework for estimating optimal policies. The method introduces path-specific inverse probability weights that account for heterogeneous test trajectories and satisfy a normalization property conditional on the observed history. By combining these weights with auxiliary contrast models, we construct orthogonal pseudo-outcomes that enable unbiased policy learning when either the acquisition model or the contrast model is…
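To make the doubly robust idea in the abstract concrete, here is a minimal single-stage sketch in Python. It is not the authors' estimator: it uses the generic augmented inverse-probability-weighted (AIPW) pseudo-outcome rather than the paper's path-specific weights or sequential Q-learning recursion. The function name `dr_pseudo_outcome`, the simulated data, and both nuisance models are hypothetical choices made for illustration only.

```python
import numpy as np

def dr_pseudo_outcome(y, observed, propensity, outcome_pred):
    """Generic AIPW pseudo-outcome (illustrative, not the paper's estimator).

    y            : outcomes, may be NaN where the test was not acquired
    observed     : boolean indicator that the test was acquired
    propensity   : modelled probability of acquisition given history
    outcome_pred : modelled outcome given history
    """
    observed = np.asarray(observed, dtype=float)
    # Unobserved outcomes get weight zero, so any placeholder value works.
    y_filled = np.where(observed > 0, y, 0.0)
    return outcome_pred + observed / propensity * (y_filled - outcome_pred)

# Assumed toy data: acquisition depends on a covariate x (informative missingness).
rng = np.random.default_rng(0)
n = 200_000
x = rng.uniform(size=n)
true_propensity = 0.2 + 0.6 * x
observed = rng.uniform(size=n) < true_propensity
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)  # population mean is 2.0
y_seen = np.where(observed, y, np.nan)

# Complete-case mean ignores the missingness mechanism and is biased here.
naive = np.nanmean(y_seen)

# Correct acquisition model, deliberately wrong outcome model (all zeros).
dr_wrong_outcome = dr_pseudo_outcome(
    y_seen, observed, true_propensity, np.zeros(n)
).mean()

# Correct outcome model, deliberately wrong acquisition model (constant 0.5).
dr_wrong_propensity = dr_pseudo_outcome(
    y_seen, observed, np.full(n, 0.5), 1.0 + 2.0 * x
).mean()

print(naive, dr_wrong_outcome, dr_wrong_propensity)
```

In this toy setting, the averaged pseudo-outcome stays close to the population mean when either nuisance model is correct, while the complete-case mean drifts upward because tests are acquired more often when `x` is large.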
