Online Data Collection for Efficient Semiparametric Inference
Shantanu Gupta, Zachary C. Lipton, David Childers

TL;DR
This paper introduces online data collection strategies for semiparametric inference, enabling sequential, cost-effective data gathering from multiple sources to improve estimation accuracy under budget constraints.
Contribution
The paper formalizes the Online Moment Selection problem and proposes two data collection policies that provably achieve zero asymptotic MSE regret, advancing adaptive data collection for semiparametric models.
Findings
The proposed online policies outperform fixed (non-adaptive) data collection strategies.
Both policies achieve zero asymptotic MSE regret relative to an oracle policy.
The policies are validated on synthetic and real-world causal inference tasks.
Abstract
While many works have studied statistical data fusion, they typically assume that the various datasets are given in advance. However, in practice, estimation requires difficult data collection decisions like determining the available data sources, their costs, and how many samples to collect from each source. Moreover, this process is often sequential because the data collected at a given time can improve collection decisions in the future. In our setup, given access to multiple data sources and budget constraints, the agent must sequentially decide which data source to query to efficiently estimate a target parameter. We formalize this task using Online Moment Selection, a semiparametric framework that applies to any parameter identified by a set of moment conditions. Interestingly, the optimal budget allocation depends on the (unknown) true parameters. We present two online data…
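To make the allocation problem concrete, here is a minimal Python sketch of an explore-then-commit style collection loop on a toy problem. It is not the authors' algorithm and does not implement the general moment-condition framework. It assumes the target is a sum of means, theta = mu_1 + mu_2, where each mean is observed through a separate source with unknown noise level sigma_i and known per-sample cost c_i. Under those assumptions the estimator variance sigma_1^2/n_1 + sigma_2^2/n_2 is minimized, subject to the budget, by giving source i a budget share proportional to sigma_i * sqrt(c_i), so the best allocation depends on quantities that must themselves be estimated. The function names, the 20% exploration share, and the Gaussian samplers are all illustrative assumptions.

```python
import math
import random
import statistics


def optimal_fractions(sigmas, costs):
    """Budget shares minimizing sum_i sigma_i^2 / n_i subject to sum_i c_i * n_i = B.

    The Lagrangian solution has n_i proportional to sigma_i / sqrt(c_i), so the
    share of budget spent on source i is proportional to sigma_i * sqrt(c_i).
    """
    weights = [s * math.sqrt(c) for s, c in zip(sigmas, costs)]
    total = sum(weights)
    if total == 0:
        # Degenerate case (no noise anywhere): fall back to an even split.
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def explore_then_commit(samplers, costs, budget, explore_frac=0.2, seed=0):
    """Toy explore-then-commit collection for theta = sum of per-source means.

    samplers: hypothetical callables, each mapping a random.Random to one sample.
    Returns the plug-in estimate of theta and the number of samples per source.
    """
    rng = random.Random(seed)
    k = len(samplers)
    data = [[] for _ in range(k)]

    # Explore: split a fixed share of the budget evenly across sources.
    per_source = explore_frac * budget / k
    for i in range(k):
        n_explore = max(2, int(per_source // costs[i]))  # stdev needs >= 2 points
        data[i].extend(samplers[i](rng) for _ in range(n_explore))

    # Plug the estimated noise levels into the allocation rule.
    sigma_hat = [statistics.stdev(d) for d in data]
    fractions = optimal_fractions(sigma_hat, costs)

    # Commit: top up each source toward its target share of the total budget.
    spent = [len(d) * c for d, c in zip(data, costs)]
    remaining = max(0.0, budget - sum(spent))
    extra = [max(0.0, f * budget - s) for f, s in zip(fractions, spent)]
    total_extra = sum(extra)
    scale = min(1.0, remaining / total_extra) if total_extra > 0 else 0.0
    for i in range(k):
        n_more = int((extra[i] * scale) // costs[i])
        data[i].extend(samplers[i](rng) for _ in range(n_more))

    theta_hat = sum(statistics.mean(d) for d in data)
    counts = [len(d) for d in data]
    return theta_hat, counts


if __name__ == "__main__":
    # Two assumed sources: a low-noise one and a high-noise one, equal cost.
    samplers = [lambda r: r.gauss(1.0, 1.0), lambda r: r.gauss(2.0, 3.0)]
    theta_hat, counts = explore_then_commit(samplers, costs=[1.0, 1.0], budget=1000)
    print(theta_hat, counts)
```

With equal costs, the sketch directs most of the post-exploration budget to the noisier source. A more adaptive variant could re-estimate the noise levels after every batch instead of committing once, at the price of more bookkeeping.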
Taxonomy
Topics: Statistical Methods and Inference · Gaussian Processes and Bayesian Inference
Methods: Sparse Evolutionary Training
