Optimal integrating learning for split questionnaire design type data
Cunjie Lin, Jingfu Peng, Yichen Qin, Yang Li, Yuhong Yang

TL;DR
This paper introduces an optimal integrating learning method for split questionnaire design data, effectively combining partial information to estimate regression functions with improved efficiency and reduced bias.
Contribution
The authors propose a novel averaging approach that optimally combines models from data blocks, outperforming traditional missing data methods in accuracy and computational efficiency.
Findings
The method achieves asymptotic optimality in squared loss and risk.
Simulation studies demonstrate superior performance over existing methods.
Application to European Social Survey data confirms practical effectiveness.
Abstract
In the era of data science, it is common to encounter data with different subsets of variables obtained for different cases. An example is the split questionnaire design (SQD), which is adopted to reduce respondent fatigue and improve response rates by assigning different subsets of the questionnaire to different sampled respondents. A general question then is how to estimate the regression function based on such block-wise observed data. Currently, this is often carried out with the aid of missing data methods, which may unfortunately suffer from intensive computational cost, high variability, and possibly large modeling biases in real applications. In this article, we develop a novel approach for estimating the regression function for SQD-type data. We first construct a list of candidate models using available data-blocks separately, and then combine the estimates properly to make an…
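The two-step idea in the abstract (fit one candidate model per data block, then combine the candidates) can be illustrated with a minimal sketch. This is not the authors' estimator: the abstract as shown here does not say how the combination weights are chosen. The sketch assumes linear candidates, assumes one group of respondents answered every item, and picks weights by unconstrained least squares on a hold-out half of that group. All names (`fit_ols`, `integrate_blocks`, `predict_combined`) and the simulated data are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return coef[0] + X @ coef[1:]

def integrate_blocks(X, y, group, blocks, full_group):
    """blocks[g] lists the columns observed for respondents with group == g.

    Assumption: respondents in `full_group` observed every column. Half of
    them fit the full-data candidate, the other half estimate the weights.
    """
    full_idx = np.flatnonzero(group == full_group)
    half = len(full_idx) // 2
    fit_full, hold = full_idx[:half], full_idx[half:]

    # Step 1: one candidate model per block, using only that block's rows.
    candidates = []
    for g, cols in enumerate(blocks):
        cols = list(cols)
        rows = fit_full if g == full_group else np.flatnonzero(group == g)
        candidates.append((cols, fit_ols(X[np.ix_(rows, cols)], y[rows])))

    # Step 2: weights by least squares on hold-out complete cases
    # (an assumed weighting rule, chosen only for illustration).
    P = np.column_stack(
        [predict_ols(c, X[np.ix_(hold, cols)]) for cols, c in candidates]
    )
    w, *_ = np.linalg.lstsq(P, y[hold], rcond=None)
    return candidates, w

def predict_combined(candidates, w, X_new):
    P = np.column_stack(
        [predict_ols(c, X_new[:, cols]) for cols, c in candidates]
    )
    return P @ w

# Simulated SQD-type data: unobserved items are stored as NaN.
rng = np.random.default_rng(0)
n, beta = 600, np.array([1.0, -0.5, 0.8, 0.3])
blocks = [(0, 1), (2, 3), (0, 1, 2, 3)]
X_full = rng.normal(size=(n, 4))
y = X_full @ beta + rng.normal(scale=0.5, size=n)
group = rng.choice(3, size=n, p=[0.4, 0.4, 0.2])
X = np.full_like(X_full, np.nan)
for g, cols in enumerate(blocks):
    rows = np.flatnonzero(group == g)
    X[np.ix_(rows, list(cols))] = X_full[np.ix_(rows, list(cols))]

candidates, w = integrate_blocks(X, y, group, blocks, full_group=2)

# Compare on fresh, fully observed test data.
X_test = rng.normal(size=(2000, 4))
y_test = X_test @ beta + rng.normal(scale=0.5, size=2000)
mse_combined = np.mean((y_test - predict_combined(candidates, w, X_test)) ** 2)
mse_single = [
    np.mean((y_test - predict_ols(c, X_test[:, cols])) ** 2)
    for cols, c in candidates
]
```

In this toy setup the combined predictor can be compared against each single-block candidate through `mse_combined` and `mse_single`. A constrained weighting rule (for example, non-negative weights that sum to one) would be a natural alternative to the unconstrained least-squares step used here.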
Taxonomy
Topics: Survey Sampling and Estimation Techniques · Statistical Methods and Bayesian Inference · Statistical Methods and Inference
