Testing Credibility of Public and Private Surveys through the Lens of Regression
Debabrota Basu, Sourav Chakraborty, Debarshi Chanda, Buddha Dev Das, Arijit Ghosh, Arnab Ray

TL;DR
This paper develops algorithms to test the credibility of survey data for linear regression analysis, including privacy-preserving surveys using Local Differential Privacy, with theoretical guarantees and empirical validation.
Contribution
It introduces a novel algorithm to certify survey credibility for linear regression, extending to private surveys with differential privacy, and provides optimal error bounds for noisy data.
Findings
The algorithms certify whether a survey is credible for linear regression.
The extended methods handle surveys published under Local Differential Privacy.
The approach achieves optimal estimation error bounds for linear regression on noisy data.
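To make the Local Differential Privacy setting concrete, here is a minimal sketch of how a single respondent might perturb a bounded numeric answer before submitting it. It uses the standard Laplace mechanism, which is an assumption for illustration: this summary does not say which mechanism the paper uses. The names `laplace_noise` and `privatize_response` and the bounds and epsilon value are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise.

    The difference of two independent exponentials with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_response(value, lower, upper, epsilon, rng):
    """Perturb one respondent's answer locally (hypothetical helper).

    Assumes answers lie in [lower, upper]; values outside are clipped.
    Noise scale (upper - lower) / epsilon is the usual Laplace-mechanism
    choice for a value with that range.
    """
    clipped = min(max(value, lower), upper)
    scale = (upper - lower) / epsilon
    return clipped + laplace_noise(scale, rng)


rng = random.Random(0)
true_responses = [rng.random() for _ in range(20000)]  # synthetic answers in [0, 1]
private_responses = [
    privatize_response(v, 0.0, 1.0, 1.0, rng) for v in true_responses
]

true_mean = sum(true_responses) / len(true_responses)
private_mean = sum(private_responses) / len(private_responses)
print(f"true mean: {true_mean:.3f}, mean of private responses: {private_mean:.3f}")
```

Each private response is very noisy on its own, but the noise has zero mean, so aggregates over many respondents stay close to their true values. That is why analysis on such data has to account for the added noise.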
Abstract
Testing whether a sample survey is a credible representation of the population is an important question to ensure the validity of any downstream research. While this problem, in general, does not have an efficient solution, one might take a task-based approach and aim to understand whether a certain data analysis tool, like linear regression, would yield similar answers both on the population and the sample survey. In this paper, we design an algorithm to test the credibility of a sample survey in terms of linear regression. In other words, we design an algorithm that can certify if a sample survey is good enough to guarantee the correctness of data analysis done using linear regression tools. Nowadays, one is naturally concerned about data privacy in surveys. Thus, we further test the credibility of surveys published in a differentially private manner. Specifically, we focus on Local…
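The task-based idea in the abstract, asking whether linear regression gives similar answers on the population and on the sample, can be illustrated with a small sketch. This is not the paper's algorithm, which this summary does not describe. It assumes full access to a synthetic population, a single feature, and a hypothetical `coefficient_gap` measure, purely to show what "similar answers" could mean.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def coefficient_gap(population, sample):
    """Largest absolute difference between the two fitted (slope, intercept) pairs."""
    pop_fit = fit_line(*zip(*population))
    sample_fit = fit_line(*zip(*sample))
    return max(abs(p - s) for p, s in zip(pop_fit, sample_fit))


def make_group(slope, size, rng):
    """Synthetic (x, y) pairs with y = slope * x + 1 + Gaussian noise."""
    pairs = []
    for _ in range(size):
        x = rng.uniform(0.0, 10.0)
        pairs.append((x, slope * x + 1.0 + rng.gauss(0.0, 1.0)))
    return pairs


rng = random.Random(0)
group_a = make_group(2.0, 10000, rng)  # subpopulation with slope 2
group_b = make_group(0.0, 10000, rng)  # subpopulation with slope 0
population = group_a + group_b

representative = rng.sample(population, 2000)  # drawn from everyone
biased = rng.sample(group_a, 2000)             # surveys only group A

gap_representative = coefficient_gap(population, representative)
gap_biased = coefficient_gap(population, biased)
print(f"representative gap: {gap_representative:.3f}, biased gap: {gap_biased:.3f}")
```

In this toy setup, the survey that reaches only one subpopulation fits a visibly different line from the population, while the uniformly drawn survey stays close to it. A practical tester would not have the whole population available, which is what makes the problem studied in the paper non-trivial.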
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Survey Methodology and Nonresponse · Advanced Causal Inference Techniques
Methods: Linear Regression · Focus
