# Dimension Agnostic Testing of Survey Data Credibility through the Lens of Regression

**Authors:** Debabrota Basu, Sourav Chakraborty, Debarshi Chanda, Buddha Dev Das, Arijit Ghosh, Arnab Ray

arXiv: 2508.20616 · 2025-08-29

## TL;DR

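This paper proposes a task-based, model-specific way to test whether a sample survey credibly represents the population when the data are analyzed with regression models. The algorithm's sample complexity is independent of the data dimension, whereas verifying credibility by reconstructing the regression model requires a number of samples that scales linearly with the dimension.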

## Contribution

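The paper defines a model-specific distance metric for survey credibility and gives an algorithm for regression models whose sample complexity is independent of the data dimension. The algorithm achieves this by verifying credibility directly rather than reconstructing the underlying regression model.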

## Key findings

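- The algorithm's sample complexity is independent of the data dimension.
- Verifying credibility by reconstructing the regression model has sample complexity that scales linearly with the data dimension, so direct verification is more sample-efficient.
- The authors prove the algorithm's correctness and demonstrate its performance numerically.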

## Abstract

Assessing whether a sample survey credibly represents the population is a critical question for ensuring the validity of downstream research. Generally, this problem reduces to estimating the distance between two high-dimensional distributions, which typically requires a number of samples that grows exponentially with the dimension. However, depending on the model used for data analysis, the conclusions drawn from the data may remain consistent across different underlying distributions. In this context, we propose a task-based approach to assess the credibility of sampled surveys. Specifically, we introduce a model-specific distance metric to quantify this notion of credibility. We also design an algorithm to verify the credibility of survey data in the context of regression models. Notably, the sample complexity of our algorithm is independent of the data dimension. This efficiency stems from the fact that the algorithm focuses on verifying the credibility of the survey data rather than reconstructing the underlying regression model. Furthermore, we show that if one attempts to verify credibility by reconstructing the regression model, the sample complexity scales linearly with the dimensionality of the data. We prove the theoretical correctness of our algorithm and numerically demonstrate our algorithm's performance.
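The abstract's remark that reconstructing the regression model costs samples in proportion to the dimension can be illustrated with a textbook experiment that is not taken from the paper. The sketch below fits ordinary least squares on a fixed number of samples and reports how far the fitted weights are from the true ones as the dimension grows. It is not the authors' verification algorithm, and every name in it (`solve_linear_system`, `ols_parameter_error`, the Gaussian data model, the choices of `n` and `d`) is an assumption made for illustration.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    d = len(b)
    # Augmented matrix, so row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, d):
            factor = m[r][col] / m[col][col]
            for c in range(col, d + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * d
    for r in range(d - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, d))
        x[r] = (m[r][d] - tail) / m[r][r]
    return x


def ols_parameter_error(n, d, noise_std=1.0, trials=10, seed=0):
    """Mean of ||w_hat - w||^2 for least squares fitted on n samples in d dimensions."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Hypothetical data model: Gaussian features, linear response plus noise.
        w = [rng.gauss(0, 1) for _ in range(d)]
        xs = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
        ys = [sum(wi * xi for wi, xi in zip(w, x)) + rng.gauss(0, noise_std)
              for x in xs]
        # Normal equations: (X^T X) w_hat = X^T y.
        gram = [[sum(x[i] * x[j] for x in xs) for j in range(d)] for i in range(d)]
        moment = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(d)]
        w_hat = solve_linear_system(gram, moment)
        total += sum((a - b) ** 2 for a, b in zip(w_hat, w))
    return total / trials


if __name__ == "__main__":
    n = 200  # sample budget held fixed while the dimension grows
    for d in (2, 8, 32):
        print(f"d={d:3d}  mean squared parameter error={ols_parameter_error(n, d):.4f}")
```

With the sample budget held fixed, the reconstruction error grows as the dimension grows, so keeping it constant would require roughly proportionally more samples. This is the cost that a verification-only procedure, as described in the abstract, is designed to avoid.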

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20616/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20616/full.md

## References

50 references — full list in the complete paper: https://tomesphere.com/paper/2508.20616/full.md

---
Source: https://tomesphere.com/paper/2508.20616