Inference from Small and Big Data Sets with Error Rates

Miklos Csorgo; Masoud M Nasari

arXiv:1404.5671·stat.ME·April 24, 2014

TL;DR

This paper introduces randomized pivot-based methods for statistical inference that improve accuracy in small data sets and enable efficient analysis of large data sets by using smaller sub-samples.

Contribution

The paper develops randomized $t$-type statistics that achieve smaller error rates and facilitate inference from large data sets using sub-sampling techniques.

Findings

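1. Randomized pivots yield central limit theorems with a significantly smaller magnitude of error than their classical counterparts under the same conditions.

2. The smaller error improves the accuracy of inference when only a relatively small number of observations is available.

3. When a data set is too big to be processed, inference about the mean can be based on significantly smaller sub-samples.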

Abstract

In this paper we introduce randomized $t$-type statistics that will be referred to as randomized pivots. We show that these randomized pivots yield central limit theorems with a significantly smaller magnitude of error as compared to that of their classical counterparts under the same conditions. This constitutes a desirable result when only a relatively small number of observations is available. When a data set is too big to be processed, we use our randomized pivots to make inference about the mean based on significantly smaller sub-samples. The approach taken is shown to relate naturally to estimating distributions of both small and big data sets.
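
As a rough illustration of the idea, the following Python sketch computes a randomly weighted $t$-type statistic for a hypothesized mean. The specific form used here (multinomial resampling counts $w_i$ of total size $m$, weights $|w_i/m - 1/n|$, and normalization by the sample standard deviation) is an assumption made for illustration. The abstract does not state the definition, so this should not be read as the authors' exact pivot. The names `multinomial_counts` and `randomized_pivot` are hypothetical.

```python
import math
import random
import statistics


def multinomial_counts(n, m, rng):
    """Draw m indices uniformly with replacement from range(n).

    Returns a list of length n whose entries count how often each
    index was drawn, so the entries sum to m.
    """
    counts = [0] * n
    for _ in range(m):
        counts[rng.randrange(n)] += 1
    return counts


def randomized_pivot(data, mu, m, rng):
    """Randomly weighted t-type statistic for a hypothesized mean mu.

    ASSUMED form (not taken from the abstract):
        sum_i c_i * (x_i - mu) / (s * sqrt(sum_i c_i ** 2)),
    where c_i = |w_i / m - 1 / n|, w_i are multinomial counts of
    total size m, and s is the sample standard deviation.
    """
    n = len(data)
    s = statistics.stdev(data)
    if s == 0:
        raise ValueError("sample standard deviation is zero")
    w = multinomial_counts(n, m, rng)
    coef = [abs(wi / m - 1.0 / n) for wi in w]
    norm = math.sqrt(sum(c * c for c in coef))
    if norm == 0:
        # Happens only when every w_i / m equals 1 / n exactly.
        raise ValueError("all random weights are zero; redraw")
    numerator = sum(c * (x - mu) for c, x in zip(coef, data))
    return numerator / (s * norm)


if __name__ == "__main__":
    rng = random.Random(2014)
    sample = [rng.gauss(1.0, 2.0) for _ in range(30)]
    # Evaluate the statistic at the true mean with m = 10 draws.
    print(randomized_pivot(sample, mu=1.0, m=10, rng=rng))
```

Under this assumed form the statistic is unchanged when the data and `mu` are shifted by the same constant or rescaled by the same positive factor, which is the behavior expected of a $t$-type pivot. The size `m` of the random draw is a free parameter in this sketch; how it should be chosen relative to `n` is not specified in the abstract.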
