# A Theory of Selective Prediction

**Authors:** Mingda Qiao, Gregory Valiant

arXiv: 1902.04256 · 2019-05-30

## TL;DR

This paper develops a theory of selective prediction on online data streams. It shows that a large family of statistics can be estimated to non-trivial accuracy without distributional assumptions, and it resolves an open question from Drucker (2013) by proving a tight $O(\frac{1}{\log n})$ bound on the expected squared error of density prediction.

## Contribution

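The paper studies a model of selective prediction in which the algorithm chooses both when to predict and the length of the prediction window. It proves bounds on the prediction error that hold for arbitrary sequences, provided the sequence does not adapt to the algorithm's choices, and it resolves an open question from Drucker (2013) on the accuracy of density prediction.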

## Key findings

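- For density prediction on any binary sequence of length $n$, the expected squared error is bounded by $O(\frac{1}{\log n})$
- A matching lower bound is established for density prediction, so this rate is tight
- The results extend to more general statistics of a sequence of observations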
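
The first two findings can be written as one formula in the abstract's notation. The symbol $\hat{p}$ for the algorithm's prediction is introduced here for illustration and does not come from the source. The minimax form is one reading of "upper bound plus matching lower bound", not a quotation of the paper's theorem. The expectation is over the algorithm's internal randomness, and $x$ ranges over binary sequences fixed in advance.

$$
\min_{\text{algorithm}} \; \max_{x \in \{0,1\}^n} \; \mathbb{E}\left[\left(\hat{p} - \frac{1}{m}\sum_{i=t+1}^{t+m} x_i\right)^2\right] = \Theta\!\left(\frac{1}{\log n}\right)
$$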

## Abstract

We consider a model of selective prediction, where the prediction algorithm is given a data sequence in an online fashion and asked to predict a pre-specified statistic of the upcoming data points. The algorithm is allowed to choose when to make the prediction as well as the length of the prediction window, possibly depending on the observations so far. We prove that, even without any distributional assumption on the input data stream, a large family of statistics can be estimated to non-trivial accuracy. To give one concrete example, suppose that we are given access to an arbitrary binary sequence $x_1, \ldots, x_n$ of length $n$. Our goal is to accurately predict the average observation, and we are allowed to choose the window over which the prediction is made: for some $t < n$ and $m \le n - t$, after seeing $t$ observations we predict the average of $x_{t+1}, \ldots, x_{t+m}$. This particular problem was first studied in Drucker (2013) and referred to as the "density prediction game". We show that the expected squared error of our prediction can be bounded by $O(\frac{1}{\log n})$ and prove a matching lower bound, which resolves an open question raised in Drucker (2013). This result holds for any sequence (that is not adaptive to when the prediction is made, or the predicted value), and the expectation of the error is with respect to the randomness of the prediction algorithm. Our results apply to more general statistics of a sequence of observations, and we highlight several open directions for future work.
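
The rules of the density prediction game described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's algorithm: the strategy shown (pick a random power-of-two window length, observe one block, and predict that the next block has the same average) is a hypothetical baseline chosen for simplicity, and the names `play_once` and `estimate_squared_error` are invented here. No error guarantee is claimed for it.

```python
import random
from typing import Sequence, Tuple


def play_once(x: Sequence[int], rng: random.Random) -> Tuple[int, int, float, float]:
    """Play one round of the density prediction game on a binary sequence x.

    Hypothetical strategy (an assumption, not the paper's method):
    choose a window length m at random among powers of two, choose a
    random position, observe a block of m points, and predict that the
    next m points have the same average.
    Returns (t, m, prediction, truth).
    """
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")

    # Candidate window lengths m = 1, 2, 4, ... such that two blocks fit.
    scales = []
    m = 1
    while 2 * m <= n:
        scales.append(m)
        m *= 2
    m = rng.choice(scales)

    # The observed block is x[start:t]; the predicted window is x[t:t+m].
    start = rng.randrange(0, n - 2 * m + 1)
    t = start + m  # number of observations seen before predicting

    # The prediction uses only the first t observations.
    prediction = sum(x[start:t]) / m
    truth = sum(x[t:t + m]) / m
    return t, m, prediction, truth


def estimate_squared_error(x: Sequence[int], trials: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected squared error on a fixed sequence.

    The expectation is over the algorithm's randomness only; x is fixed
    in advance and does not adapt to the algorithm's choices.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        _, _, prediction, truth = play_once(x, rng)
        total += (prediction - truth) ** 2
    return total / trials


if __name__ == "__main__":
    # An arbitrary fixed binary sequence, used only as example input.
    data_rng = random.Random(42)
    x = [data_rng.randint(0, 1) for _ in range(1024)]
    print("estimated expected squared error:", estimate_squared_error(x))
```

The window choice here never looks at the data, which the model permits but does not require: the abstract allows the stopping time and window length to depend on the observations so far.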

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.04256/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1902.04256/full.md

## References

14 references — full list in the complete paper: https://tomesphere.com/paper/1902.04256/full.md

---
Source: https://tomesphere.com/paper/1902.04256