On the optimality of the Monte-Carlo estimator
Antoine Pinochet Lobos

TL;DR
This paper proves that, for Monte-Carlo estimators on atomless probability spaces, choosing the random points independently minimizes the worst-case mean squared error, whose minimal value is then 1/N for N points.
Contribution
It provides a theoretical proof that independence in sampling yields the minimal worst-case mean squared error for Monte-Carlo estimators.
Findings
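Independence (in fact, pairwise independence) of the sample points attains the minimal worst-case mean squared error, 1/N for N points.
Optimality holds on atomless probability spaces.
Without the atomless assumption, dependent points can do strictly better: on a finite set with the uniform measure, sampling without replacement beats independent sampling.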
Abstract
We prove that on an atomless probability space, the worst-case mean squared error of the Monte-Carlo estimator is minimal if the random points are chosen independently.
1. Introduction and statement of the results
Let $(X, \mu)$ be a probability space. We are interested in the following general question: if $f$ is a measurable, real or complex-valued function on $X$, how can we efficiently compute the integral $\int_X f\,d\mu$? The famous Monte-Carlo method is a solution to this problem. Choose an integer $N$ big enough and draw $N$ independent $X$-valued random variables (that is, random points) $X_1, \dots, X_N$ of law $\mu$. Then form the mean $\frac{1}{N}\sum_{i=1}^N f(X_i)$, called the Monte-Carlo estimator.
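The estimator can be sketched in a few lines. This is a minimal illustration, not the author's code. It assumes $\mu$ is the uniform probability on $[0, 1)$ and $f(x) = x^2$, so the exact integral is $1/3$. `sample_uniform01` is a hypothetical sampler standing in for whatever way of drawing points of law $\mu$ is available.

```python
import numpy as np

def monte_carlo_estimate(f, sample, n, rng):
    """Average of f over n independent random points of law mu.

    `sample(rng, n)` must return n independent draws from mu; it is a
    hypothetical helper standing in for whatever sampler mu admits.
    """
    points = sample(rng, n)
    return np.mean(f(points))

def sample_uniform01(rng, n):
    # Assumption for this example: mu = uniform probability on [0, 1)
    return rng.random(n)

rng = np.random.default_rng(seed=0)
estimate = monte_carlo_estimate(lambda x: x**2, sample_uniform01, 100_000, rng)
print(estimate)  # close to the exact integral 1/3
```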
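We measure the quality of this method by computing what we call the mean squared error. We have the well-known equality, valid for all $N \geq 1$ and $f \in L^2(X, \mu)$,
$$\mathbb{E}\left[\left|\frac{1}{N}\sum_{i=1}^N f(X_i) - \int_X f\,d\mu\right|^2\right] = \frac{1}{N}\left(\|f\|_2^2 - \left|\int_X f\,d\mu\right|^2\right),$$
and we obtain the following equality for the worst-case mean squared error. It holds as soon as some $f$ of norm $1$ has zero integral, which is the case whenever some measurable set has measure strictly between $0$ and $1$:
$$\sup_{\|f\|_2 = 1}\mathbb{E}\left[\left|\frac{1}{N}\sum_{i=1}^N f(X_i) - \int_X f\,d\mu\right|^2\right] = \frac{1}{N}.$$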
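The mean-squared-error identity can be checked exactly on a small example. This sketch assumes a finite space $X = \{0, \dots, M-1\}$ with the uniform measure and a real-valued $f$. It enumerates all $M^N$ equally likely independent tuples with exact rational arithmetic. The function names are illustrative only.

```python
from fractions import Fraction
from itertools import product

def exact_mse_iid(f, n):
    """Exact E|(1/N) sum_i f(X_i) - integral|^2 for X_1..X_n i.i.d. uniform on
    X = {0, ..., M-1}, by enumerating all M**n equally likely tuples."""
    m = len(f)
    integral = Fraction(sum(f), m)
    total = Fraction(0)
    for tup in product(range(m), repeat=n):
        mean = Fraction(sum(f[x] for x in tup), n)
        total += (mean - integral) ** 2
    return total / m**n

def mse_formula(f, n):
    """Right-hand side (||f||_2^2 - |integral|^2) / N for the uniform measure."""
    m = len(f)
    norm2 = Fraction(sum(v * v for v in f), m)
    integral = Fraction(sum(f), m)
    return (norm2 - integral**2) / n

f = [3, -1, 0, 2]  # a real-valued function on X = {0, 1, 2, 3}
for n in (1, 2, 3):
    assert exact_mse_iid(f, n) == mse_formula(f, n)
```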
In this paper, we study the worst-case mean squared error in the general situation where the points are not assumed to be independent, and we prove the following theorem and its corollary.
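**Theorem.**
Let $(X, \mu)$ be a probability space, let $N \geq 1$, and let $(X_1, \dots, X_N)$ be an $N$-tuple of random points on $X$ such that for all $i$, the law of $X_i$ is $\mu$. We do not assume that the $X_i$'s are independent. Furthermore, we assume that $X$ can be partitioned into $M \geq 2$ measurable subsets of equal measure.
We then have
$$\sup_{\|f\|_2 = 1}\mathbb{E}\left[\left|\frac{1}{N}\sum_{i=1}^N f(X_i) - \int_X f\,d\mu\right|^2\right] \geq \frac{M-N}{N(M-1)} = \frac{1}{N} - \frac{N-1}{N(M-1)}.$$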
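The right-hand side of the theorem, $\frac{M-N}{N(M-1)} = \frac{1}{N} - \frac{N-1}{N(M-1)}$, is easy to tabulate. The small sketch below (the function name is hypothetical) shows two things. The bound is vacuous (non-positive) when $M \leq N$. It increases towards $\frac{1}{N}$ as the number $M$ of pieces grows.

```python
from fractions import Fraction

def theorem_lower_bound(n, m):
    """(M - N) / (N (M - 1)): lower bound on the worst-case MSE when X can be
    partitioned into m >= 2 measurable subsets of equal measure."""
    if m < 2:
        raise ValueError("the partition must have at least two pieces")
    return Fraction(m - n, n * (m - 1))

n = 5
for m in (2, 5, 10, 100, 10_000):
    b = theorem_lower_bound(n, m)
    # Equivalent form 1/N - (N - 1) / (N (M - 1)); it increases to 1/N with M.
    assert b == Fraction(1, n) - Fraction(n - 1, n * (m - 1))
    print(m, float(b))
```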
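**Corollary.**
Let $(X, \mu)$ be an atomless probability space, let $N \geq 1$, and let $(X_1, \dots, X_N)$ be an $N$-tuple of random points on $X$ such that for all $i$, the law of $X_i$ is $\mu$. We do not assume that the $X_i$'s are independent.
We then have
$$\sup_{\|f\|_2 = 1}\mathbb{E}\left[\left|\frac{1}{N}\sum_{i=1}^N f(X_i) - \int_X f\,d\mu\right|^2\right] \geq \frac{1}{N}.$$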
**Remark.**
As we shall see below, take $X = \{1, \dots, M\}$ with $\mu$ the uniform measure on $X$, and $N \leq M$. The inequality of the theorem is then an equality when the law of $(X_1, \dots, X_N)$ is the uniform measure on the set of $N$-tuples of points in $X$ whose coordinates are pairwise different. In the sense of the worst-case mean squared error, this random $N$-tuple is therefore better than an independent $N$-tuple (strictly so as soon as $N \geq 2$).
As we saw before, the inequality in the corollary is an equality if the $X_i$'s are independent. We do not know whether this condition is necessary. In our opinion, the following is worth noting. In [LPS86], for every prime $p$ such that $p \equiv 1 \pmod 4$, the authors build a $(p+1)$-tuple of uniform random points on the $2$-sphere which are not independent. They prove that its worst-case mean squared error is $\frac{4p}{(p+1)^2}$, which is approximately $4$ times the lower bound $\frac{1}{p+1}$ in the corollary. In the article [LP18], it is shown that their construction is optimal in a broad framework.
We confess our astonishment at having found no trace of these statements in the literature. They answer, in an elementary way, a question that we find both natural and general.
**Acknowledgements.**
We would like to thank Sébastien Darses, Thibault Espinasse, Alexandre Gaudillière, Pierre Mathieu, Clothilde Melot, Pierre Pudlo and especially Christophe Pittet for the conversations we had about the questions studied in this paper, and for their encouragement.
2. Proofs
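To simplify the presentation, we use the following notation: we consider the numbers
$$e(f) = \mathbb{E}\left[\left|\frac{1}{N}\sum_{i=1}^N f(X_i) - \int_X f\,d\mu\right|^2\right], \qquad f \in L^2(X, \mu),$$
and
$$e^\ast = \sup_{\|f\|_2 = 1} e(f).$$
First of all, if $f \in L^2(X, \mu)$, we notice that $e(f) = e\left(f - \int_X f\,d\mu\right)$. We also have $e(\lambda g) = |\lambda|^2 e(g)$ for every scalar $\lambda$, and $\left\|f - \int_X f\,d\mu\right\|_2^2 = \|f\|_2^2 - \left|\int_X f\,d\mu\right|^2 \leq \|f\|_2^2$. Consequently, $e^\ast$ is also the supremum of the $e(f)$ for $f$ of norm $1$ and zero integral.
Let $f \in L^2(X, \mu)$, of norm $1$ and zero integral. Since each $X_i$ has law $\mu$, we have $\mathbb{E}\left[|f(X_i)|^2\right] = \|f\|_2^2 = 1$, hence
$$e(f) = \frac{1}{N^2}\sum_{i,j=1}^N \mathbb{E}\left[f(X_i)\overline{f(X_j)}\right] = \frac{1}{N} + \frac{1}{N^2}\sum_{i \neq j}\mathbb{E}\left[f(X_i)\overline{f(X_j)}\right].$$
We recover the fact recalled above. If the $X_i$'s are pairwise independent, then $\mathbb{E}\left[f(X_i)\overline{f(X_j)}\right] = \left|\int_X f\,d\mu\right|^2 = 0$ for $i \neq j$. Therefore $e(f) = \frac{1}{N}$ for every $f$ of norm $1$ and zero integral.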
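Pairwise independence is enough for the decomposition to give exactly $\frac{1}{N}$, and this can be checked on a hypothetical example. Take $X = \{0, 1\}$ with the uniform measure and two independent fair bits $X_1, X_2$, and set $X_3 = X_1 \oplus X_2$ (XOR). Each $X_i$ is uniform and the $X_i$ are pairwise independent, but the triple is not independent: its law charges only $4$ of the $8$ triples. The helper `mse` is illustrative. It computes $e(f)$ exactly from a joint law given as a dictionary.

```python
from fractions import Fraction
from itertools import product

def mse(joint, f):
    """e(f) = E|(1/N) sum_i f(X_i) - integral f|^2 for a joint law given as a
    dict {(x_1, ..., x_N): probability}; mu is uniform on {0, ..., M-1}."""
    m = len(f)
    integral = Fraction(sum(f), m)
    total = Fraction(0)
    for tup, p in joint.items():
        mean = Fraction(sum(f[x] for x in tup), len(tup))
        total += p * (mean - integral) ** 2
    return total

# X_1, X_2 independent fair bits, X_3 = X_1 XOR X_2: pairwise independent only.
joint = {(a, b, a ^ b): Fraction(1, 4) for a, b in product((0, 1), repeat=2)}
f = [1, -1]  # norm 1 and zero integral for the uniform measure on {0, 1}
print(mse(joint, f))  # prints 1/3, i.e. 1/N with N = 3
```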
Let us prove the theorem.
Proof of the theorem.
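Let $A_1, \dots, A_M$ be measurable subsets that partition $X$, all of measure $\frac{1}{M}$, with $M \geq 2$. For $i \in \{1, \dots, N\}$, let us denote by $J_i$ the random index $j \in \{1, \dots, M\}$ such that $X_i \in A_j$. For every $(k, l) \in \{1, \dots, M\}^2$, we set
$$f_{k,l} = \sqrt{\frac{M}{2}}\left(\mathbf{1}_{A_k} - \mathbf{1}_{A_l}\right).$$
Moreover, for $j \in \{1, \dots, M\}$, we will denote by $f_{k,l}(j)$ the value that $f_{k,l}$ takes on $A_j$. This abuse of notation is harmless because $f_{k,l}$ is constant on the $A_j$'s; in particular, $f_{k,l}(X_i) = f_{k,l}(J_i)$.
The function $f_{k,l}$ visibly has zero integral. If $k \neq l$, its norm is $1$, since $\|f_{k,l}\|_2^2 = \frac{M}{2}\left(\mu(A_k) + \mu(A_l)\right) = 1$.
We will prove that there are $k \neq l$ such that
$$\mathbb{E}\left[\left|\frac{1}{N}\sum_{i=1}^N f_{k,l}(X_i)\right|^2\right] \geq \frac{1}{N} - \frac{N-1}{N(M-1)}.$$
Let $k \neq l$. Since each $X_i$ has law $\mu$, we have $\mathbb{E}\left[f_{k,l}(J_i)^2\right] = \|f_{k,l}\|_2^2 = 1$, so that
$$\mathbb{E}\left[\left|\frac{1}{N}\sum_{i=1}^N f_{k,l}(X_i)\right|^2\right] = \frac{1}{N} + \frac{1}{N^2}\sum_{i \neq i'}\mathbb{E}\left[f_{k,l}(J_i)\,f_{k,l}(J_{i'})\right].$$
For $i \neq i'$, the product $f_{k,l}(J_i)\,f_{k,l}(J_{i'})$ equals $-\frac{M}{2}$ if $\{J_i, J_{i'}\} = \{k, l\}$ and is nonnegative otherwise. From this we deduce the inequality
$$\mathbb{E}\left[\left|\frac{1}{N}\sum_{i=1}^N f_{k,l}(X_i)\right|^2\right] \geq \frac{1}{N} - \frac{M}{2N^2}\left(p_{k,l} + p_{l,k}\right),$$
where we denote
$$p_{k,l} = \sum_{i \neq i'}\mathbb{P}\left(J_i = k,\ J_{i'} = l\right).$$
Let us compute:
$$\sum_{k \neq l}\left(p_{k,l} + p_{l,k}\right) = 2\sum_{i \neq i'}\mathbb{P}\left(J_i \neq J_{i'}\right) \leq 2N(N-1).$$
Now, this sum of $M(M-1)$ numbers is less than or equal to $2N(N-1)$. Hence one of the terms must be less than or equal to $\frac{2N(N-1)}{M(M-1)}$. For a couple $(k, l)$ with $k \neq l$ such that $p_{k,l} + p_{l,k} \leq \frac{2N(N-1)}{M(M-1)}$, we then have
$$\mathbb{E}\left[\left|\frac{1}{N}\sum_{i=1}^N f_{k,l}(X_i)\right|^2\right] \geq \frac{1}{N} - \frac{M}{2N^2}\cdot\frac{2N(N-1)}{M(M-1)} = \frac{1}{N} - \frac{N-1}{N(M-1)} = \frac{M-N}{N(M-1)}.$$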
∎
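The pigeonhole argument can be tested numerically on a concrete dependent scheme. The sketch below is illustrative only; the scheme and the helper name `e_fkl` are assumptions, not taken from the paper. Let $U$ be uniform on the $M$ pieces and set $J_i = (U + 2i) \bmod M$. Each $J_i$ is uniform, but the $J_i$ are strongly dependent. The code computes the mean squared error of every test function $f_{k,l}$ exactly, using indices $0, \dots, M-1$, and checks that the largest one is at least $\frac{M-N}{N(M-1)}$. This checks only the functions $f_{k,l}$, which is what the proof uses; the true worst case over all $f$ can only be larger.

```python
from fractions import Fraction

def e_fkl(joint, k, l, m):
    """Exact MSE of f_{k,l} = sqrt(M/2)(1_{A_k} - 1_{A_l}) for a joint law
    {(j_1, ..., j_N): prob} of the piece indices J_i. Since f_{k,l} has zero
    integral, this is (M/2) E|(1/N) sum_i (1[J_i = k] - 1[J_i = l])|^2."""
    total = Fraction(0)
    for tup, p in joint.items():
        s = Fraction(sum((j == k) - (j == l) for j in tup), len(tup))
        total += p * s * s
    return Fraction(m, 2) * total

# Hypothetical dependent scheme: U uniform on {0, ..., M-1}, J_i = (U + 2 i) mod M.
m, n = 6, 3
joint = {tuple((u + 2 * i) % m for i in range(n)): Fraction(1, m) for u in range(m)}

worst = max(e_fkl(joint, k, l, m) for k in range(m) for l in range(m) if k != l)
bound = Fraction(m - n, n * (m - 1))
print(worst, bound)  # prints: 1/3 1/5
assert worst >= bound
```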
Here’s an example where the inequality is an equality.
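**Proposition.**
Suppose that $X = \{1, \dots, M\}$ with $M \geq 2$, that $\mu$ is the uniform probability on $X$, and that $1 \leq N \leq M$. Suppose also that the law of $(X_1, \dots, X_N)$ is the uniform measure on the set of $N$-tuples of points in $X$ whose coordinates are pairwise different. Then the inequality in the theorem is an equality, that is,
$$\sup_{\|f\|_2 = 1}\mathbb{E}\left[\left|\frac{1}{N}\sum_{i=1}^N f(X_i) - \int_X f\,d\mu\right|^2\right] = \frac{M-N}{N(M-1)}.$$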
Proof.
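Let $\nu$ be the measure on $X^N$ defined by
$$\nu = \frac{(M-N)!}{M!}\sum_{\substack{(x_1, \dots, x_N) \in X^N \\ x_i \neq x_j \text{ for } i \neq j}}\delta_{(x_1, \dots, x_N)},$$
where $\delta$ is the notation for a Dirac measure. In words, $\nu$ is the uniform measure on the set of $N$-tuples of points in $X$ whose coordinates are pairwise different. Let $(X_1, \dots, X_N)$ be an $N$-tuple of law $\nu$. Then, for all $i$, $X_i$ is uniform on $X$. For all $i \neq j$, $(X_i, X_j)$ is uniform on the set of couples of distinct points of $X$.
Let $f$ be of norm $1$ and such that $\int_X f\,d\mu = 0$, that is, $\sum_{x \in X}|f(x)|^2 = M$ and $\sum_{x \in X} f(x) = 0$. Let us compute:
$$\mathbb{E}\left[\left|\frac{1}{N}\sum_{i=1}^N f(X_i)\right|^2\right] = \frac{1}{N} + \frac{1}{N^2}\sum_{i \neq j}\frac{1}{M(M-1)}\sum_{x \neq y} f(x)\overline{f(y)} = \frac{1}{N} + \frac{N-1}{N M(M-1)}\left(\left|\sum_{x \in X} f(x)\right|^2 - \sum_{x \in X}|f(x)|^2\right) = \frac{1}{N} - \frac{N-1}{N(M-1)}.$$
By the reduction to functions of norm $1$ and zero integral made at the beginning of this section, we therefore have
$$\sup_{\|f\|_2 = 1}\mathbb{E}\left[\left|\frac{1}{N}\sum_{i=1}^N f(X_i) - \int_X f\,d\mu\right|^2\right] = \frac{1}{N} - \frac{N-1}{N(M-1)} = \frac{M-N}{N(M-1)}.$$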
∎
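The proposition can also be checked numerically. On $X = \{0, \dots, M-1\}$ with the uniform measure, write $g = f/\sqrt{M}$, so that $\|g\| = \|f\|_2$ in the Euclidean norm. For $f$ of zero integral, the mean squared error is then the quadratic form $g \mapsto \frac{M}{N^2}\, g^{*} C g$, where $C[x,y] = \sum_{i,j}\mathbb{P}(X_i = x, X_j = y)$. The worst case is therefore the largest eigenvalue of $\frac{M}{N^2} P C P$, with $P$ the orthogonal projection onto zero-sum vectors. This linear-algebra reformulation is our own illustration, not the paper's argument. The sketch compares sampling without replacement, where $\frac{M-N}{N(M-1)}$ is expected, with independent sampling, where $\frac{1}{N}$ is expected.

```python
import itertools
import numpy as np

def worst_case_mse(tuples, m):
    """sup_{||f||_2 = 1} of the MSE when (X_1, ..., X_N) is uniform on the
    given list of tuples and mu is uniform on X = {0, ..., m-1}."""
    n = len(tuples[0])
    c = np.zeros((m, m))  # C[x, y] = sum_{i,j} P(X_i = x, X_j = y)
    for tup in tuples:
        for x in tup:
            for y in tup:
                c[x, y] += 1.0
    c /= len(tuples)
    proj = np.eye(m) - np.ones((m, m)) / m  # projection onto zero-integral functions
    q = (m / n**2) * proj @ c @ proj  # quadratic form in g = f / sqrt(m)
    return np.linalg.eigvalsh(q).max()

m, n = 7, 3
without_replacement = list(itertools.permutations(range(m), n))
with_replacement = list(itertools.product(range(m), repeat=n))
print(worst_case_mse(without_replacement, m), (m - n) / (n * (m - 1)))
print(worst_case_mse(with_replacement, m), 1 / n)
```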
Let us prove the corollary.
Proof of the corollary.
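We will prove that for all $\varepsilon > 0$, the worst-case mean squared error is at least $\frac{1}{N} - \varepsilon$, which is enough. According to a theorem of Sierpiński [Sie22], in every atomless probability space and for every $t \in [0, 1]$, there is a measurable subset of $X$ of measure $t$. From this, it is easy, for every $M \geq 2$, to partition $X$ into $M$ measurable subsets of equal measure. For instance, apply the theorem repeatedly, each time to the normalized restriction of $\mu$ to the part of $X$ not yet used, which is still atomless. If we choose $M$ such that $\frac{N-1}{N(M-1)} \leq \varepsilon$, which is obviously possible, then by (the proof of) the theorem we can find $f$ of norm $1$ and zero integral such that $\mathbb{E}\left[\left|\frac{1}{N}\sum_{i=1}^N f(X_i)\right|^2\right] \geq \frac{1}{N} - \varepsilon$. ∎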
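The choice of $M$ in this proof is explicit. The condition $\frac{N-1}{N(M-1)} \leq \varepsilon$ is equivalent to $M \geq 1 + \frac{N-1}{N\varepsilon}$. A small sketch (the function name is hypothetical) returns the smallest admissible $M \geq 2$ and checks that the theorem's bound then reaches $\frac{1}{N} - \varepsilon$.

```python
import math
from fractions import Fraction

def pieces_needed(n, eps):
    """Smallest M >= 2 with (N - 1) / (N (M - 1)) <= eps, so that the theorem's
    bound (M - N) / (N (M - 1)) is at least 1/N - eps. eps must be > 0."""
    eps = Fraction(eps)
    if eps <= 0:
        raise ValueError("eps must be positive")
    return max(2, 1 + math.ceil(Fraction(n - 1, n) / eps))

n, eps = 10, Fraction(1, 100)
m = pieces_needed(n, eps)
bound = Fraction(m - n, n * (m - 1))
assert bound >= Fraction(1, n) - eps
print(m)  # 91
```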
For the sake of completeness, we add a simple proof of Sierpiński’s theorem.
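**Complement (Sierpiński’s theorem on atomless probability spaces).**
If $(X, \mu)$ is an atomless probability space, then for every measurable $A \subset X$, there exists a non-decreasing family $(B_t)_{t \in [0, \mu(A)]}$ of measurable subsets of $A$ such that $\mu(B_t) = t$ for all $t \in [0, \mu(A)]$.
Proof.
The hypothesis that $\mu$ is atomless means that for every measurable $B$ such that $\mu(B) > 0$, there exists a measurable $C \subset B$ such that $0 < \mu(C) < \mu(B)$.
Let $A$ be a measurable subset of $X$ such that $\mu(A) > 0$ (if $\mu(A) = 0$, it is enough to define $B_0 = \emptyset$). We apply Zorn's lemma to the set of such families indexed by subsets of $[0, \mu(A)]$, ordered by extension. It yields a family $(B_t)_{t \in T}$ with the following properties: $T$ is a subset of $[0, \mu(A)]$, $t \mapsto B_t$ is non-decreasing, $B_0 = \emptyset$, $B_{\mu(A)} = A$, and $\mu(B_t) = t$ for all $t \in T$. Moreover, $(B_t)_{t \in T}$ has no strict extension that satisfies these properties. Let us show that $T$ equals $[0, \mu(A)]$.
On the one hand, $T$ is closed. Indeed, let $(t_n)$ be a sequence of elements of $T$ that converges to some $t \in [0, \mu(A)]$, and let us show that $t \in T$. Up to extracting a subsequence, we can assume that $(t_n)$ is monotonic. Suppose $t \notin T$. We extend $(B_s)_{s \in T}$ to a family $(B_s)_{s \in T \cup \{t\}}$ by setting $B_t = \bigcap_n B_{t_n}$ if $(t_n)$ is non-increasing, and $B_t = \bigcup_n B_{t_n}$ if it is non-decreasing. By the continuity properties of $\mu$, $\mu(B_t) = t$. By the monotonicity of $s \mapsto B_s$, the extended family is still non-decreasing. It is therefore a strict extension of $(B_s)_{s \in T}$ that satisfies the same properties, which is a contradiction. So $t \in T$, and therefore $T$ is closed.
On the other hand, for all $s < u$ in $T$ there exists $t \in T$ such that $s < t < u$ (we say that $T$ is order-dense). Indeed, suppose there are $s < u$ in $T$ such that $]s, u[ \cap T = \emptyset$. Since $\mu(B_u \setminus B_s) = u - s > 0$, the hypothesis that $\mu$ is atomless provides a measurable $C \subset B_u \setminus B_s$ such that $0 < \mu(C) < u - s$. Let us set $t = s + \mu(C)$ and extend $(B_r)_{r \in T}$ to $(B_r)_{r \in T \cup \{t\}}$ by setting $B_t = B_s \cup C$. Then $\mu(B_t) = t$ and $s < t < u$. Since $B_s \subset B_t \subset B_u$ and $r \mapsto B_r$ is monotonic, the extended family is non-decreasing. It is then a strict extension of $(B_r)_{r \in T}$ that satisfies the same properties, which is a contradiction. Therefore, $T$ is order-dense.
So $T$ is a closed, order-dense subset of $[0, \mu(A)]$ containing $0$ and $\mu(A)$, and therefore $T = [0, \mu(A)]$. Otherwise, for some $x \notin T$, the numbers $s = \max\left(T \cap [0, x]\right)$ and $u = \min\left(T \cap [x, \mu(A)]\right)$ would be elements $s < u$ of $T$ with no element of $T$ between them. ∎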
References
- [LP18] A. Pinochet Lobos and C. Pittet. The exact convergence rate in the ergodic theorem of Lubotzky-Phillips-Sarnak. arXiv:1805.05261, 2018.
- [LPS86] A. Lubotzky, R. Phillips, and P. Sarnak. Hecke operators and distributing points on the sphere. I. Comm. Pure Applied Math., 39:S149–S186, 1986.
- [Sie22] W. Sierpiński. Sur les fonctions d’ensemble additives et continues. Fundamenta Mathematicae, 3:240–246, 1922.
