A Proof of the Herschel-Maxwell Theorem Using the Strong Law of Large Numbers

Somabha Mukherjee

arXiv:1701.02228·math.PR·January 10, 2017






Department of Statistics, Wharton School, University of Pennsylvania

Abstract

In this article, we use the strong law of large numbers to give a proof of the Herschel-Maxwell theorem, which characterizes the normal distribution as the distribution of the components of a spherically symmetric random vector, provided they are independent. We present shorter proofs under additional moment assumptions, and include a remark, which leads to another strikingly short proof of Maxwell’s characterization using the central limit theorem.

KEY WORDS: Spherically symmetric; Normal distribution; Characteristic function; Strong law of large numbers; Central limit theorem.

1 Introduction

The Herschel-Maxwell theorem is one of the many beautiful characterizations of the normal distribution. It states that if the distribution of a random vector with independent components is invariant under rotations, then the components must be identically distributed as a normal distribution with mean $0$.

As mentioned in [2], J.C. Maxwell addressed the following question: what is the distribution of the velocities of gas particles? Maxwell’s argument that velocities are normally distributed hinged on two natural assumptions about the distribution function: independence and rotation invariance. Even before Maxwell, the astronomer J.F.W. Herschel addressed a similar issue while characterizing errors in astronomical measurements. He assumed that the components of the two-dimensional measurement errors are independent, and that the distribution of the error is independent of its direction.

In this paper, we give a proof of the Herschel-Maxwell theorem using the strong law of large numbers, and remark on another, strikingly short proof of the theorem using the central limit theorem. The main tools of our analysis are characteristic functions and Haar’s Theorem for rotation-invariant measures on the surface of the unit sphere in Euclidean spaces.

2 Some Basic Properties of a Spherically Symmetric Distribution

Definition 2.1.

A random vector $\mathbf{X}$ taking values in $\mathbb{R}^{n}$ is said to have a spherically symmetric distribution if $\mathbf{X}$ and $\mathbf{H}\mathbf{X}$ have the same distribution for every $n\times n$ real orthogonal matrix $\mathbf{H}$.

In the following two theorems, we state some basic properties of a spherically symmetric distribution.

Theorem 2.1.

The entries of a spherically symmetric random vector have the same distribution. Moreover, if that distribution has a finite mean, then the mean must be $0$, and if it has a finite second moment, then any two distinct entries of the random vector are uncorrelated.
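Theorem 2.1 is stated here without proof. One standard argument, sketched below as an illustration rather than as the author's proof, applies Definition 2.1 to three kinds of orthogonal matrices: the permutation matrix $\mathbf{P}$ that swaps coordinates $i$ and $j$, the matrix $-\mathbf{I}$, and the diagonal matrix $\mathbf{D}_{i}$ that flips the sign of coordinate $i$ only (the names $\mathbf{P}$ and $\mathbf{D}_{i}$ are introduced just for this sketch).

```latex
\begin{aligned}
\mathbf{X} \stackrel{\textrm{d}}{=} \mathbf{P}\mathbf{X} &\;\Rightarrow\; X_{j} \stackrel{\textrm{d}}{=} (\mathbf{P}\mathbf{X})_{j} = X_{i},\\
\mathbf{X} \stackrel{\textrm{d}}{=} -\mathbf{X} &\;\Rightarrow\; \mathbb{E}X_{1} = \mathbb{E}(-X_{1}) = -\mathbb{E}X_{1}, \text{ so } \mathbb{E}X_{1} = 0,\\
\mathbf{X} \stackrel{\textrm{d}}{=} \mathbf{D}_{i}\mathbf{X} &\;\Rightarrow\; \mathbb{E}[X_{i}X_{j}] = \mathbb{E}[(-X_{i})X_{j}] = -\mathbb{E}[X_{i}X_{j}], \text{ so } \textrm{Cov}(X_{i},X_{j}) = 0 \quad (i \neq j).
\end{aligned}
```

In the last line, ${\mathbb{E}}|X_{i}X_{j}|<\infty$ by the Cauchy-Schwarz inequality, and both means vanish by the second line.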

Theorem 2.2.

The random vector ${\mathbf{X}}=(X_{1},...,X_{n})^{\textrm{T}}$ has a spherically symmetric distribution if and only if its characteristic function $\phi$ satisfies $\phi({\mathbf{t}})={\mathbb{E}}e^{i||{\mathbf{t}}||X_{1}}$ for all ${\mathbf{t}}\in\mathbb{R}^{n}$ .
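Theorem 2.2 is likewise stated without proof. A sketch of both directions, assuming only Definition 2.1 and the uniqueness theorem for characteristic functions: for $\mathbf{t}\neq\mathbf{0}$, complete $\mathbf{t}/||\mathbf{t}||$ to an orthonormal basis and let $\mathbf{H}$ be the orthogonal matrix with first row $\mathbf{t}^{\textrm{T}}/||\mathbf{t}||$, so that $\mathbf{t}^{\textrm{T}}\mathbf{X}=||\mathbf{t}||(\mathbf{H}\mathbf{X})_{1}$.

```latex
\begin{aligned}
(\Rightarrow)\quad \phi(\mathbf{t}) &= \mathbb{E}e^{i\mathbf{t}^{\textrm{T}}\mathbf{X}} = \mathbb{E}e^{i||\mathbf{t}||(\mathbf{H}\mathbf{X})_{1}} = \mathbb{E}e^{i||\mathbf{t}||X_{1}},\\
(\Leftarrow)\quad \mathbb{E}e^{i\mathbf{t}^{\textrm{T}}\mathbf{H}\mathbf{X}} &= \phi(\mathbf{H}^{\textrm{T}}\mathbf{t}) = \mathbb{E}e^{i||\mathbf{H}^{\textrm{T}}\mathbf{t}||X_{1}} = \mathbb{E}e^{i||\mathbf{t}||X_{1}} = \phi(\mathbf{t}).
\end{aligned}
```

The last equality in the first line uses $(\mathbf{H}\mathbf{X})_{1}\stackrel{\textrm{d}}{=}X_{1}$; in the second line, $\mathbf{H}\mathbf{X}$ and $\mathbf{X}$ have the same characteristic function for every orthogonal $\mathbf{H}$, hence the same distribution.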

It follows immediately from Theorem 2.2 that if a random vector $(X_{1},...,X_{n})^{\textrm{T}}$ follows a spherically symmetric distribution, then so do all of its subvectors. We now state and prove a partial “converse” of this fact, under the additional assumption of independence, which will play a crucial role in our main proof.

Theorem 2.3.

Let $F$ be a distribution on $\mathbb{R}$ with the property that if $X_{1}$ and $X_{2}$ are independent observations from $F$ , then $(X_{1},X_{2})^{\textrm{T}}$ has a spherically symmetric distribution. Then for every $n$ , if $X_{1},...,X_{n}$ are independent observations from $F$ , $(X_{1},...,X_{n})^{\textrm{T}}$ has a spherically symmetric distribution.

Proof.

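In view of Theorem 2.2, it suffices to show that ${\mathbb{E}}e^{i\sum_{j=1}^{n}t_{j}X_{j}}={\mathbb{E}}e^{i\left(\sqrt{\sum_{j=1}^{n}t_{j}^{2}}\right)X_{1}}$ for all $n$ and for all $t_{1},...,t_{n}$. By Theorem 2.2, this is true for $n=1$ and $2$. Assume that the identity holds for some $n\geq 2$. Let $X_{1},...,X_{n},X_{n+1}$ be $(n+1)$ independent observations from $F$. Then, by our induction hypothesis, we have:

$${\mathbb{E}}e^{i\sum_{j=1}^{n+1}t_{j}X_{j}}=\left({\mathbb{E}}e^{i\sum_{j=1}^{n}t_{j}X_{j}}\right)\left({\mathbb{E}}e^{it_{n+1}X_{n+1}}\right)=\left({\mathbb{E}}e^{i\left(\sqrt{\sum_{j=1}^{n}t_{j}^{2}}\right)X_{1}}\right)\left({\mathbb{E}}e^{it_{n+1}X_{n+1}}\right)={\mathbb{E}}e^{i\left(\sqrt{\sum_{j=1}^{n+1}t_{j}^{2}}\right)X_{1}}$$

for all $t_{1},...,t_{n+1}$. The first equality uses the independence of $(X_{1},...,X_{n})$ and $X_{n+1}$, the second uses the induction hypothesis, and the third uses the independence of $X_{1}$ and $X_{n+1}$ together with the case $n=2$, applied with coefficients $\sqrt{\sum_{j=1}^{n}t_{j}^{2}}$ and $t_{n+1}$. This completes the induction. ∎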
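As a quick numerical sanity check of Theorem 2.3 (a sketch, not part of the paper), one can take $F=N(0,1)$, which satisfies its hypothesis, and compare Monte Carlo estimates of both sides of ${\mathbb{E}}e^{i\sum_{j}t_{j}X_{j}}={\mathbb{E}}e^{i||\mathbf{t}||X_{1}}$ for $n=3$. The helper names, sample size, seed and the vector $\mathbf{t}$ are illustrative choices; both estimates should be close to the exact value $e^{-||\mathbf{t}||^{2}/2}$.

```python
import cmath
import math
import random


def empirical_cf(samples, t):
    """Empirical characteristic function: the average of exp(i * t * x) over the samples."""
    return sum(cmath.exp(1j * t * x) for x in samples) / len(samples)


def check_theorem_2_3(ts, num_samples=20000, seed=0):
    """Monte Carlo estimates of both sides of the identity in Theorem 2.3 for F = N(0, 1).

    Returns (lhs, rhs, exact): lhs estimates E exp(i * sum_j t_j X_j),
    rhs estimates E exp(i * ||t|| * X_1), and exact = exp(-||t||^2 / 2).
    """
    rng = random.Random(seed)
    # One linear combination sum_j t_j X_j per simulated vector (X_1, ..., X_n).
    combos = [sum(t * rng.gauss(0.0, 1.0) for t in ts) for _ in range(num_samples)]
    lhs = empirical_cf(combos, 1.0)
    # Fresh draws of X_1, evaluated at the single frequency ||t||.
    norm_t = math.sqrt(sum(t * t for t in ts))
    first = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    rhs = empirical_cf(first, norm_t)
    exact = math.exp(-norm_t ** 2 / 2.0)
    return lhs, rhs, exact
```

For example, `check_theorem_2_3([0.5, -1.0, 0.75])` returns two complex estimates whose difference is of the order of the Monte Carlo error.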

Theorem 2.4.

Let $X$ and $Y$ be two independent random variables. Suppose that $(X,Y)^{\textrm{T}}$ has a spherically symmetric distribution. Then, ${\mathbb{P}}(X=0)$ is either $0$ or $1$.

Proof.

Suppose, towards a contradiction, that $0<{\mathbb{P}}(X=0)<1$. By Theorem 2.1, $X$ and $Y$ have the same distribution. Since $X\stackrel{{\scriptstyle\textrm{d}}}{{=}}X\cos\theta+Y\sin\theta$ for every $\theta\in\mathbb{R}$ (apply Definition 2.1 with the rotation matrix whose first row is $(\cos\theta,\sin\theta)$), we have for all $\theta\in\mathbb{R}$:

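$${\mathbb{P}}\left(X\cos\theta+Y\sin\theta=0,\ (X,Y)\neq(0,0)\right)={\mathbb{P}}(X\cos\theta+Y\sin\theta=0)-{\mathbb{P}}(X=0,Y=0)={\mathbb{P}}(X=0)-{\mathbb{P}}(X=0)^{2}>0,$$

where the first term is evaluated using $X\stackrel{{\scriptstyle\textrm{d}}}{{=}}X\cos\theta+Y\sin\theta$, and the second using the independence of $X$ and $Y$ together with $X\stackrel{{\scriptstyle\textrm{d}}}{{=}}Y$.

However, the sets $\{(x,y)\neq(0,0):x\cos\theta+y\sin\theta=0\}$, $0\leq\theta\leq\frac{\pi}{2}$, are pairwise disjoint (each is a different line through the origin, with the origin removed), and they form an uncountable collection. Each of them has positive probability under the distribution of $(X,Y)$, which is impossible: for every $k\geq 1$, at most $k$ pairwise disjoint sets can have probability greater than $\frac{1}{k}$, so at most countably many pairwise disjoint sets can have positive probability. ∎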

3 The Spherical Symmetry Characterization and its Proof

We will require a simple version of Haar’s Theorem for rotation-invariant measures on $\mathcal{S}^{n-1}$ , the surface of the unit sphere in $\mathbb{R}^{n}$ . It is stated below.

Theorem 3.1.

Let $\mu$ be a rotation-invariant Borel probability measure on $\mathcal{S}^{n-1}$, i.e., $\mu(\mathbf{H}B)=\mu(B)$ for every Borel set $B\subseteq\mathcal{S}^{n-1}$ and every $n\times n$ orthogonal matrix $\mathbf{H}$. Then, $\mu$ is the uniform measure on $\mathcal{S}^{n-1}$.

It follows from Theorem 3.1 that if ${\mathbf{X}}$ is a unit-norm, spherically symmetric random vector in $\mathbb{R}^{n}$, then ${\mathbf{X}}$ has the uniform distribution on $\mathcal{S}^{n-1}$. We are now ready to state and prove the main result of this paper.
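The proof below uses the fact that ${\mathbf{Z}}_{n}/||{\mathbf{Z}}_{n}||$ is uniform on $\mathcal{S}^{n-1}$ when ${\mathbf{Z}}_{n}$ is a standard Gaussian vector; this is also the usual way to sample that distribution. The sketch below (function names, sample size, seed and test rotation are illustrative) checks one consequence: by Theorem 2.1 the squared coordinates of a uniform point are identically distributed and sum to $1$, so ${\mathbb{E}}U_{1}^{2}=1/n$, before or after a rotation.

```python
import math
import random


def uniform_on_sphere(n, rng):
    """Draw one point uniformly from S^{n-1} by normalizing a standard Gaussian vector in R^n."""
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(v * v for v in z))
    return [v / norm for v in z]


def rotate_first_two(u, theta):
    """Rotate by angle theta in the (x_1, x_2)-plane; this is an n x n orthogonal map (n >= 2)."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * u[0] - s * u[1], s * u[0] + c * u[1]] + u[2:]


def mean_first_coord_squared(n, theta=0.0, num_samples=20000, seed=1):
    """Monte Carlo estimate of E[U_1^2] for U uniform on S^{n-1}, after rotating by theta."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        u = rotate_first_two(uniform_on_sphere(n, rng), theta)
        total += u[0] ** 2
    return total / num_samples
```

With $n=5$, both `mean_first_coord_squared(5)` and `mean_first_coord_squared(5, theta=0.7)` should be close to $1/5$.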

Theorem 3.2.

Let $X$ and $Y$ be two independent random variables. Suppose that $(X,Y)^{\textrm{T}}$ has a spherically symmetric distribution. Then, $X$ and $Y$ are identically distributed as a normal distribution with mean $0$ (and possibly zero variance).

Proof.

By Theorem 2.1, $X$ and $Y$ have the same distribution, say $F$. By Theorem 2.4, ${\mathbb{P}}(X=0)$ is either $0$ or $1$. In the latter case, $X$ has the normal distribution with mean $0$ and variance $0$. So, assume that ${\mathbb{P}}(X=0)=0$.

Generate a sequence $\{X_{n}\}_{n=1}^{\infty}$ of independent random variables from the distribution $F$, and a sequence $\{Z_{n}\}_{n=1}^{\infty}$ of independent $N(0,1)$ random variables. For each $n$, let ${\mathbf{X}}_{n}=(X_{1},...,X_{n})^{\textrm{T}}$ and ${\mathbf{Z}}_{n}=(Z_{1},...,Z_{n})^{\textrm{T}}$. It follows from Theorem 2.3 that ${\mathbf{X}}_{n}$ has a spherically symmetric distribution, i.e., for every $n\times n$ orthogonal matrix $\mathbf{H}$, ${\mathbf{X}}_{n}$ and $\mathbf{H}{\mathbf{X}}_{n}$ have the same distribution. So,

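$$\frac{{\mathbf{X}}_{n}}{||{\mathbf{X}}_{n}||}\stackrel{{\scriptstyle\textrm{d}}}{{=}}\frac{\mathbf{H}{\mathbf{X}}_{n}}{||\mathbf{H}{\mathbf{X}}_{n}||}=\mathbf{H}\frac{{\mathbf{X}}_{n}}{||{\mathbf{X}}_{n}||}$$

for every $n$ and every $n\times n$ orthogonal matrix $\mathbf{H}$; here ${\mathbf{X}}_{n}\neq\mathbf{0}$ almost surely because ${\mathbb{P}}(X=0)=0$, and $||\mathbf{H}{\mathbf{X}}_{n}||=||{\mathbf{X}}_{n}||$ because $\mathbf{H}$ is orthogonal. Thus, $\frac{{\mathbf{X}}_{n}}{||{\mathbf{X}}_{n}||}$ is a unit-norm spherically symmetric random vector in $\mathbb{R}^{n}$ and hence, by Theorem 3.1, follows the uniform distribution on $\mathcal{S}^{n-1}$. By the same argument, $\frac{{\mathbf{Z}}_{n}}{||{\mathbf{Z}}_{n}||}$ also follows the uniform distribution on $\mathcal{S}^{n-1}$. Hence, $\frac{{\mathbf{X}}_{n}}{||{\mathbf{X}}_{n}||}\stackrel{{\scriptstyle\textrm{d}}}{{=}}\frac{{\mathbf{Z}}_{n}}{||{\mathbf{Z}}_{n}||}$ for all $n$. Comparing first coordinates and multiplying by $\sqrt{n}$, we get $\frac{\sqrt{n}X_{1}}{||{\mathbf{X}}_{n}||}\stackrel{{\scriptstyle\textrm{d}}}{{=}}\frac{\sqrt{n}Z_{1}}{||{\mathbf{Z}}_{n}||}$ for all $n$.

Since $\frac{||{\mathbf{Z}}_{n}||}{\sqrt{n}}=\left(\frac{1}{n}\sum_{i=1}^{n}Z_{i}^{2}\right)^{1/2}\to 1$ almost surely by the Strong Law of Large Numbers, the right hand side converges almost surely to $Z_{1}$. Observe that ${\mathbb{E}}X_{1}^{2}<\infty$: otherwise, by the Strong Law of Large Numbers for independent and identically distributed random variables with expectation $+\infty$ (applied to the $X_{i}^{2}$), $\frac{||{\mathbf{X}}_{n}||}{\sqrt{n}}\to\infty$ almost surely, so the left hand side would converge almost surely to $0$. This is a contradiction, since both sides have the same distribution for every $n$, almost sure limits are also limits in distribution, and $Z_{1}$ is not degenerate at $0$. Moreover, ${\mathbb{E}}X_{1}^{2}>0$ because ${\mathbb{P}}(X=0)=0$. So, by the Strong Law of Large Numbers for finite mean, $\frac{||{\mathbf{X}}_{n}||}{\sqrt{n}}\to\sqrt{{\mathbb{E}}X_{1}^{2}}$ almost surely, and the left hand side converges almost surely to $\frac{X_{1}}{\sqrt{{\mathbb{E}}X_{1}^{2}}}$. Since the two sides have the same distribution for every $n$, so do their limits: $\frac{X_{1}}{\sqrt{{\mathbb{E}}X_{1}^{2}}}\stackrel{{\scriptstyle\textrm{d}}}{{=}}Z_{1}$, that is, $X\sim N(0,{\mathbb{E}}X^{2})$, and we are done. ∎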
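The two almost sure limits in the last step can be illustrated numerically (a sketch; the sample size, seed and function names are arbitrary choices). The Uniform$[-1,1]$ distribution below is used only to illustrate the Strong Law step $||{\mathbf{X}}_{n}||/\sqrt{n}\to\sqrt{{\mathbb{E}}X_{1}^{2}}$, which holds for any i.i.d. sequence with finite second moment; it does not satisfy the hypothesis of Theorem 3.2. For it, ${\mathbb{E}}X_{1}^{2}=1/3$.

```python
import math
import random


def normalized_norm(sample):
    """Return ||x_n|| / sqrt(n) for a list x_n = (x_1, ..., x_n)."""
    return math.sqrt(sum(v * v for v in sample) / len(sample))


def slln_demo(n=100000, seed=2):
    """Illustrate the Strong Law limits used in the proof of Theorem 3.2.

    Returns (||Z_n|| / sqrt(n), ||U_n|| / sqrt(n)) with Z_i ~ N(0, 1) and
    U_i ~ Uniform[-1, 1]; by the SLLN these approach 1 and sqrt(1/3).
    """
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    u = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return normalized_norm(z), normalized_norm(u)
```

Dividing a fixed first coordinate by these ratios gives the limits $Z_{1}$ and $X_{1}/\sqrt{{\mathbb{E}}X_{1}^{2}}$ used in the proof.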

Remark 1.

A slight modification of the proof of Theorem 3.2 yields the following result:

Theorem 3.3.

Suppose that $\{X_{n}\}_{n=1}^{\infty}$ is a sequence of random variables satisfying the following conditions:

${\mathbb{P}}(X_{1}=0)=0$ ,
${\mathbb{E}}X_{1}^{4}<\infty$ ,
$(X_{1},...,X_{n})^{\textrm{T}}$ is spherically symmetric for all $n\geq 1$ , and
$\textrm{Cov}(X_{i}^{2},X_{j}^{2})=0$ for all $1\leq i<j$ .

Then, $X_{1},X_{2},...$ are identically distributed as a normal distribution with mean $0$ and positive variance.

The only observation needed before replicating the proof of Theorem 3.2 is that, under the above conditions, $\frac{||{\mathbf{X}}_{n}||}{\sqrt{n}}\xrightarrow{\textrm{P}}\sqrt{{\mathbb{E}}X_{1}^{2}}$. Theorem 3.3 is of interest mainly because it shows that the independence of the $X_{n}$'s can be traded for additional assumptions while still arriving at the same normal characterization.
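One way to check this observation is sketched below, using Chebyshev's inequality in place of the Strong Law. By Theorem 2.1 the $X_{i}$ are identically distributed, so with $\sigma^{2}=\textrm{Var}(X_{1}^{2})$, which is finite since ${\mathbb{E}}X_{1}^{4}<\infty$, the vanishing covariances of the squares give, for every $\varepsilon>0$:

```latex
\textrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n}X_{i}^{2}\right)=\frac{1}{n^{2}}\sum_{i=1}^{n}\textrm{Var}(X_{i}^{2})=\frac{\sigma^{2}}{n},
\qquad
\mathbb{P}\left(\left|\frac{1}{n}\sum_{i=1}^{n}X_{i}^{2}-\mathbb{E}X_{1}^{2}\right|>\varepsilon\right)\leq\frac{\sigma^{2}}{n\varepsilon^{2}}\to 0 .
```

Taking square roots (a continuous map) gives $||{\mathbf{X}}_{n}||/\sqrt{n}\xrightarrow{\textrm{P}}\sqrt{{\mathbb{E}}X_{1}^{2}}$, where ${\mathbb{E}}X_{1}^{2}>0$ because ${\mathbb{P}}(X_{1}=0)=0$. Convergence in probability is enough here, since it still implies that $\sqrt{n}X_{1}/||{\mathbf{X}}_{n}||$ converges in distribution to $X_{1}/\sqrt{{\mathbb{E}}X_{1}^{2}}$.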

4 Shorter Proofs Under Additional Moment Assumptions

Theorem 3.2 has shorter proofs if, in addition, the first or the second moment of $X$ is finite. Suppose first that we only have ${\mathbb{E}}|X|<\infty$. Since $X$ has a distribution symmetric about $0$ (take $\mathbf{H}=\textrm{diag}(-1,1)$ in Definition 2.1), this condition is equivalent to the existence of ${\mathbb{E}}X$. In this case, the characteristic function $\phi$ of $X$ is differentiable on $\mathbb{R}$.

By an application of Theorem 2.2, together with the independence of $X$ and $Y$, we have $\phi(s)\phi(t)=\phi\left(\sqrt{s^{2}+t^{2}}\right)$ for all $s,t\in\mathbb{R}$. Since the distribution of $X$ is symmetric about $0$, $\phi$ is a real-valued, even function. We claim that $\phi(t)>0$ for all $t\in\mathbb{R}$. If not, then since $\phi$ is continuous with $\phi(0)=1$, the intermediate value theorem gives a $t_{0}\in\mathbb{R}$ such that $\phi(t_{0})=0$. Taking $s=t$ in the functional equation shows that $\phi(t)=\left[\phi\left(\frac{t}{\sqrt{2}}\right)\right]^{2}$ for all $t\in\mathbb{R}$, and an easy induction gives $\phi(t)=\left[\phi\left(\frac{t}{2^{\frac{n}{2}}}\right)\right]^{2^{n}}$ for all $t\in\mathbb{R}$ and all $n\geq 1$. This implies that $\phi\left(\frac{t_{0}}{2^{\frac{n}{2}}}\right)=0$ for all $n\geq 1$, which is impossible, since $\phi$ is continuous at $0$ and $\phi(0)=1$. This proves our claim.
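The functional equation and the iterated identity used in the positivity argument can be checked numerically for concrete characteristic functions (a sketch; the grid, the variance $2$ and the function names are illustrative assumptions). A centered normal characteristic function satisfies both identities; the characteristic function $1/(1+t^{2})$ of the standard Laplace distribution, which is symmetric with ${\mathbb{E}}|X|<\infty$, satisfies neither, consistent with Theorem 3.2.

```python
import math

# Illustrative evaluation grid (an assumption of this sketch, not from the paper).
GRID = [-3.0, -1.5, -0.5, 0.0, 0.5, 1.0, 2.5]


def gaussian_cf(t, sigma2=2.0):
    """Characteristic function of N(0, sigma2): exp(-sigma2 * t^2 / 2)."""
    return math.exp(-sigma2 * t * t / 2.0)


def laplace_cf(t):
    """Characteristic function of the standard Laplace distribution: 1 / (1 + t^2)."""
    return 1.0 / (1.0 + t * t)


def satisfies_functional_equation(phi, grid, tol=1e-12):
    """Check phi(s) * phi(t) == phi(sqrt(s^2 + t^2)) for all pairs (s, t) in grid."""
    return all(
        abs(phi(s) * phi(t) - phi(math.hypot(s, t))) <= tol for s in grid for t in grid
    )


def satisfies_halving_identity(phi, grid, n=5, tol=1e-12):
    """Check phi(t) == phi(t / 2^(n/2)) ** (2^n), the identity iterated in the positivity proof."""
    return all(abs(phi(t) - phi(t / 2 ** (n / 2)) ** (2 ** n)) <= tol for t in grid)
```

Floating-point rounding is why the comparisons use a small tolerance instead of exact equality.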

Denote $\log\phi$ by $\psi$. Since $\phi$ is positive and differentiable, $\psi$ is well defined and differentiable, and $\psi(s)+\psi(t)=\psi\left(\sqrt{s^{2}+t^{2}}\right)$ for all $s,t\in\mathbb{R}$. Taking the partial derivative with respect to $s$ on both sides of this identity, we get:

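$$\psi^{\prime}(s)=\psi^{\prime}\left(\sqrt{s^{2}+t^{2}}\right)\left(\frac{s}{\sqrt{s^{2}+t^{2}}}\right)\quad\text{for all }(s,t)\neq(0,0).$$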

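This implies that there is a constant $c$ such that $\frac{\psi^{\prime}(s)}{s}=c$ for all $s\neq 0$. Indeed, for $s\neq 0$ the identity reads $\frac{\psi^{\prime}(s)}{s}=\frac{\psi^{\prime}(r)}{r}$ with $r=\sqrt{s^{2}+t^{2}}$; given nonzero $s_{1},s_{2}$, choose $r>\max(|s_{1}|,|s_{2}|)$ and $t_{k}=\sqrt{r^{2}-s_{k}^{2}}$ to get $\frac{\psi^{\prime}(s_{1})}{s_{1}}=\frac{\psi^{\prime}(r)}{r}=\frac{\psi^{\prime}(s_{2})}{s_{2}}$. Solving this differential equation and remembering that $\psi$ is continuous at $0$ with $\psi(0)=0$, we get $\psi(s)=\frac{cs^{2}}{2}$ for all $s\in\mathbb{R}$. Since $0<\phi\leq 1$, we have $\psi(s)\leq 0$ for all $s$, so $c\leq 0$. Now, $\phi(s)=e^{\frac{cs^{2}}{2}}$ for all $s\in\mathbb{R}$ implies that $X\sim N\left(0,-c\right)$ and we are done.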

If we further assume that ${\mathbb{E}}X^{2}<\infty$, the proof becomes surprisingly short. It is given below.

Lemma 4.1.

Let $\{X_{n}\}_{n=1}^{\infty}$ be a sequence of independent and identically distributed random variables such that $(X_{1},...,X_{n})^{\textrm{T}}$ has a spherically symmetric distribution for all $n$. For each $n$, denote the partial sum $\sum_{i=1}^{n}X_{i}$ by $S_{n}$. Then, for every $n\geq 1$, $\frac{S_{n}}{\sqrt{n}}$ has the same distribution as $X_{1}$.

Proof.

For each $n$, let $\mathbf{H}_{n}$ be an orthogonal matrix whose first row is $\left(\frac{1}{\sqrt{n}},\frac{1}{\sqrt{n}},...,\frac{1}{\sqrt{n}}\right)$ (such a matrix exists, since any unit vector can be completed to an orthonormal basis of $\mathbb{R}^{n}$), and let ${\mathbf{X}}_{n}=(X_{1},...,X_{n})^{\textrm{T}}$. Since ${\mathbf{X}}_{n}$ and $\mathbf{H}_{n}{\mathbf{X}}_{n}$ have the same distribution, so do their first entries, which are $X_{1}$ and $\frac{S_{n}}{\sqrt{n}}$. ∎
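The proof only needs some orthogonal $\mathbf{H}_{n}$ with first row $\left(\frac{1}{\sqrt{n}},...,\frac{1}{\sqrt{n}}\right)$. One explicit choice is the Helmert matrix, built below in plain Python (a sketch; the function names are illustrative). With this choice the first entry of $\mathbf{H}_{n}{\mathbf{X}}_{n}$ is exactly $S_{n}/\sqrt{n}$.

```python
import math


def helmert_matrix(n):
    """Return an n x n orthogonal matrix whose first row is (1/sqrt(n), ..., 1/sqrt(n)).

    Row k (k = 2, ..., n) has 1/sqrt(k(k-1)) in its first k-1 entries,
    -(k-1)/sqrt(k(k-1)) in entry k, and zeros afterwards.
    """
    rows = [[1.0 / math.sqrt(n)] * n]
    for k in range(2, n + 1):
        scale = math.sqrt(k * (k - 1))
        rows.append([1.0 / scale] * (k - 1) + [-(k - 1) / scale] + [0.0] * (n - k))
    return rows


def is_orthogonal(h, tol=1e-12):
    """Check that h times its transpose is the identity matrix, up to tol."""
    n = len(h)
    for i in range(n):
        for j in range(n):
            dot = sum(h[i][m] * h[j][m] for m in range(n))
            if abs(dot - (1.0 if i == j else 0.0)) > tol:
                return False
    return True
```

Each row after the first sums to zero and has unit length, which is what makes it orthogonal to the constant first row.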

Now consider proving Theorem 3.2 under the assumption ${\mathbb{E}}X^{2}<\infty$. The case ${\mathbb{E}}X^{2}=0$ is trivial, so assume that ${\mathbb{E}}X^{2}>0$. As in the proof of Theorem 3.2, generate a sequence $\{X_{n}\}_{n=1}^{\infty}$ of independent random variables from the common distribution of $X$ and $Y$, and for each $n$, let $S_{n}=\sum_{i=1}^{n}X_{i}$. By Theorem 2.3 and Lemma 4.1, $X_{1}\stackrel{{\scriptstyle\textrm{d}}}{{=}}\frac{S_{n}}{\sqrt{n}}$ for all $n$. By Theorem 2.1, ${\mathbb{E}}X=0$. Hence, by the Central Limit Theorem, $\frac{S_{n}}{\sqrt{n}}\xrightarrow{\textrm{d}}N(0,{\mathbb{E}}X^{2})$. Since $\frac{S_{n}}{\sqrt{n}}$ has the distribution of $X_{1}$ for every $n$, this limit must be the distribution of $X_{1}$, so $X_{1}\sim N(0,{\mathbb{E}}X^{2})$.
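Lemma 4.1 is a strong constraint on its own. As an illustration only, assuming in addition a finite fourth moment (not assumed in the paper), write $m_{2}={\mathbb{E}}X^{2}$ and $m_{4}={\mathbb{E}}X^{4}$. For i.i.d. mean-zero summands, ${\mathbb{E}}S_{n}^{4}=nm_{4}+3n(n-1)m_{2}^{2}$, so the kurtosis of $S_{n}/\sqrt{n}$ is $\frac{m_{4}}{nm_{2}^{2}}+\frac{3(n-1)}{n}$. If $S_{n}/\sqrt{n}\stackrel{\textrm{d}}{=}X_{1}$ already for $n=2$, this forces $m_{4}=3m_{2}^{2}$, the normal value. The exact-arithmetic sketch below (function name illustrative) evaluates the formula for a normal and for a Uniform$[-1,1]$ distribution.

```python
from fractions import Fraction


def kurtosis_of_normalized_sum(m2, m4, n):
    """Kurtosis E[T^4] / (E[T^2])^2 of T = S_n / sqrt(n) for i.i.d. mean-zero X_i.

    Uses E[S_n^4] = n*m4 + 3*n*(n-1)*m2^2 and E[S_n^2] = n*m2, where
    m2 = E[X^2] and m4 = E[X^4] are assumed finite (an assumption of this sketch).
    """
    fourth = n * m4 + 3 * n * (n - 1) * m2 ** 2
    second = n * m2
    return Fraction(fourth) / Fraction(second) ** 2
```

For the normal moments $m_{2}=1$, $m_{4}=3$ the result is $3$ for every $n$, while for Uniform$[-1,1]$ ($m_{2}=1/3$, $m_{4}=1/5$) it changes from $9/5$ at $n=1$ to $12/5$ at $n=2$, so $S_{2}/\sqrt{2}$ cannot have the distribution of $X_{1}$.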

Remark 2.

If $\{X_{n}\}_{n=1}^{\infty}$ is an i.i.d. sequence of random variables with $S_{n}\stackrel{{\scriptstyle def}}{{=}}\sum_{i=1}^{n}X_{i}$, and if $\frac{S_{n}}{\sqrt{n}}$ converges in distribution to a limit, then ${\mathbb{E}}X_{1}^{2}<\infty$ (see Exercise $3.4.3$ of [3]). By Lemma 4.1, $\frac{S_{n}}{\sqrt{n}}$ has the distribution of $X_{1}$ for every $n$ and therefore trivially converges in distribution. The finiteness of ${\mathbb{E}}X^{2}$ is thus an immediate consequence of this fact, and the argument above gives a second proof of Theorem 3.2 that needs no moment assumption.

5 Conclusion

Theorem 3.2 appears in [2] (Theorem $0.0.1$), and an early proof of it appears in [1]. The treatment in [1], however, is not fully rigorous from a probabilistic standpoint. Corollary 10 of [4] is a consequence of Theorem 3.2.

Our first proof of Theorem 3.2 rests on two broad ideas. The first is to deduce the spherical symmetry of any finite number of independent observations from a distribution from the spherical symmetry of two such observations (Theorem 2.3). The second is to combine this with the Strong Law of Large Numbers to conclude the result. In the process, the fact that the uniform distribution is the only spherically symmetric distribution of a unit-norm vector in $\mathbb{R}^{n}$ (Theorem 3.1) plays a crucial role. The main advantage of this proof is that it is free of computational tricks and is purely conceptual.

It is possible to give a more “direct” proof of Theorem 3.2 by solving the functional equation $\phi(s)\phi(t)=\phi\left(\sqrt{s^{2}+t^{2}}\right)$, $s,t\in\mathbb{R}$, for a general characteristic function $\phi$. However, this approach relies heavily on the independence of the random variables, and cannot, for example, be used to prove Theorem 3.3.

Acknowledgement

The author thanks Professor J. Michael Steele for one of his assignment problems in the STAT 930 course offered by the University of Pennsylvania, which was the source of the idea behind the proof of Theorem 3.2.

References


[1] Bartlett, M.S. (1934), The vector representation of a sample, Math. Proc. Cambr. Phil. Soc. 30, pp. 327-340.
[2] Bryc, W. (2005), Normal Distribution characterizations with applications, Lecture Notes in Statistics 1995, Vol 100.
[3] Durrett, R. (2013), Probability: Theory and Examples, Cambridge University Press.
[4] Meckes, E.S. & Meckes, M.W. (2007), The Central Limit Problem For Random Vectors With Symmetries, J Theoret. Probab. 20, pp. 697-720.
