Prediction in logarithmic distance

Henryk Gzyl

arXiv:1703.08696·math.PR·September 21, 2018

TL;DR

This paper introduces a logarithmic distance measure on positive vectors and variables, where the geometric mean minimizes this distance, leading to new concepts of predictors, conditional expectation, and probabilistic theorems analogous to classical results.

Contribution

It defines a novel logarithmic distance on positive vectors and variables, establishing new predictors and probabilistic limits based on this metric.

Findings

1) Geometric mean minimizes the logarithmic distance among positive vectors.

2) A new class of predictors and conditional expectations is developed based on the logarithmic metric.

3) Analogues of the Law of Large Numbers and Central Limit Theorem are established for this setting.

Abstract

The metric properties of the set in which random variables take their values lead to relevant probabilistic concepts. For example, the mean of a random variable is a best predictor in that it minimizes the standard Euclidean distance or $L_{2}$ norm in an appropriate class of random variables. Similarly, the median is the same concept but when the distance is measured by the $L_{1}$ norm. These two predictors stem from the fact that the mean and the median minimize the distance to a given set of points when distances in $R$ or in $R^{n}$ are measured in the aforementioned metrics. It so happens that an interesting logarithmic distance can be defined on the cone of strictly positive vectors in $R^{n}$ in such a way that the minimizer of the distance to a collection of points is their geometric mean. This distance on the base space leads to an interesting…

Equations

$$d({\boldsymbol{x}}_{1},{\boldsymbol{x}}_{2})^{2}=\sum_{i=1}^{n}\left(\ln x_{1}(i)-\ln x_{2}(i)\right)^{2}.$$

$$L_{p}=\{{\boldsymbol{X}}\in\mathcal{F}\mid E[|X_{i}|^{p}]<\infty,\ i=1,...,n\}$$

$$Ln_{p}=\{{\boldsymbol{X}}\in\mathcal{C}\mid\ln{\boldsymbol{X}}\in L_{p}\},\qquad LLn_{p}=L_{p}\cap Ln_{p}.$$

$$d_{\ell}({\boldsymbol{X}}_{1},{\boldsymbol{X}}_{2})^{2}\equiv E\Big[\sum_{i=1}^{n}\left(\ln X_{1}(i)-\ln X_{2}(i)\right)^{2}\Big]$$

$${\boldsymbol{m}}_{\ell}({\boldsymbol{X}})=\exp\left(E[\ln{\boldsymbol{X}}]\right).$$

$$Cov_{\ell}({\boldsymbol{X}},{\boldsymbol{Y}})\equiv E\left[\left(\ln{\boldsymbol{X}}-\ln{\boldsymbol{m}}_{\ell}({\boldsymbol{X}})\right)\left(\ln{\boldsymbol{Y}}-\ln{\boldsymbol{m}}_{\ell}({\boldsymbol{Y}})\right)^{t}\right]=Cov(\ln{\boldsymbol{X}},\ln{\boldsymbol{Y}}).$$

$${\boldsymbol{X}}^{c}\equiv\exp\left({\boldsymbol{\Sigma}}^{-1/2}\left(\ln{\boldsymbol{X}}-\ln{\boldsymbol{m}}_{\ell}({\boldsymbol{X}})\right)\right)$$

$${\boldsymbol{m}}_{\ell}({\boldsymbol{X}}^{c})={\boldsymbol{1}},\qquad{\boldsymbol{\Sigma}}_{\ell}({\boldsymbol{X}}^{c})={\boldsymbol{I}}.$$

$$E_{\ell}[{\boldsymbol{Y}}\,|\,{\boldsymbol{X}}]=\exp\left(E[\ln{\boldsymbol{Y}}\,|\,{\boldsymbol{X}}]\right).$$

$$\left\{\begin{array}{l}a=\exp\left(E[\ln Y]-bE[\ln X]\right)\\ b=\frac{1}{D}\left(E[\ln X\ln Y]-E[\ln X]E[\ln Y]\right)\\ D=E[(\ln X)^{2}]-(E[\ln X])^{2}=\sigma^{2}(\ln X).\end{array}\right.$$

$$m_{\ell}(Y^{\#})=E_{\ell}[Y^{\#}]=e^{E[\ln Y]},\qquad\sigma_{\ell}(Y^{\#})=b^{2}\sigma^{2}(\ln X).$$

$$d({\boldsymbol{X}}_{1},{\boldsymbol{X}}_{2})_{\ell}^{2}+4d({\boldsymbol{Z}},{\boldsymbol{Y}})_{\ell}^{2}\leq 2d({\boldsymbol{Y}},{\boldsymbol{X}}_{1})_{\ell}^{2}+2d({\boldsymbol{Y}},{\boldsymbol{X}}_{2})_{\ell}^{2}.$$

$$d({\boldsymbol{X}}^{*},{\boldsymbol{X}})_{\ell}^{2}+4d({\boldsymbol{Y}},{\boldsymbol{Z}})_{\ell}^{2}\leq 2d({\boldsymbol{Y}},{\boldsymbol{X}}^{*})_{\ell}^{2}+2d({\boldsymbol{Y}},{\boldsymbol{X}})_{\ell}^{2}.$$

$$E_{\ell}\Big[\prod_{i=1}^{k}{\boldsymbol{Y}}_{i}^{w_{i}}\,\Big|\,\mathcal{G}\Big]=\prod_{i=1}^{k}\Big(E_{\ell}[{\boldsymbol{Y}}_{i}\,|\,\mathcal{G}]\Big)^{w_{i}}$$

$$E_{\ell}\big[E_{\ell}[{\boldsymbol{Y}}\,|\,\mathcal{G}]\,\big|\,\mathcal{H}\big]=\exp\Big(E\big[\ln\exp\left(E[\ln{\boldsymbol{Y}}\,|\,\mathcal{G}]\right)\,\big|\,\mathcal{H}\big]\Big)=\exp\Big(E\big[E[\ln{\boldsymbol{Y}}\,|\,\mathcal{G}]\,\big|\,\mathcal{H}\big]\Big),$$

$$E_{\ell}\Big[\prod_{i=1}^{k}{\boldsymbol{Y}}_{i}^{w_{i}}\,\Big|\,\mathcal{G}\Big]=\exp\Big(E\Big[\sum_{i=1}^{k}w_{i}\ln{\boldsymbol{Y}}_{i}\,\Big|\,\mathcal{G}\Big]\Big)=\prod_{i=1}^{k}\Big(E_{\ell}[{\boldsymbol{Y}}_{i}\,|\,\mathcal{G}]\Big)^{w_{i}}.$$

$$\hat{m}_{\ell}(X)=\Big(\prod_{j=1}^{K}X_{j}\Big)^{1/K}.$$

$$\hat{m}_{\ell}(X)=\Big(\prod_{j=1}^{K}X_{j}\Big)^{1/K}\rightarrow m_{\ell}$$

$$\hat{m}_{\ell}(X)=\exp\Big(\frac{1}{K}\sum_{j=1}^{K}\ln X_{j}\Big),$$

$$\hat{\sigma}_{\ell}^{2}(X)=\frac{1}{K-1}\sum_{j=1}^{K}\left(\ln X_{j}-\ln\hat{m}_{\ell}(X)\right)^{2}.$$

$$\hat{\sigma}_{\ell}^{2}(X)\rightarrow\sigma_{\ell}^{2}(X)$$

$$\Big(\prod_{j=1}^{K}\frac{X_{j}}{m_{\ell}}\Big)^{1/\sqrt{K}}\rightarrow e^{X}$$

$$\Big(\prod_{j=1}^{K}\frac{X_{j}}{m_{\ell}}\Big)^{1/\sqrt{K}}=\exp\Big(\frac{1}{\sqrt{K}}\sum_{j=1}^{K}(\ln X_{j}-\ln m_{\ell})\Big).$$

$$E_{\ell}[{\boldsymbol{X}}_{n+k}\,|\,\mathcal{F}_{n}]=e^{E[{\boldsymbol{\xi}}_{n+k}\,|\,\mathcal{F}_{n}]}$$

$$E_{\ell}[{\boldsymbol{X}}_{n+k}\,|\,\mathcal{F}_{n}]={\boldsymbol{X}}_{n}=e^{{\boldsymbol{\xi}}_{n}}=e^{E[{\boldsymbol{\xi}}_{n+k}\,|\,\mathcal{F}_{n}]}\leq E[e^{{\boldsymbol{\xi}}_{n+k}}\,|\,\mathcal{F}_{n}]=E[{\boldsymbol{X}}_{n+k}\,|\,\mathcal{F}_{n}]$$

$$\ln m_{\ell}=\sum_{i=1}^{K}w_{i}E[\ln R_{i}]$$

$$d\Big(\prod_{i=1}^{K}R_{i}^{w_{i}},m_{\ell}\Big)^{2}=Var\Big(\sum_{i=1}^{K}w_{i}\ln R_{i}\Big)=({\boldsymbol{w}},{\boldsymbol{\Sigma}}{\boldsymbol{w}}).$$

$$({\boldsymbol{\xi}},{\boldsymbol{\eta}})_{{\boldsymbol{x}}}\equiv({\boldsymbol{x}}^{-1}{\boldsymbol{\xi}},{\boldsymbol{x}}^{-1}{\boldsymbol{\eta}})=({\boldsymbol{x}}^{-2}{\boldsymbol{\xi}},{\boldsymbol{\eta}}).$$

$$\int_{0}^{1}(\dot{{\boldsymbol{x}}},\dot{{\boldsymbol{x}}})_{{\boldsymbol{x}}}\,dt.$$

$$d({\boldsymbol{x}}_{1},{\boldsymbol{x}}_{2})=\inf\Big\{\int_{0}^{1}(\dot{{\boldsymbol{x}}},\dot{{\boldsymbol{x}}})_{{\boldsymbol{x}}}\,dt\ \Big|\ {\boldsymbol{x}}(t)\ \mbox{differentiable such that}\ {\boldsymbol{x}}_{1}={\boldsymbol{x}}(0),\ {\boldsymbol{x}}_{2}={\boldsymbol{x}}(1)\Big\}$$

Taxonomy

Topics: Mathematical Inequalities and Applications · Advanced Statistical Methods and Models · Functional Equations Stability Results

Full text

Best predictors in logarithmic distance between positive random variables

Henryk Gzyl

Centro de Finanzas IESA, Caracas, (Venezuela)

Abstract

The metric properties of the set in which random variables take their values lead to relevant probabilistic concepts. For example, the mean of a random variable is a best predictor in that it minimizes the standard Euclidean distance or $L_{2}$ norm in an appropriate class of random variables. Similarly, the median is the same concept but when the distance is measured by the $L_{1}$ norm.

It so happens that a geodesic distance can be defined on the cone of strictly positive vectors in $\mathbb{R}^{n}$ in such a way that the minimizer of the distance to a collection of points is their geometric mean.

This distance induces a distance on the class of strictly positive random variables, which in turn leads to interesting notions of conditional expectation (or best predictors) and their estimators. The appropriate versions of the Law of Large Numbers and the Central Limit Theorem can also be obtained. We shall see that, for example, the lognormal variables are the analogue of the Gaussian variables for the modified version of the Central Limit Theorem.

Keywords: Prediction in logarithmic distance, Law of large numbers in logarithmic distance, Central Limit Theorem in logarithmic distance, Logarithmic geometry for positive random variables.

MSC 2010: 60B99, 60B12, 60A99.

1 Introduction and Preliminaries

The study of random variables and processes taking values in spaces with geometries other than Euclidean is not new. Consider the textbooks by Kunita and Watanabe [5] or by Hsu [3], to mention just two. Along this line of work, the notion of Euclidean distance between points of the base manifold is replaced by a distance related to a Riemannian metric placed upon the tangent manifold. Such metrics lead to a notion of geodesic distance between points of the manifold, and this distance is inherited by random variables taking values in the manifold.

It should not then be surprising that the notion of best predictor of a random variable by variables of a given class should depend on the metric of the manifold. In this note we shall consider the manifold $M=(0,\infty)^{n},$ which is an open set in $\mathbb{R}^{n}$ and also a commutative group with respect to componentwise multiplication. We postpone the study of the geometry of this group to the appendix. Here we mention that what we do is the commutative version of a more elaborate geometry in the space of symmetric matrices. The reader can check with Lang [6], in which a relation of this geometry to Bruhat-Tits spaces is explained, or with Lawson and Lim [7] or Moakher [9] and references therein, where the geometric mean property in the class of symmetric matrices is established. More recently Arsigny et al. [1] and Schwartzman [10] used the same geometric setting to study the role of such geometry in a large variety of applications. The applications of the geometric ideas in these references concern the non-commutative case, but the simplest commutative case and its potential usefulness for positive random variables seem not to have been explored.

As mentioned in the abstract, it is the purpose of this note to explore the possible usefulness of measuring distances between positive numbers, not by regarding them as real numbers and the distance between them measured by the Euclidean norm, but by a logarithmic distance resulting from an interesting group invariant metric.

The appendix is devoted to basic geometry. There we shall examine the geometry on $M$ and prove that the distance between any two points ${\boldsymbol{x}}_{1},{\boldsymbol{x}}_{2}\in M$ is given by

$$d({\boldsymbol{x}}_{1},{\boldsymbol{x}}_{2})^{2}=\sum_{i=1}^{n}\left(\ln x_{1}(i)-\ln x_{2}(i)\right)^{2}.\qquad(1.1)$$

This makes $M$ a Bruhat-Tits space in which the distance satisfies a semi-parallelogram law. This is contained in Theorem 8.1. We shall use this property to establish the uniqueness of conditional expectations. The group structure in $M$ will be inherited in a curious way by the conditional expectations (or by the best predictors) in the logarithmic distance (8.1).

But once we have motivated the appearance of the logarithmic distance, and the semi-parallelogram law associated to it, we shall come to the main objective of the paper, which is to consider the notion of best predictor (conditional expectations) in that distance, which happens to have some curious properties. These matters will be taken up in Sections 2 and 3, where we shall introduce the notion of $\ell-$ expected value and $\ell-$ conditional expectation, which will denote the best predictors in the logarithmic distance (hence the $\ell-$ prefix) introduced in Section 2. We examine there some of the basic properties of these constructs.

In Section 4 we present the two most basic estimators, namely, that of the $\ell-$ mean and that of the $\ell-$ variance, and explain how the law of large numbers and the central limit theorem for these estimators relate to the standard law of large numbers and central limit theorem.

In Section 5 we prove that the notion of martingale related to the $\ell-$ conditional expectation relates to the standard notion of martingale. We shall do it in discrete time, but the extension to continuous time is quite direct. In Section 6 we examine Markowitz portfolio theory when the distance between (gross) returns is the logarithmic distance.

As said, we leave the study of the geometry on $M$ to the appendix. There we explain how the logarithmic distance between strictly positive vectors is actually a geodesic distance in that manifold. For that we shall present some results from Lang [6], but in a simpler, commutative setup. This will provide us with a way of thinking about positive numbers (or vectors) in terms of the exponential map. The aim of the appendix is to derive the logarithmic distance between positive vectors as a geodesic distance. The basic idea behind our constructions has been very much studied in geometry. The vectors with non-zero components act transitively on the positive vectors in such a way that an invariant scalar product (a Riemannian metric) can be defined, which leads to a notion of geodesic distance. Actually, the exponential function will correspond to the exponential map in Riemannian geometry, and it will allow us to relate (transport) probabilistic constructs from the real to the positive numbers (vectors).

2 Best predictors in logarithmic distance

Our setup here consists of a probability space $(\Omega,\mathcal{F},P),$ and we shall be concerned with the cone $\mathcal{C}$ of $P-$ almost everywhere (a.e. for short) finite and strictly positive ( $M$ -valued) random variables. As usual, we identify variables that are $P-$ a.e. equal. Since the operations among vectors are componentwise, reducing to the case $n=1$ only takes a simple notational change. To shorten the description of the random variables used in the statements below, let us introduce the following notations. For $p\geq 1$ (we shall be concerned with $p=1,2$ only) define:

$$L_{p}=\{{\boldsymbol{X}}\in\mathcal{F}\mid E[|X_{i}|^{p}]<\infty,\ i=1,...,n\},$$

$$Ln_{p}=\{{\boldsymbol{X}}\in\mathcal{C}\mid\ln{\boldsymbol{X}}\in L_{p}\},\qquad LLn_{p}=L_{p}\cap Ln_{p}.$$

Let ${\boldsymbol{X}}_{1}$ and ${\boldsymbol{X}}_{2}$ be two strictly positive random variables in $Ln_{2}$ . The (logarithmic) distance between them is defined to be

$$d_{\ell}({\boldsymbol{X}}_{1},{\boldsymbol{X}}_{2})^{2}\equiv E\Big[\sum_{i=1}^{n}\left(\ln X_{1}(i)-\ln X_{2}(i)\right)^{2}\Big].\qquad(2.1)$$

Since we are identifying variables that are a.e. equal, $d_{\ell}({\boldsymbol{X}}_{1},{\boldsymbol{X}}_{2})$ is a distance on $\mathcal{C}.$ Similarly to ${\boldsymbol{m}}=E[{\boldsymbol{X}}]$ being the constant that minimizes the Euclidean (squared) distance to ${\boldsymbol{X}},$ we have

Proposition 2.1.

With the notations introduced above, let ${\boldsymbol{X}}\in Ln_{1}.$ The vector ${\boldsymbol{m}}_{\ell}$ that minimizes the logarithmic distance to ${\boldsymbol{X}}$ is given by

$${\boldsymbol{m}}_{\ell}({\boldsymbol{X}})=\exp\left(E[\ln{\boldsymbol{X}}]\right).$$

Keep in mind that the operations are componentwise, so that ${\boldsymbol{m}}_{\ell}({\boldsymbol{X}})_{j}=\exp(E[\ln X_{j}])$ for $j=1,...,n.$ If ${\boldsymbol{X}}\in LLn_{1},$ we also have ${\boldsymbol{m}}_{\ell}\leq E[{\boldsymbol{X}}].$ When there is no risk of confusion, we shall write ${\boldsymbol{m}}_{\ell}({\boldsymbol{X}})={\boldsymbol{m}}_{\ell}.$

The proof of the first assertion is computational, and the second results from an application of Jensen’s inequality.
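As an illustration of Proposition 2.1, the following Python sketch works with a finite sample in place of the expectation, so that $m_{\ell}$ becomes the geometric mean of the sample. The helper names `log_mean` and `mean_sq_log_distance`, as well as the sample values, are assumptions introduced here for illustration only.

```python
import math

def log_mean(xs):
    """Sample analogue of m_l(X) = exp(E[ln X]): the geometric mean."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def mean_sq_log_distance(c, xs):
    """Average of (ln x - ln c)^2: squared logarithmic distance from the constant c."""
    return sum((math.log(x) - math.log(c)) ** 2 for x in xs) / len(xs)

sample = [0.5, 1.0, 2.0, 8.0]      # hypothetical positive observations
g = log_mean(sample)               # geometric mean of the sample
a = sum(sample) / len(sample)      # arithmetic mean of the sample

print("geometric mean:", g, "arithmetic mean:", a)
# The squared logarithmic distance is smallest at the geometric mean.
for c in (0.8 * g, g, 1.2 * g, a):
    print(c, mean_sq_log_distance(c, sample))
```

On this sample the geometric mean is $8^{1/4}$ while the arithmetic mean is $2.875,$ in agreement with the inequality ${\boldsymbol{m}}_{\ell}\leq E[{\boldsymbol{X}}].$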

And the analogues of the notions of covariance and centering are contained in the following definition.

Definition 2.1.

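Let now ${\boldsymbol{X}},{\boldsymbol{Y}}\in Ln_{2}.$ We define the logarithmic covariance matrix of the strictly positive random variables ${\boldsymbol{X}}$ and ${\boldsymbol{Y}}$ by

$$Cov_{\ell}({\boldsymbol{X}},{\boldsymbol{Y}})\equiv E\left[\left(\ln{\boldsymbol{X}}-\ln{\boldsymbol{m}}_{\ell}({\boldsymbol{X}})\right)\left(\ln{\boldsymbol{Y}}-\ln{\boldsymbol{m}}_{\ell}({\boldsymbol{Y}})\right)^{t}\right]=Cov(\ln{\boldsymbol{X}},\ln{\boldsymbol{Y}}).$$

Let ${\boldsymbol{\Sigma}}$ be the matrix with components $E[\left(\ln X_{i}-\ln m_{\ell}(X_{i})\right)\left(\ln Y_{j}-\ln m_{\ell}(Y_{j})\right)]$ (taking ${\boldsymbol{Y}}={\boldsymbol{X}}$ for the centering that follows). If the matrix ${\boldsymbol{\Sigma}}$ is invertible, we define the “centered” (in logarithmic distance) version of ${\boldsymbol{X}}$ by

$${\boldsymbol{X}}^{c}\equiv\exp\left({\boldsymbol{\Sigma}}^{-1/2}\left(\ln{\boldsymbol{X}}-\ln{\boldsymbol{m}}_{\ell}({\boldsymbol{X}})\right)\right).$$

The need for the exponentiation is clear: first, we have to “undo” the taking of the logarithms, and second, the argument of the exponential function is a vector in $\mathbb{R}^{n},$ which yields a positive vector after exponentiation. It takes a simple computation to verify that

$${\boldsymbol{m}}_{\ell}({\boldsymbol{X}}^{c})={\boldsymbol{1}},\qquad{\boldsymbol{\Sigma}}_{\ell}({\boldsymbol{X}}^{c})={\boldsymbol{I}}.$$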

A variation on the previous theme consists of predicting a variable ${\boldsymbol{Y}}$ by a variable ${\boldsymbol{X}}$ in logarithmic distance. The extension of the previous result is contained in the following statement.

Proposition 2.2.

Let ${\boldsymbol{Y}}$ and ${\boldsymbol{X}}$ be in $Ln_{2}$ . Then the $\sigma({\boldsymbol{X}})-$ measurable random variable that minimizes the logarithmic distance (2.1) to ${\boldsymbol{Y}}$ is given by

$$E_{\ell}[{\boldsymbol{Y}}\,|\,{\boldsymbol{X}}]=\exp\left(E[\ln{\boldsymbol{Y}}\,|\,{\boldsymbol{X}}]\right).$$

And we also have $E_{\ell}[{\boldsymbol{Y}}\,|\,{\boldsymbol{X}}]\leq E[{\boldsymbol{Y}}\,|\,{\boldsymbol{X}}].$

The proof of Proposition 2.2 follows the same pattern as the standard proof. Just notice that $\phi({\boldsymbol{X}})=\exp\left(E[\ln{\boldsymbol{Y}}\,|\,{\boldsymbol{X}}]\right)$ is a strictly positive, $\sigma({\boldsymbol{X}})-$ measurable random variable such that $\ln\phi({\boldsymbol{X}})=E[\ln{\boldsymbol{Y}}\,|\,{\boldsymbol{X}}]$ minimizes the Euclidean square distance to $\ln{\boldsymbol{Y}}.$

Note that the last inequality mentioned in the statement does not mean that one of the estimators is better than the other in any sense. They are minimizers in different metrics. Also, since linear combinations in an exponent are transported as scaling and powers, we have the following analogue to linear prediction for positive random variables.

Proposition 2.3.

Let $Y$ and $X$ be positive real random variables with square integrable logarithms. The values of $a>0$ and $b\in\mathbb{R}$ that make $Y^{\#}\equiv aX^{b}$ the best predictor of $Y$ in the logarithmic metric are given by

$$\left\{\begin{array}{l}a=\exp\left(E[\ln Y]-bE[\ln X]\right)\\ b=\frac{1}{D}\left(E[\ln X\ln Y]-E[\ln X]E[\ln Y]\right)\\ D=E[(\ln X)^{2}]-(E[\ln X])^{2}=\sigma^{2}(\ln X).\end{array}\right.$$

The proof follows the standard computation starting from the definition of $d(Y,aX^{b})_{\ell}.$ Certainly the result is natural, as the linear structure of $\mathbb{R}$ is transferred multiplicatively onto $(0,\infty)$ by the exponential mapping. Also, the extension to random variables taking values in higher dimensional $M$ is direct, but notationally more cumbersome.

A simple computation leads to

$$m_{\ell}(Y^{\#})=E_{\ell}[Y^{\#}]=e^{E[\ln Y]},\qquad\sigma_{\ell}(Y^{\#})=b^{2}\sigma^{2}(\ln X).$$
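The formulas of Proposition 2.3 amount to an ordinary least squares fit of $\ln Y$ on $\ln X.$ The sketch below replaces the expectations by sample averages; the function name `fit_power_predictor`, the simulated data, and the parameter values $3$ and $0.5$ are assumptions chosen for illustration.

```python
import math
import random

def fit_power_predictor(xs, ys):
    """Return (a, b) for Y# = a * X**b, using sample versions of Proposition 2.3."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mx = sum(lx) / n                                   # E[ln X]
    my = sum(ly) / n                                   # E[ln Y]
    d = sum(u * u for u in lx) / n - mx * mx           # D = Var(ln X)
    b = (sum(u * v for u, v in zip(lx, ly)) / n - mx * my) / d
    a = math.exp(my - b * mx)
    return a, b

# Hypothetical data: Y = 3 * X**0.5 times a small lognormal perturbation.
rng = random.Random(0)
xs = [math.exp(rng.gauss(0.0, 1.0)) for _ in range(2000)]
ys = [3.0 * x ** 0.5 * math.exp(rng.gauss(0.0, 0.1)) for x in xs]

a, b = fit_power_predictor(xs, ys)
print("a =", a, "b =", b)
```

When the data satisfy $Y=aX^{b}$ exactly, the fit recovers $a$ and $b$ up to rounding; with the multiplicative noise above it should return values close to $3$ and $0.5.$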

3 Logarithmic conditional expectation and some of its properties

Here we extend the semi-parallelogram property mentioned in Theorem (8.1) to strictly positive random variables.

Lemma 3.1.

All random variables mentioned are supposed to be in $Ln_{2}.$ Let ${\boldsymbol{X}}_{1}$ and ${\boldsymbol{X}}_{2}$ be two such variables. Then there exists ${\boldsymbol{Z}}\in Ln_{2}$ such that for any ${\boldsymbol{Y}}$ we have

$$d({\boldsymbol{X}}_{1},{\boldsymbol{X}}_{2})_{\ell}^{2}+4d({\boldsymbol{Z}},{\boldsymbol{Y}})_{\ell}^{2}\leq 2d({\boldsymbol{Y}},{\boldsymbol{X}}_{1})_{\ell}^{2}+2d({\boldsymbol{Y}},{\boldsymbol{X}}_{2})_{\ell}^{2}.$$

To prove this, use the second comment after Theorem (8.1) at every $\omega\in\Omega$ to obtain the pointwise version of the semi-parallelogram property, and then integrate with respect to $P.$ Clearly ${\boldsymbol{Z}}=({\boldsymbol{X}}_{1}{\boldsymbol{X}}_{2})^{1/2}\in Ln_{2}.$ Below we apply this to obtain the uniqueness of the extension of the standard notion of conditional expectation.

Theorem 3.1.

Let $\mathcal{G}\subset\mathcal{F}$ be a $\sigma-$ algebra, and let ${\boldsymbol{Y}}$ be strictly positive with square integrable logarithm. Then the unique (up to a set of $P$ measure $0$) positive ${\boldsymbol{X}}^{*}\in\mathcal{G}$ that makes $d({\boldsymbol{Y}},{\boldsymbol{X}})^{2}_{\ell}$ minimal over $\{{\boldsymbol{X}}\in\mathcal{G},\;{\boldsymbol{X}}>0,E[(\ln{\boldsymbol{X}})^{2}]<\infty\}$ is given by ${\boldsymbol{X}}^{*}=\exp\left(E[\ln{\boldsymbol{Y}}\,|\,\mathcal{G}]\right).$ To be consistent with the notations introduced above, we shall write ${\boldsymbol{X}}^{*}=E_{\ell}[{\boldsymbol{Y}}\,|\,\mathcal{G}].$

Proof.

The existence follows the same pattern of proof as the propositions in the previous section: $E[\ln{\boldsymbol{Y}}\,|\,\mathcal{G}]$ minimizes the ordinary square distance to $\ln{\boldsymbol{Y}},$ and it is unique (up to sets of $P$ measure $0$). We shall use the semi-parallelogram property to verify the uniqueness. For that, let ${\boldsymbol{X}}$ be some other possible minimizer of the logarithmic distance. Now set ${\boldsymbol{Z}}=\sqrt{{\boldsymbol{X}}{\boldsymbol{X}}^{*}}$ (keep in mind the second comment after Theorem (8.1)), and observe that according to the semi-parallelogram property

$$d({\boldsymbol{X}}^{*},{\boldsymbol{X}})_{\ell}^{2}+4d({\boldsymbol{Y}},{\boldsymbol{Z}})_{\ell}^{2}\leq 2d({\boldsymbol{Y}},{\boldsymbol{X}}^{*})_{\ell}^{2}+2d({\boldsymbol{Y}},{\boldsymbol{X}})_{\ell}^{2}.$$

Since ${\boldsymbol{X}}^{*}$ and ${\boldsymbol{X}}$ are both minimizers, $d({\boldsymbol{Y}},{\boldsymbol{Z}})^{2}_{\ell}$ is at least as large as each of the two distances on the right-hand side of the inequality, so that $4d({\boldsymbol{Y}},{\boldsymbol{Z}})^{2}_{\ell}$ is at least as large as the whole right-hand side. It follows that necessarily $d({\boldsymbol{X}}^{*},{\boldsymbol{X}})^{2}_{\ell}=0.$ ∎

Let us now verify some standard and non standard properties of the notion of conditional expectation introduced above. Keep in mind that the arithmetic operations with positive vectors are componentwise.

Theorem 3.2.

Let ${\boldsymbol{Y}}\in LLn_{2}$ and let $\mathcal{H}\subset\mathcal{G}$ be two sub- $\sigma-$ algebras of $\mathcal{F}.$ Then, up to a set of measure $0,$ the following hold:

1) $E_{\ell}[{\boldsymbol{Y}}\,|\{\emptyset,\Omega\}]=E_{\ell}[{\boldsymbol{Y}}].$

2) $E_{\ell}[E_{\ell}[{\boldsymbol{Y}}|\mathcal{G}]\,|\mathcal{H}]=E_{\ell}[{\boldsymbol{Y}}\,|\mathcal{H}].$

3) Let ${\boldsymbol{Y}}_{1},...,{\boldsymbol{Y}}_{k}$ be in $LLn_{2},$ and $w_{i}\in\mathbb{R}.$ The analogue of the linearity property of the standard conditional expectation is the following multiplicative property:

$$E_{\ell}\Big[\prod_{i=1}^{k}{\boldsymbol{Y}}_{i}^{w_{i}}\,\Big|\,\mathcal{G}\Big]=\prod_{i=1}^{k}\Big(E_{\ell}[{\boldsymbol{Y}}_{i}\,|\,\mathcal{G}]\Big)^{w_{i}}.$$

4) If ${\boldsymbol{Y}}$ is independent of $\mathcal{G}$ in the standard sense, then $E_{\ell}[{\boldsymbol{Y}}\,|\mathcal{G}]=E_{\ell}[{\boldsymbol{Y}}].$

Proof.

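The first assertion is a simple consequence of the definition. To verify the second we start from the definition and carry on:

$$E_{\ell}\big[E_{\ell}[{\boldsymbol{Y}}\,|\,\mathcal{G}]\,\big|\,\mathcal{H}\big]=\exp\Big(E\big[\ln\exp\left(E[\ln{\boldsymbol{Y}}\,|\,\mathcal{G}]\right)\,\big|\,\mathcal{H}\big]\Big)=\exp\Big(E\big[E[\ln{\boldsymbol{Y}}\,|\,\mathcal{G}]\,\big|\,\mathcal{H}\big]\Big),$$

and now apply the standard tower property of conditional expectations to complete the proof of the assertion.

It is in the third property where the logarithmic distance plays a curious role. The proof of the assertion is a simple computation starting from the definition:

$$E_{\ell}\Big[\prod_{i=1}^{k}{\boldsymbol{Y}}_{i}^{w_{i}}\,\Big|\,\mathcal{G}\Big]=\exp\Big(E\Big[\sum_{i=1}^{k}w_{i}\ln{\boldsymbol{Y}}_{i}\,\Big|\,\mathcal{G}\Big]\Big)=\prod_{i=1}^{k}\Big(E_{\ell}[{\boldsymbol{Y}}_{i}\,|\,\mathcal{G}]\Big)^{w_{i}}.$$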

The fourth property is also simple to establish using the definition and the standard notion of independence. ∎
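The tower and multiplicative properties of Theorem 3.2 can be checked numerically on a finite sample space. The sketch below assumes a six-point space with uniform probability, with $\mathcal{G}$ and $\mathcal{H}$ generated by partitions; the names `cond_exp` and `l_cond_exp`, the partitions, the values of the variables and the weights are all hypothetical.

```python
import math

def cond_exp(values, partition):
    """Ordinary E[V | G] on a finite uniform space; G is generated by `partition`."""
    out = [0.0] * len(values)
    for block in partition:
        avg = sum(values[i] for i in block) / len(block)
        for i in block:
            out[i] = avg
    return out

def l_cond_exp(values, partition):
    """E_l[Y | G] = exp(E[ln Y | G]), evaluated at every sample point."""
    logs = [math.log(v) for v in values]
    return [math.exp(v) for v in cond_exp(logs, partition)]

G = [[0, 1], [2, 3], [4, 5]]       # finer partition
H = [[0, 1, 2, 3], [4, 5]]         # coarser partition, so H is contained in G
Y1 = [1.0, 4.0, 2.0, 8.0, 3.0, 12.0]
Y2 = [5.0, 0.2, 1.0, 1.0, 0.5, 2.0]
w1, w2 = 0.3, -1.2                 # arbitrary real weights

# Property 2: conditioning on G and then on H equals conditioning on H.
tower_lhs = l_cond_exp(l_cond_exp(Y1, G), H)
tower_rhs = l_cond_exp(Y1, H)

# Property 3: E_l of a weighted product is the weighted product of the E_l's.
prod = [y1 ** w1 * y2 ** w2 for y1, y2 in zip(Y1, Y2)]
mult_lhs = l_cond_exp(prod, G)
mult_rhs = [e1 ** w1 * e2 ** w2
            for e1, e2 in zip(l_cond_exp(Y1, G), l_cond_exp(Y2, G))]

print(tower_lhs, tower_rhs)
print(mult_lhs, mult_rhs)
```

Both pairs of lists agree up to rounding, and comparing `l_cond_exp(Y1, G)` with `cond_exp(Y1, G)` illustrates the inequality $E_{\ell}[{\boldsymbol{Y}}\,|\,\mathcal{G}]\leq E[{\boldsymbol{Y}}\,|\,\mathcal{G}].$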

4 Estimators and limit theorems

In this section we shall consider the case $n=1.$ The notation is a bit simpler in this case. That is, we shall forget about the symbols in boldface for a while.

Making use of Proposition (8.1) the following definition is clear:

Definition 4.1.

Let $X_{1},...,X_{K}$ be positive random variables. We define their empirical logarithmic mean by

$$\hat{m}_{\ell}(X)=\Big(\prod_{j=1}^{K}X_{j}\Big)^{1/K}.$$

And the standard law of large numbers becomes:

Theorem 4.1.

Let $X_{j},\,j\geq 1$ be a collection of i.i.d. positive random variables defined on $(\Omega,\mathcal{F},P)$ having finite logarithmic variance $\sigma_{\ell}^{2}$ and logarithmic mean $m_{\ell}.$ Then $\hat{m}_{\ell}(X)$ is an unbiased estimator of the logarithmic mean $m_{\ell}(X),$ in the sense that $E[\ln\hat{m}_{\ell}(X)]=\ln m_{\ell}(X),$ and

$$\hat{m}_{\ell}(X)=\Big(\prod_{j=1}^{K}X_{j}\Big)^{1/K}\rightarrow m_{\ell}$$

almost surely w.r.t. $P$ as $K\rightarrow\infty.$

The proof is clear. Since

$$\hat{m}_{\ell}(X)=\exp\Big(\frac{1}{K}\sum_{j=1}^{K}\ln X_{j}\Big),$$

we can invoke the strong law of large numbers, see Borkar [2] or Jacod and Protter [4], plus the continuity of the exponential function to obtain our assertion. That $\ln\hat{m}_{\ell}(X)$ has mean $\ln m_{\ell}(X)$ is clear.

In analogy with the standard notion of empirical variance, we can introduce

Definition 4.2.

With the notations introduced above and under the assumptions in Theorem 4.1, the empirical estimator of the logarithmic variance is defined by

$$\hat{\sigma}_{\ell}^{2}(X)=\frac{1}{K-1}\sum_{j=1}^{K}\left(\ln X_{j}-\ln\hat{m}_{\ell}(X)\right)^{2}.$$

And as in basic statistics we have

Theorem 4.2.

With the notations introduced above, and under the assumptions of Theorem (4.1), $\hat{\sigma}^{2}_{\ell}(X)$ is an unbiased estimator of the logarithmic variance and

$$\hat{\sigma}_{\ell}^{2}(X)\rightarrow\sigma_{\ell}^{2}(X)$$

almost surely w.r.t. $P$ as $K\rightarrow\infty.$
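A minimal sketch of the two estimators of Definitions 4.1 and 4.2 follows, together with a small simulation of their convergence. The lognormal model for $X$ and the parameter values `mu` and `sigma` are assumptions made for the example, so that the true values are $m_{\ell}=e^{0.3}$ and $\sigma_{\ell}^{2}=0.25.$

```python
import math
import random

def empirical_log_mean(xs):
    """Geometric mean of the sample: exp of the average logarithm."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def empirical_log_variance(xs):
    """Sample variance of the logarithms, with the 1/(K-1) normalization."""
    logs = [math.log(x) for x in xs]
    m = sum(logs) / len(logs)
    return sum((v - m) ** 2 for v in logs) / (len(logs) - 1)

rng = random.Random(1)
mu, sigma = 0.3, 0.5                    # hypothetical mean and std of ln X
results = {}
for K in (10, 1000, 20000):
    xs = [rng.lognormvariate(mu, sigma) for _ in range(K)]
    results[K] = (empirical_log_mean(xs), empirical_log_variance(xs))
    print(K, results[K])

print("targets:", math.exp(mu), sigma ** 2)
```

As $K$ grows the printed pairs should approach the targets, in line with Theorems 4.1 and 4.2.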

But perhaps more interesting is the following version of the central limit theorem. It brings to the fore the role of lognormal variables as the analogue to the Gaussian random variables in the class of positive variables.

Theorem 4.3.

Suppose that $X_{j},j\geq 1$ are a collection of i.i.d. positive random variables defined on a probability space $(\Omega,\mathcal{F},P)$ with logarithmic mean $m_{\ell}=\exp(E[\ln X_{j}])$ and $E[(\ln X_{j})^{2}]<\infty.$ Then

$$\Big(\prod_{j=1}^{K}\frac{X_{j}}{m_{\ell}}\Big)^{1/\sqrt{K}}\rightarrow e^{X}$$

in distribution as $K\rightarrow\infty,$ where $X\sim N(0,\sigma^{2}_{\ell}).$

Proof.

Observe that

$$\Big(\prod_{j=1}^{K}\frac{X_{j}}{m_{\ell}}\Big)^{1/\sqrt{K}}=\exp\Big(\frac{1}{\sqrt{K}}\sum_{j=1}^{K}(\ln X_{j}-\ln m_{\ell})\Big).$$

From the standard central limit theorem we know that $\frac{1}{\sqrt{K}}\sum_{j=1}^{K}(\ln X_{j}-\ln m_{\ell})$ converges in distribution to an $N(0,\sigma^{2}_{\ell})$ random variable and therefore, since the exponential function is continuous, the same convergence holds for $\left(\prod_{j=1}^{K}\frac{X_{j}}{m_{\ell}}\right)^{1/\sqrt{K}}.$ This concludes the proof of our assertion. ∎
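The statement of Theorem 4.3 can be explored by simulation. The sketch below assumes, purely for illustration, that $\ln X_{j}$ is uniform on $(-1,1),$ so that $m_{\ell}=1$ and $\sigma_{\ell}^{2}=1/3;$ the function name `normalized_product` and the values of `K` and `replicates` are likewise hypothetical. The logarithm of the normalized product should then have mean near $0$ and variance near $1/3.$

```python
import math
import random

def normalized_product(xs, m_l):
    """(prod_j x_j / m_l) ** (1 / sqrt(K)), computed through logarithms for stability."""
    K = len(xs)
    s = sum(math.log(x) - math.log(m_l) for x in xs)
    return math.exp(s / math.sqrt(K))

rng = random.Random(2)
K, replicates = 200, 2000
m_l, var_l = 1.0, 1.0 / 3.0     # ln X ~ Uniform(-1, 1): mean 0, variance 1/3

logs = []
for _ in range(replicates):
    xs = [math.exp(rng.uniform(-1.0, 1.0)) for _ in range(K)]
    logs.append(math.log(normalized_product(xs, m_l)))

mean = sum(logs) / replicates
var = sum((v - mean) ** 2 for v in logs) / (replicates - 1)
print("mean of logs:", mean, "variance of logs:", var, "target variance:", var_l)
```

A histogram of `logs` would look approximately Gaussian, which is another way of saying that the normalized product itself is approximately lognormal.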

5 $\ell-$ martingales in discrete time

As there is a notion of $\ell-$ conditional expectation, there must be a corresponding notion of $\ell-$ martingale. In this section we examine some of its very simple properties. As usual, the basic setup consists of the probability space $(\Omega,\mathcal{F},P)$ and a filtration $\{\mathcal{F}_{n},\,n\geq 0\}.$

Theorem 5.1.

The $M-$ valued process $\{{\boldsymbol{X}}_{n};n\geq 0\},$ such that ${\boldsymbol{X}}_{n}\in\mathcal{F}_{n}$ and ${\boldsymbol{\xi}}_{n}=\ln{\boldsymbol{X}}_{n}$ are square integrable, is an $\ell$ -martingale (resp. sub-martingale, super-martingale) if and only if $\{{\boldsymbol{\xi}}_{n}\}$ is an ordinary martingale (resp. sub-martingale, super-martingale).

Also, if ${\boldsymbol{X}}_{n}$ is an $\ell-$ martingale, it is an ordinary sub-martingale.

Proof.

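For $n\geq 0$ and $k\geq 1$

$$E_{\ell}[{\boldsymbol{X}}_{n+k}\,|\,\mathcal{F}_{n}]=e^{E[{\boldsymbol{\xi}}_{n+k}\,|\,\mathcal{F}_{n}]},$$

from which the first assertion of the theorem drops out. For the second assertion note that

$$E_{\ell}[{\boldsymbol{X}}_{n+k}\,|\,\mathcal{F}_{n}]={\boldsymbol{X}}_{n}=e^{{\boldsymbol{\xi}}_{n}}=e^{E[{\boldsymbol{\xi}}_{n+k}\,|\,\mathcal{F}_{n}]}\leq E[e^{{\boldsymbol{\xi}}_{n+k}}\,|\,\mathcal{F}_{n}]=E[{\boldsymbol{X}}_{n+k}\,|\,\mathcal{F}_{n}].$$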

The middle step drops out from Jensen’s inequality. ∎
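A simple example may help to fix ideas. Suppose, as an assumption made only for this sketch, that ${\xi}_{n}$ is a symmetric random walk with steps $\pm 1$ and that $X_{n}=e^{\xi_{n}}.$ Then $\{X_{n}\}$ is an $\ell-$ martingale, while its ordinary conditional expectation grows by the factor $\cosh(1)$ at each step, so it is an ordinary sub-martingale. The function names below are hypothetical.

```python
import math

STEPS = (-1.0, 1.0)   # hypothetical increments of xi, each taken with probability 1/2

def l_cond_next(x_n):
    """E_l[X_{n+1} | X_n = x_n] = exp(E[xi_{n+1} | F_n]) for the random walk above."""
    xi_n = math.log(x_n)
    return math.exp(sum(xi_n + s for s in STEPS) / len(STEPS))

def cond_next(x_n):
    """Ordinary E[X_{n+1} | X_n = x_n] = E[exp(xi_{n+1}) | F_n]."""
    xi_n = math.log(x_n)
    return sum(math.exp(xi_n + s) for s in STEPS) / len(STEPS)

for x in (0.5, 1.0, 3.0):
    # The first value reproduces x; the second is x * cosh(1), which is larger.
    print(x, l_cond_next(x), cond_next(x))
```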

The corresponding version of the Doob decomposition theorem, say for sub-martingales, goes as follows.

Theorem 5.2.

With the notations introduced above, let $\{{\boldsymbol{X}}_{n}\}$ be an $M-$ valued $\ell-$ sub-martingale. Then there exist an $M-$ valued $\ell-$ martingale $\{{\boldsymbol{Y}}_{n}\}$ and an increasing $M-$ valued process ${\boldsymbol{A}}_{n},$ such that ${\boldsymbol{X}}_{n}={\boldsymbol{Y}}_{n}{\boldsymbol{A}}_{n}.$

Proof.

Just apply the Doob decomposition theorem to ${\boldsymbol{\xi}}_{n}=\ln{\boldsymbol{X}}_{n}$ and use ${\boldsymbol{X}}_{n}=e^{{\boldsymbol{\xi}}_{n}}.$ ∎

6 Logarithmic geometry and portfolio theory

Let us introduce a slight change of notation to conform with the notation in standard financial modeling. By the generic $R$ we shall denote the (gross) return of any asset or portfolio, which means the quotient of its current value divided by its initial value.

To begin with, recall from (8.3) that the curve $R_{1}^{w}R_{2}^{1-w}$ is a geodesic in the logarithmic distance between the points $R_{1}$ and $R_{2}.$ That curve can be thought of as a weighted geometric mean of $R_{1}$ and $R_{2}.$ This remark leads to a variation on the theme of “return” of a portfolio. In our setup, a generic portfolio, characterized by the weights $w_{1},...,w_{K}$ of assets with gross returns $R_{1},...,R_{K},$ has a weighted return given by $\prod_{i=1}^{K}R_{i}^{w_{i}}.$ To push the geodesic interpretation a bit further, that geometric mean can be thought of as a sequence of geodesic walks joining, say, $R_{1}$ to $R_{K}.$ Anyway, the logarithm of the $\ell-$ mean,

$$\ln m_{\ell}=\sum_{i=1}^{K}w_{i}E[\ln R_{i}],$$

is clearly the logarithmic rate of growth of the portfolio. Recall as well that the logarithmic distance of $m_{\ell}$ to $\prod_{i=1}^{K}R_{i}^{w_{i}}$ is given by

$$d\Big(\prod_{i=1}^{K}R_{i}^{w_{i}},m_{\ell}\Big)^{2}=Var\Big(\sum_{i=1}^{K}w_{i}\ln R_{i}\Big)=({\boldsymbol{w}},{\boldsymbol{\Sigma}}{\boldsymbol{w}}).\qquad(6.2)$$

Imitating Markowitz’s portfolio theory, we assign to any portfolio ${\boldsymbol{w}}$ its logarithmic mean ${\boldsymbol{m}}_{\ell}({\boldsymbol{w}})$ and its logarithmic variance $\sigma_{\ell}({\boldsymbol{w}}).$ According to Markowitz’s proposal a portfolio is optimal when it minimizes the variance for a given expected value of its (rate of) return.

The content of the following proposition can be read in two ways. On one hand it provides a prescription for a choice of portfolio with given average geometric rate of return and minimal logarithmic variance. On the other hand, it establishes a relationship between that choice of portfolio and the choice according to Markowitz’s proposal based on the logarithmic rate of return.

Proposition 6.1.

With the notations introduced above, the weights $w_{1}^{*},...,w_{K}^{*}$ that make the logarithmic variance $\sigma_{\ell}({\boldsymbol{w}})=d(\prod_{i=1}^{K}R_{i}^{w_{i}},m_{\ell})^{2}$ minimal subject to the constraints $\sum w_{i}=1$ and $m_{\ell}({\boldsymbol{w}})=e^{\mu}$ are the same as the weights that minimize $Var\Big{(}\sum_{i=1}^{K}w_{i}\ln R_{i}\Big{)}$ subject to $E[\sum_{i=1}^{K}w_{i}\ln R_{i}]=\mu$ and $\sum w_{i}=1.$

The proof is clear from (6.2). We refer the interested reader to Luenberger [8] or to Shiryaev [11] for more details about the classical Markowitz portfolio optimization theory.
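By Proposition 6.1 the optimal weights solve a quadratic program in the moments of the logarithmic returns. The sketch below solves its first order (Lagrange multiplier) conditions by Gaussian elimination. It assumes that short positions are allowed, that ${\boldsymbol{\Sigma}}$ is positive definite, and that the vector of expected logarithmic returns is not constant; the numerical values of `mu` and `Sigma` and the function names are hypothetical.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def log_variance(w, Sigma):
    """The quadratic form (w, Sigma w), i.e. Var(sum_i w_i ln R_i)."""
    return sum(w[i] * Sigma[i][j] * w[j] for i in range(len(w)) for j in range(len(w)))

def min_log_variance_weights(mu, Sigma, target):
    """Minimize (w, Sigma w) subject to sum(w) = 1 and (w, mu) = target."""
    K = len(mu)
    # Stationarity: 2 Sigma w + lam1 * 1 + lam2 * mu = 0, plus the two constraints.
    A = [[2.0 * Sigma[i][j] for j in range(K)] + [1.0, mu[i]] for i in range(K)]
    A.append([1.0] * K + [0.0, 0.0])
    A.append(list(mu) + [0.0, 0.0])
    rhs = [0.0] * K + [1.0, target]
    return solve(A, rhs)[:K]

# Hypothetical inputs: mu[i] = E[ln R_i] and Sigma = Cov(ln R).
mu = [0.02, 0.05, 0.08]
Sigma = [[0.010, 0.002, 0.001],
         [0.002, 0.040, 0.004],
         [0.001, 0.004, 0.090]]
w = min_log_variance_weights(mu, Sigma, target=0.05)
print("weights:", w, "logarithmic variance:", log_variance(w, Sigma))
```

The resulting portfolio has logarithmic mean $m_{\ell}({\boldsymbol{w}})=e^{0.05}$ in this example. Adding a no-short-selling constraint would require a quadratic programming solver instead of a single linear solve.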

7 Concluding comments

In this note we proposed an alternative metric in the set of positive vectors, so that when the distance between random variables is measured in this new metric, the standard notions of best predictors, their estimation, and some classical convergence results acquire a different but intuitively related form.

Also, as a simple application to finance, when assets are characterized by their gross returns (which by definition are positive random variables), the concept of return of a portfolio becomes a weighted geometric average, and the standard portfolio choice methodology appears in a slightly different guise. Readers familiar with the basics of the methodology will find it clear that the analogues of the efficient frontier, market portfolio, market line and CAPM have counterparts within the formalism developed above, but this is not the place to pursue the matter.

8 Appendix: The logarithmic distance between positive vectors

We shall think of the vectors in $\mathbb{R}^{n}$ as functions ${\boldsymbol{\xi}}:\{1,...,n\}\rightarrow\mathbb{R},$ and all standard arithmetical operations either as component wise operations among vectors or point wise operations among functions. Let us denote by $M=\{{\boldsymbol{x}}\in\mathbb{R}^{n}\,|{\boldsymbol{x}}(i)>0,i=1,...n\}$ the set of all positive vectors. $M$ is an open set in $\mathbb{R}^{n}$ which is trivially a manifold over $\mathbb{R}^{n},$ having $\mathbb{R}^{n}$ itself as tangent space at each point. We shall use the standard notation $TM_{{\boldsymbol{x}}}$ to stress this point.

Here $M$ plays the role that the positive definite matrices play in the works by Lang, Lawson and Lim and Mohaker mentioned a few lines above. The role of the group of invertible matrices in the same references is to be played here by $G=\{{\boldsymbol{g}}\in\mathbb{R}^{n}\,|\,g(i)\not=0,\,i=1,...,n\},$ which clearly is an Abelian group respect to the standard product, in which the identity, denoted by ${\boldsymbol{e}},$ is the vector with all components equal to $1.$ We shall make use the action $G:M\rightarrow M$ of $G$ on $M$ defined by $\tau_{{\boldsymbol{g}}}({\boldsymbol{x}})={\boldsymbol{g}}^{-1}{\boldsymbol{x}}{\boldsymbol{g}}^{-1}.$ This action is clearly transitive on $M,$ and can be defined in the obvious way as an action on $\mathbb{R}^{n}.$

The transitivity of the action allows us to transport the scalar product on $TM_{{\boldsymbol{e}}}$ to any $TM_{{\boldsymbol{x}}}$ as follows. The scalar product between ${\boldsymbol{\xi}}$ and ${\boldsymbol{\eta}}$ at $TM_{{\boldsymbol{e}}}$ is defined to be the standard Euclidean product $({\boldsymbol{\xi}},{\boldsymbol{\eta}})=\sum\xi_{i}\eta_{i},$ where we shall switch between $\xi(i)$ and $\xi_{i}$ as need be. Since ${\boldsymbol{x}}=\tau_{{\boldsymbol{g}}}({\boldsymbol{e}})$ with ${\boldsymbol{g}}={\boldsymbol{x}}^{-1/2},$ we define the scalar product transported to $TM_{{\boldsymbol{x}}}$ by

$$\langle{\boldsymbol{\xi}},{\boldsymbol{\eta}}\rangle_{{\boldsymbol{x}}}=({\boldsymbol{x}}^{-1}{\boldsymbol{\xi}},{\boldsymbol{x}}^{-1}{\boldsymbol{\eta}})=\sum_{i=1}^{n}\frac{\xi_{i}\eta_{i}}{x_{i}^{2}}.$$

This scalar product allows us to define the length of a differentiable curve as follows:

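Let ${\boldsymbol{x}}(t),$ $t\in[0,1],$ be a differentiable curve in $M.$ Its length is given by

$$\int_{0}^{1}\Big(\sum_{i=1}^{n}\Big(\frac{\dot{x}_{i}(t)}{x_{i}(t)}\Big)^{2}\Big)^{1/2}dt.$$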

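With this definition, the distance between ${\boldsymbol{x}}_{1},{\boldsymbol{x}}_{2}\in M$ is defined, as expected, by

$$d({\boldsymbol{x}}_{1},{\boldsymbol{x}}_{2})=\inf\{\mbox{length of }{\boldsymbol{x}}(t)\;|\;{\boldsymbol{x}}(t)\mbox{ a differentiable curve in }M,\;{\boldsymbol{x}}(0)={\boldsymbol{x}}_{1},\;{\boldsymbol{x}}(1)={\boldsymbol{x}}_{2}\}.$$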

It takes an application of the Euler-Lagrange formula to see that the equation of the geodesics in this metric is

$$\ddot{{\boldsymbol{x}}}(t)={\boldsymbol{x}}(t)^{-1}\dot{{\boldsymbol{x}}}(t)^{2},\tag{8.2}$$

the solution to which is

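$${\boldsymbol{x}}(t)={\boldsymbol{x}}_{1}^{1-t}{\boldsymbol{x}}_{2}^{t}={\boldsymbol{x}}_{1}e^{t\ln({\boldsymbol{x}}_{2}/{\boldsymbol{x}}_{1})},\qquad{\boldsymbol{x}}(0)={\boldsymbol{x}}_{1},\;\;{\boldsymbol{x}}(1)={\boldsymbol{x}}_{2}.$$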

This allows us to compute the distance between ${\boldsymbol{x}}_{1}$ and ${\boldsymbol{x}}_{2}$ as

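$$d({\boldsymbol{x}}_{1},{\boldsymbol{x}}_{2})^{2}=\sum_{i=1}^{n}\big(\ln x_{1}(i)-\ln x_{2}(i)\big)^{2}.$$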
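The closed form of the distance can be checked numerically. The following is only a sketch under stated assumptions: the function names (`log_distance`, `geodesic`, `straight_line`, `curve_length`) and the sample points are ours, not the paper's, and the length integral is approximated by a midpoint rule. It compares the logarithmic distance with the approximate length of the curve ${\boldsymbol{x}}_{1}^{1-t}{\boldsymbol{x}}_{2}^{t}$ and with that of the straight segment joining the same two points.

```python
import math


def log_distance(x1, x2):
    """Logarithmic distance between two strictly positive vectors of equal length."""
    if len(x1) != len(x2):
        raise ValueError("vectors must have the same length")
    if any(v <= 0 for v in list(x1) + list(x2)):
        raise ValueError("all components must be strictly positive")
    return math.sqrt(sum((math.log(a) - math.log(b)) ** 2 for a, b in zip(x1, x2)))


def geodesic(x1, x2, t):
    """Point x1^(1-t) * x2^t (componentwise) on the curve joining x1 to x2."""
    return [a ** (1.0 - t) * b ** t for a, b in zip(x1, x2)]


def straight_line(x1, x2, t):
    """Point (1-t) * x1 + t * x2 on the Euclidean segment joining x1 to x2."""
    return [(1.0 - t) * a + t * b for a, b in zip(x1, x2)]


def curve_length(curve, steps=2000):
    """Midpoint-rule approximation of the integral of sqrt(sum((xdot_i / x_i)^2))
    over t in [0, 1], for a curve given as a function t -> list of positives."""
    total = 0.0
    for k in range(steps):
        a = curve(k / steps)
        b = curve((k + 1) / steps)
        total += math.sqrt(
            sum(((bi - ai) / (0.5 * (ai + bi))) ** 2 for ai, bi in zip(a, b))
        )
    return total


# Illustrative sample points (hypothetical, chosen only for the check).
x1, x2 = [1.0, 4.0], [2.0, 1.0]
d = log_distance(x1, x2)
len_geodesic = curve_length(lambda t: geodesic(x1, x2, t))
len_straight = curve_length(lambda t: straight_line(x1, x2, t))
```

For these sample points the approximate length along ${\boldsymbol{x}}_{1}^{1-t}{\boldsymbol{x}}_{2}^{t}$ should agree with `d` up to discretization error, while the straight segment should come out strictly longer.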

Similarly, the solution to (8.2) subject to ${\boldsymbol{x}}(0)={\boldsymbol{x}},$ and $\dot{{\boldsymbol{x}}}(0)={\boldsymbol{\xi}}$ is the (exponential) mapping ${\boldsymbol{x}}e^{t{\boldsymbol{\xi}}}.$ With this notation we recall some results (in this simpler setup) from Chapter 5 of Lang (1995) in the following theorem.

Theorem 8.1.

*With the notations introduced above we have:

1) The exponential mapping is metric preserving through the origin.

2) The derivative of the exponential mapping is measure preserving, that is, $\exp^{\prime}({\boldsymbol{\xi}}){\boldsymbol{\nu}}={\boldsymbol{\nu}}e^{{\boldsymbol{\xi}}}$ as a mapping $TM_{{\boldsymbol{x}}}\rightarrow TM_{\exp{{\boldsymbol{x}}}},$ satisfies*

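$$\langle{\boldsymbol{\nu}}e^{{\boldsymbol{\xi}}},{\boldsymbol{\nu}}e^{{\boldsymbol{\xi}}}\rangle_{\exp({\boldsymbol{\xi}})}=\sum_{i=1}^{n}\frac{(\nu_{i}e^{\xi_{i}})^{2}}{e^{2\xi_{i}}}=({\boldsymbol{\nu}},{\boldsymbol{\nu}}).$$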

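3)* With the metric given by (1.1), $M$ is a Bruhat-Tits space, that is, it is a complete metric space in which the semi-parallelogram law holds. This means that, given any ${\boldsymbol{x}}_{1},\,{\boldsymbol{x}}_{2}\in M,$ there exists a unique ${\boldsymbol{z}}\in M$ such that for any ${\boldsymbol{y}}\in M$ the following holds*

$$d({\boldsymbol{x}}_{1},{\boldsymbol{x}}_{2})^{2}+4d({\boldsymbol{z}},{\boldsymbol{y}})^{2}\leq 2d({\boldsymbol{y}},{\boldsymbol{x}}_{1})^{2}+2d({\boldsymbol{y}},{\boldsymbol{x}}_{2})^{2}.$$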
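Item 3) can be illustrated numerically. The sketch below assumes the semi-parallelogram law in the form $d({\boldsymbol{x}}_{1},{\boldsymbol{x}}_{2})^{2}+4d({\boldsymbol{z}},{\boldsymbol{y}})^{2}\leq 2d({\boldsymbol{y}},{\boldsymbol{x}}_{1})^{2}+2d({\boldsymbol{y}},{\boldsymbol{x}}_{2})^{2}$ and takes ${\boldsymbol{z}}$ to be the componentwise square root of ${\boldsymbol{x}}_{1}{\boldsymbol{x}}_{2}.$ The helper names and the random sampling scheme are ours and purely illustrative.

```python
import math
import random


def log_distance_sq(x1, x2):
    """Squared logarithmic distance between two strictly positive vectors."""
    return sum((math.log(a) - math.log(b)) ** 2 for a, b in zip(x1, x2))


def midpoint(x1, x2):
    """Componentwise square root of the product x1 * x2."""
    return [math.sqrt(a * b) for a, b in zip(x1, x2)]


def semi_parallelogram_gap(x1, x2, y):
    """Right-hand side minus left-hand side of the semi-parallelogram law,
    with z taken to be midpoint(x1, x2)."""
    z = midpoint(x1, x2)
    lhs = log_distance_sq(x1, x2) + 4.0 * log_distance_sq(z, y)
    rhs = 2.0 * log_distance_sq(y, x1) + 2.0 * log_distance_sq(y, x2)
    return rhs - lhs


rng = random.Random(0)  # fixed seed so the experiment is reproducible


def random_positive_vector(n):
    """Hypothetical sampler: components exp(u) with u uniform on [-2, 2]."""
    return [math.exp(rng.uniform(-2.0, 2.0)) for _ in range(n)]


gaps = [
    semi_parallelogram_gap(
        random_positive_vector(3), random_positive_vector(3), random_positive_vector(3)
    )
    for _ in range(1000)
]
worst_gap = max(abs(g) for g in gaps)
```

In this commutative setting the gap should vanish up to rounding, that is, the inequality holds with equality. This is consistent with the fact that ${\boldsymbol{x}}\mapsto\ln{\boldsymbol{x}}$ maps $M$ isometrically onto $\mathbb{R}^{n}$ with its Euclidean distance, where the parallelogram identity is exact.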

Comments

1) The action $\tau_{{\boldsymbol{g}}}$ defined a few paragraphs above coincides with parallel transport along geodesics.

2) The proofs take some space but are systematic and computational. In our case, commutativity makes things considerably simpler. The completeness of $M$ is transferred from $\mathbb{R}^{n}$ via the exponential mapping.

3) The point ${\boldsymbol{z}}$ mentioned in item (3) is given by ${\boldsymbol{z}}=\sqrt{{\boldsymbol{x}}_{1}{\boldsymbol{x}}_{2}}.$ Actually, a simple calculation provides the proof of the following slightly more general statement.

Lemma 8.1.

Let ${\boldsymbol{x}}_{1},...,{\boldsymbol{x}}_{K}$ be $K$ points in $M.$ The point $\bar{{\boldsymbol{x}}}_{\ell}$ that minimizes the sum of squared logarithmic distances (1.1) to the given points is given by their geometric mean, that is

$$\bar{{\boldsymbol{x}}}_{\ell}=\Big(\prod_{k=1}^{K}{\boldsymbol{x}}_{k}\Big)^{1/K}.$$
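The lemma lends itself to a quick numerical check. The sketch below is illustrative only: the function names, the sample points and the random perturbations are our assumptions. It computes the componentwise geometric mean of a few positive vectors and verifies that randomly perturbed candidates never achieve a smaller sum of squared logarithmic distances.

```python
import math
import random


def log_distance_sq(x1, x2):
    """Squared logarithmic distance between two strictly positive vectors."""
    return sum((math.log(a) - math.log(b)) ** 2 for a, b in zip(x1, x2))


def geometric_mean(points):
    """Componentwise geometric mean, computed as exp of the mean of the logs."""
    count = len(points)
    dim = len(points[0])
    return [
        math.exp(sum(math.log(p[i]) for p in points) / count) for i in range(dim)
    ]


def total_sq_distance(x, points):
    """Sum over the given points of the squared logarithmic distance to x."""
    return sum(log_distance_sq(x, p) for p in points)


# Illustrative sample points (hypothetical); their geometric mean is
# approximately [2.0, 4.0], up to floating-point rounding.
points = [[1.0, 8.0], [4.0, 2.0], [2.0, 4.0]]
g = geometric_mean(points)
best = total_sq_distance(g, points)

# Perturb the geometric mean multiplicatively and record the objective.
rng = random.Random(1)
perturbed = []
for _ in range(1000):
    candidate = [gi * math.exp(rng.uniform(-1.0, 1.0)) for gi in g]
    perturbed.append(total_sq_distance(candidate, points))
```

Writing ${\boldsymbol{u}}=\ln{\boldsymbol{x}},$ the objective is a sum of squared Euclidean distances in $\mathbb{R}^{n},$ which is minimized by the arithmetic mean of the $\ln{\boldsymbol{x}}_{k}.$ Exponentiating gives the geometric mean, and this is the simple calculation behind the lemma.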

Bibliography

[1] Arsigny, V., Fillard, P., Pennec, X. and Ayache, N. (2007). Geometric Means in a Novel Vector Space Structure on Symmetric Positive Definite Matrices, SIAM J. Matrix Theory, 29, 328-347.
[2] Borkar, V. Probability Theory, Springer, New York, (1995).
[3] Hsu, E.P. Stochastic Analysis on Manifolds, Amer. Math. Soc., Providence, (2002).
[4] Jacod, J. and Protter, P. Probability Essentials, Springer, New York, (2000).
[5] Kunita, H. and Watanabe, S. Stochastic Differential Equations and Diffusion Processes, North Holland Pub. Co, Amsterdam, (1989).
[6] Lang, S. Math talks for undergraduates, Springer, New York, (1999).
[7] Lawson, J.D. and Lim, Y. (2001). The Geometric mean, matrices, metrics and more, Amer. Math. Monthly, 108, 797-812.
[8] Luenberger, D.G. Investment Science, Princeton Univ. Press, Princeton, (1980).
