Prediction in logarithmic distance
Henryk Gzyl

TL;DR
This paper introduces a logarithmic distance on strictly positive vectors and random variables, under which the geometric mean of a collection of points minimizes the sum of squared distances to them. This leads to new notions of best predictors and conditional expectation, and to probabilistic theorems analogous to classical results.
Contribution
It defines a novel logarithmic distance on positive vectors and variables, establishing new predictors and probabilistic limits based on this metric.
Findings
The geometric mean of a collection of positive vectors minimizes the sum of squared logarithmic distances to them.
A new class of predictors and conditional expectations is developed based on the logarithmic metric.
Analogues of the Law of Large Numbers and Central Limit Theorem are established for this setting.
Best predictors in logarithmic distance between positive random variables
Henryk Gzyl
Centro de Finanzas IESA, Caracas, (Venezuela)
Abstract
The metric properties of the set in which random variables take their values lead to relevant probabilistic concepts. For example, the mean of a random variable is a best predictor in that it minimizes the standard Euclidean distance, or $L^2$ norm, in an appropriate class of random variables. Similarly, the median is the best predictor when the distance is measured by the $L^1$ norm.
It so happens that a geodesic distance can be defined on the cone of strictly positive vectors in $\mathbb{R}^n$ in such a way that the minimizer of the sum of squared distances to a collection of points is their geometric mean.
This distance induces a distance on the class of strictly positive random variables, which in turn leads to interesting notions of conditional expectation (or best predictors) and their estimators. Appropriate versions of the Law of Large Numbers and the Central Limit Theorem can also be obtained. We shall see, for example, that lognormal variables are the analogue of Gaussian variables in the modified version of the Central Limit Theorem.
Keywords: Prediction in logarithmic distance, Law of large numbers in logarithmic distance, Central Limit Theorem in logarithmic distance, Logarithmic geometry for positive random variables.
MSC 2010: 60B99, 60B12, 60A99.
1 Introduction and Preliminaries
The study of random variables and processes taking values in spaces with geometries other than Euclidean is not new. Consider, to mention just two, the textbooks by Kunita and Watanabe [5] and by Hsu [3]. In this line of work, the Euclidean distance between points of the base manifold is replaced by a distance related to a Riemannian metric placed on the tangent bundle. Such metrics lead to a notion of geodesic distance between points of the manifold, and this distance is inherited by random variables taking values in the manifold.
It should not then be surprising that the notion of best predictor of a random variable by variables of a given class depends on the metric of the manifold. In this note we shall consider the manifold to be $M=(0,\infty)^n$, the set of strictly positive vectors, which is an open set in $\mathbb{R}^n$ and also a commutative group with respect to componentwise multiplication. We postpone the study of the geometry of this group to the appendix. Here we mention that what we do is the commutative version of a more elaborate geometry on the space of symmetric positive definite matrices. The reader can consult Lang [6], where the relation of this geometry to Bruhat-Tits spaces is explained, or Lawson and Lim [7] and Mohaker [9] and references therein, where the geometric mean property in the class of symmetric positive definite matrices is established. More recently, Arsigny et al. [1] and Schwartzman [10] used the same geometric setting in a large variety of applications. The applications in these references concern the non-commutative case, but the simplest, commutative case and its potential usefulness for positive random variables seem not to have been explored.
As mentioned in the abstract, the purpose of this note is to explore the possible usefulness of measuring distances between positive numbers not by regarding them as real numbers and measuring the distance between them with the Euclidean norm, but by a logarithmic distance resulting from a group-invariant metric.
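The appendix is devoted to basic geometry. There we shall examine the geometry on the set $M$ of strictly positive vectors and prove that the distance between any two points $x,y\in M$ is given by
$$d(x,y)=\Big(\sum_{i=1}^{n}\big(\ln x_i-\ln y_i\big)^2\Big)^{1/2}. \qquad (1.1)$$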
This makes the set of strictly positive vectors a Bruhat-Tits space, in which the distance satisfies the semi-parallelogram law; this is the content of Theorem 8.1. We shall use this property to establish the uniqueness of conditional expectations. The group structure of the positive vectors will be inherited, in a curious way, by the conditional expectations (or best predictors) in the logarithmic distance (1.1).
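As a quick numerical illustration of (1.1), here is a minimal Python sketch; the function name `log_distance` and the sample vectors are ours, chosen only for illustration. Since $\ln(gx)-\ln(gy)=\ln x-\ln y$, rescaling both points componentwise by the same positive vector $g$ should leave the distance unchanged, which the last line checks.

```python
import math

def log_distance(x, y):
    """Logarithmic distance (1.1) between two vectors with strictly positive entries."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if any(v <= 0 for v in list(x) + list(y)):
        raise ValueError("entries must be strictly positive")
    return math.sqrt(sum((math.log(a) - math.log(b)) ** 2 for a, b in zip(x, y)))

x = [1.0, 2.0, 4.0]
y = [2.0, 2.0, 1.0]
g = [3.0, 0.5, 10.0]  # arbitrary positive rescaling vector

gx = [gi * xi for gi, xi in zip(g, x)]
gy = [gi * yi for gi, yi in zip(g, y)]

print(log_distance(x, y))                                      # sqrt(5) * ln 2
print(math.isclose(log_distance(gx, gy), log_distance(x, y)))  # True
```

Working with the logarithms directly makes the design transparent: the distance is just the Euclidean distance between $\ln x$ and $\ln y$.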
Having motivated the logarithmic distance and the semi-parallelogram law associated with it, we come to the main objective of the paper: the notion of best predictor (conditional expectation) in that distance, which turns out to have some curious properties. These matters are taken up in Sections 2 and 3, where we introduce the logarithmic expected value and the logarithmic conditional expectation, that is, the best predictors in the logarithmic distance introduced in Section 2, and examine some of their basic properties.
In Section 4 we present the two most basic estimators, namely those of the mean and of the variance, and explain how the law of large numbers and the central limit theorem for these estimators relate to the standard law of large numbers and central limit theorem.
In Section 5 we relate the notion of martingale associated with the logarithmic conditional expectation to the standard notion of martingale. We do it in discrete time, but the extension to continuous time is quite direct. In Section 6 we examine Markowitz portfolio theory when the distance between (gross) returns is the logarithmic distance.
As said, we leave the study of the geometry on the set of positive vectors to the appendix. There we explain how the logarithmic distance between strictly positive vectors is actually a geodesic distance in that manifold. For that we shall present some results from Lang [6], but in a simpler, commutative setup. This will provide us with a way of thinking about positive numbers (or vectors) in terms of the exponential map. The aim of the appendix is to derive the logarithmic distance between positive vectors as a geodesic distance. The basic idea behind our constructions has been extensively studied in geometry. The vectors with non-zero components act transitively on the positive vectors in such a way that an invariant scalar product (a Riemannian metric) can be defined, which leads to a notion of geodesic distance. Actually, the exponential function will correspond to the exponential map in Riemannian geometry, and it will allow us to relate (transport) probabilistic constructs from the real to the positive numbers (vectors).
2 Best predictors in logarithmic distance
Our setup consists of a probability space $(\Omega,\mathcal{F},P)$, and we shall be concerned with the cone of almost everywhere (a.e. for short) finite and strictly positive random variables, taking values in the set $M$ of strictly positive vectors of $\mathbb{R}^n$. As usual, we identify variables that are a.e. equal. Since the operations among vectors are componentwise, reducing to the case $n=1$ only takes a simple notational change. To shorten the description of the random variables used in the statements below, let us introduce the following notation. For $p\geq1$ (we shall only be concerned with $p=2$) define:
[TABLE]
[TABLE]
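Let $\mathbf{X}$ and $\mathbf{Y}$ be two strictly positive random variables with square integrable logarithms. The (logarithmic) distance between them is defined to be
$$d(\mathbf{X},\mathbf{Y})=\Big(E\Big[\sum_{i=1}^{n}\big(\ln X_i-\ln Y_i\big)^2\Big]\Big)^{1/2}. \qquad (2.1)$$
Since we identify variables that are a.e. equal, $d$ is a distance on this class. Just as $E[\mathbf{X}]$ is the constant that minimizes the Euclidean (squared) distance to $\mathbf{X}$, we have
Proposition 2.1.
With the notation introduced above, let $\mathbf{X}$ be strictly positive with square integrable logarithm. The constant vector $\mathbf{m}$, with strictly positive components, that minimizes the logarithmic distance $d(\mathbf{X},\mathbf{m})$ is given by
$$E_l[\mathbf{X}]:=\exp\big(E[\ln\mathbf{X}]\big),\qquad\text{and moreover}\qquad E_l[\mathbf{X}]\leq E[\mathbf{X}].$$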
The proof of the first assertion is computational, and the second results from an application of Jensen’s inequality. When there is no risk of confusion, we shall write Keep in mind that the operations are componentwise, and that for If we also have
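The following Python sketch checks Proposition 2.1 on a small empirical sample, replacing $E$ by a sample average (our simplification); the data and function names are hypothetical. The geometric mean of the sample minimizes the empirical logarithmic objective, and it does not exceed the arithmetic mean, as Jensen's inequality predicts.

```python
import math

def log_mean(sample):
    """Empirical logarithmic mean exp(mean of ln x), i.e. the geometric mean."""
    return math.exp(sum(math.log(v) for v in sample) / len(sample))

def log_objective(sample, m):
    """Empirical analogue of E[(ln X - ln m)^2] for a candidate constant m > 0."""
    return sum((math.log(v) - math.log(m)) ** 2 for v in sample) / len(sample)

sample = [0.5, 1.2, 3.0, 0.8, 2.5]   # hypothetical positive observations
m_star = log_mean(sample)
arith = sum(sample) / len(sample)

# Moving the candidate away from m_star, in either direction, increases the objective.
worse = all(
    log_objective(sample, m_star * f) > log_objective(sample, m_star)
    for f in (0.9, 0.99, 1.01, 1.1)
)
print(m_star, arith, worse)   # m_star = 3.6 ** (1/5), below arith = 1.6
```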
And the analogues of the notions of covariance and centering are contained in the following definition.
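Definition 2.1.
Let now $\mathbf{X}$ and $\mathbf{Y}$ be strictly positive with square integrable logarithms. We define the logarithmic covariance matrix of $\mathbf{X}$ and $\mathbf{Y}$ by
$$\mathrm{Cov}_l(\mathbf{X},\mathbf{Y})=E\Big[\big(\ln\mathbf{X}-E[\ln\mathbf{X}]\big)\big(\ln\mathbf{Y}-E[\ln\mathbf{Y}]\big)^{t}\Big].$$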
Let be the matrix with components If the matrix is invertible, we define the “centered” (in logarithmic distance) version of by
[TABLE]
The need for the exponentiation is clear: first, we have to "undo" the taking of logarithms, and second, the argument of the exponential function is a vector in $\mathbb{R}^n$, which yields a positive vector after exponentiation. It takes a simple computation to verify that
[TABLE]
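A minimal sketch of the empirical counterpart of the logarithmic covariance in the scalar case, assuming (as in our reading of Definition 2.1) that it is the ordinary covariance of the logarithms; the normalization by $1/N$ and the data are our choices. It illustrates how the multiplicative structure acts: a positive rescaling of $X$ does not change it, while raising $X$ to a power multiplies it by that power.

```python
import math

def log_cov(xs, ys):
    """Empirical covariance of ln X and ln Y, normalised by 1/N (our choice)."""
    n = len(xs)
    lx = [math.log(v) for v in xs]
    ly = [math.log(v) for v in ys]
    mx, my = sum(lx) / n, sum(ly) / n
    return sum((a - mx) * (b - my) for a, b in zip(lx, ly)) / n

xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.0, 1.5, 6.0, 12.0]
c = log_cov(xs, ys)

rescaled = log_cov([5.0 * v for v in xs], ys)   # ln(5X) = ln 5 + ln X: unchanged
squared = log_cov([v ** 2 for v in xs], ys)     # ln(X^2) = 2 ln X: doubled
print(c, rescaled, squared)
```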
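A variation on the previous theme consists of predicting one positive variable by means of another in logarithmic distance. The extension of the previous result is contained in the following statement.
Proposition 2.2.
Let $\mathbf{X}$ and $\mathbf{Y}$ be strictly positive with square integrable logarithms. Then the $\sigma(\mathbf{Y})$-measurable random variable that minimizes the logarithmic distance (2.1) to $\mathbf{X}$ is given by
$$E_l[\mathbf{X}\mid\mathbf{Y}]:=\exp\big(E[\ln\mathbf{X}\mid\mathbf{Y}]\big).$$
And we also have $E_l[\mathbf{X}\mid\mathbf{Y}]\leq E[\mathbf{X}\mid\mathbf{Y}]$.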
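The proof of Proposition 2.2 follows the same pattern as the standard proof. Just notice that $E[\ln\mathbf{X}\mid\mathbf{Y}]$ is the $\sigma(\mathbf{Y})$-measurable random variable that minimizes the Euclidean squared distance to $\ln\mathbf{X}$; the inequality is the conditional form of Jensen's inequality.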
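When $\mathbf{Y}$ takes finitely many values, Proposition 2.2 says that the best predictor in logarithmic distance is, on each event $\{\mathbf{Y}=y\}$, the geometric mean of the observed values of $\mathbf{X}$. The Python sketch below works with empirical frequencies in the scalar case (our simplification; the data and function names are hypothetical) and compares this predictor with the ordinary conditional mean.

```python
import math
from collections import defaultdict

def log_conditional_predictor(pairs):
    """For pairs (x, y) with x > 0 and y discrete: y -> exp(mean of ln x over {Y = y})."""
    logs = defaultdict(list)
    for x, y in pairs:
        logs[y].append(math.log(x))
    return {y: math.exp(sum(v) / len(v)) for y, v in logs.items()}

def conditional_mean(pairs):
    """Ordinary empirical conditional mean of x given y, for comparison."""
    values = defaultdict(list)
    for x, y in pairs:
        values[y].append(x)
    return {y: sum(v) / len(v) for y, v in values.items()}

def log_sq_distance(pairs, predictor):
    """Empirical squared logarithmic distance between x and predictor[y]."""
    return sum((math.log(x) - math.log(predictor[y])) ** 2 for x, y in pairs) / len(pairs)

pairs = [(1.0, "a"), (4.0, "a"), (2.0, "b"), (8.0, "b"), (0.5, "b")]   # hypothetical data
h_log = log_conditional_predictor(pairs)
h_mean = conditional_mean(pairs)
print(h_log, h_mean)   # geometric means (about 2 and 2) versus arithmetic means 2.5 and 3.5
print(log_sq_distance(pairs, h_log) <= log_sq_distance(pairs, h_mean))
```

The conditional arithmetic mean is larger on every event, in line with the inequality in Proposition 2.2, yet it loses in the logarithmic distance.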
Note that the last inequality in the statement does not mean that one of the predictors is better than the other in any sense: they are minimizers in different metrics. Also, since linear combinations in an exponent are transported as scalings and powers, we have the following analogue of linear prediction for positive random variables.
Proposition 2.3.
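Let $X$ and $Y$ be positive real random variables with square integrable logarithms, and suppose that $\mathrm{Var}(\ln X)>0$. The values of $a>0$ and $b\in\mathbb{R}$ that make $aX^{b}$ the best predictor of $Y$ in the logarithmic metric are given by
$$b=\frac{\mathrm{Cov}(\ln X,\ln Y)}{\mathrm{Var}(\ln X)},\qquad a=\exp\big(E[\ln Y]-b\,E[\ln X]\big).$$
The proof follows the standard computation, starting from the definition of the logarithmic distance between $Y$ and $aX^{b}$. The result is natural, since the linear structure of $\mathbb{R}$ is transferred multiplicatively onto $(0,\infty)$ by the exponential mapping. The extension to random variables taking values in the positive cone of a higher dimensional $\mathbb{R}^n$ is direct, but notationally more cumbersome.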
A simple computation leads to
[TABLE]
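In practice Proposition 2.3 amounts to ordinary least squares on the logarithms. The sketch below assumes, as in our reading of the proposition, that the predictor has the form $aX^{b}$, and estimates $a$ and $b$ from a sample; the function name and the synthetic data are illustrative only.

```python
import math

def fit_power_predictor(xs, ys):
    """Least squares on logarithms: returns (a, b) for the predictor y ~ a * x ** b.

    b = Cov(ln x, ln y) / Var(ln x) and a = exp(mean(ln y) - b * mean(ln x)).
    """
    lx = [math.log(v) for v in xs]
    ly = [math.log(v) for v in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    var_x = sum((u - mx) ** 2 for u in lx) / n
    if var_x == 0.0:
        raise ValueError("ln X has zero empirical variance")
    cov_xy = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / n
    b = cov_xy / var_x
    a = math.exp(my - b * mx)
    return a, b

xs = [1.0, 2.0, 4.0, 8.0]
ys = [3.0 * v ** 0.5 for v in xs]   # synthetic data following y = 3 * x ** 0.5 exactly
a, b = fit_power_predictor(xs, ys)
print(a, b)   # recovers a = 3 and b = 0.5 up to rounding
```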
3 Logarithmic conditional expectation and some of its properties
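Here we extend the semi-parallelogram property mentioned in Theorem 8.1 to strictly positive random variables.
Lemma 3.1.
All random variables mentioned are assumed to be strictly positive, with square integrable logarithms. Let $\mathbf{X}$ and $\mathbf{Y}$ be as mentioned. Then there exists $\mathbf{Z}$ such that for any $\mathbf{W}$ we have
$$d(\mathbf{X},\mathbf{Y})^2+4\,d(\mathbf{Z},\mathbf{W})^2\leq 2\,d(\mathbf{X},\mathbf{W})^2+2\,d(\mathbf{Y},\mathbf{W})^2.$$
To prove this, apply item 3) of Theorem 8.1, together with the comments following it, at every $\omega\in\Omega$ to obtain the pointwise version of the semi-parallelogram property, and then integrate with respect to $P$. Clearly $\mathbf{Z}=(\mathbf{X}\mathbf{Y})^{1/2}$. Below we apply this to obtain the uniqueness of the extension of the standard notion of conditional expectation.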
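Because $\ln$ maps the positive vectors isometrically onto $\mathbb{R}^n$ with its Euclidean distance, the semi-parallelogram inequality in this setting actually holds with equality when $z=(xy)^{1/2}$: it is just the parallelogram law for $\ln x$, $\ln y$ and $\ln w$. The following Python sketch checks this pointwise on random vectors (a sanity check, not a proof; the sampling scheme is arbitrary).

```python
import math
import random

def log_distance(x, y):
    """Logarithmic distance (1.1) between positive vectors."""
    return math.sqrt(sum((math.log(a) - math.log(b)) ** 2 for a, b in zip(x, y)))

rng = random.Random(0)

def random_positive_vector(n):
    """Vector with entries exp(U), U uniform on [-2, 2] (arbitrary sampling scheme)."""
    return [math.exp(rng.uniform(-2.0, 2.0)) for _ in range(n)]

max_gap = 0.0
for _ in range(1_000):
    x, y, w = (random_positive_vector(3) for _ in range(3))
    z = [math.sqrt(a * b) for a, b in zip(x, y)]   # geometric midpoint of x and y
    lhs = log_distance(x, y) ** 2 + 4 * log_distance(z, w) ** 2
    rhs = 2 * log_distance(x, w) ** 2 + 2 * log_distance(y, w) ** 2
    max_gap = max(max_gap, abs(rhs - lhs))
print(max_gap)   # of the order of rounding error: equality holds in this flat geometry
```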
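Theorem 3.1.
Let $\mathcal{G}\subseteq\mathcal{F}$ be a sub-$\sigma$-algebra, and let $\mathbf{X}$ be strictly positive with square integrable logarithm. Then the unique (up to a set of measure zero) positive, $\mathcal{G}$-measurable $\mathbf{Y}$ with square integrable logarithm that minimizes $d(\mathbf{X},\mathbf{Y})$ over this class is given by $\exp\big(E[\ln\mathbf{X}\mid\mathcal{G}]\big)$. To be consistent with the notation introduced above, we shall write $E_l[\mathbf{X}\mid\mathcal{G}]=\exp\big(E[\ln\mathbf{X}\mid\mathcal{G}]\big)$.
Proof.
The existence follows the same pattern of proof as the propositions in the previous section: $E[\ln\mathbf{X}\mid\mathcal{G}]$ minimizes the ordinary squared distance to $\ln\mathbf{X}$ among $\mathcal{G}$-measurable square integrable variables, and it is unique up to sets of measure zero. We shall use the semi-parallelogram property to verify the uniqueness directly in the logarithmic distance. For that, let $\mathbf{Y}'$ be some other possible minimizer of the logarithmic distance, and set $\mathbf{Z}=(\mathbf{Y}\mathbf{Y}')^{1/2}$ (keep in mind the comments after Theorem 8.1). According to the semi-parallelogram property,
$$d(\mathbf{Y},\mathbf{Y}')^2+4\,d(\mathbf{Z},\mathbf{X})^2\leq 2\,d(\mathbf{Y},\mathbf{X})^2+2\,d(\mathbf{Y}',\mathbf{X})^2.$$
Since $\mathbf{Y}$ and $\mathbf{Y}'$ are both minimizers, $d(\mathbf{Y},\mathbf{X})=d(\mathbf{Y}',\mathbf{X})$, and, $\mathbf{Z}$ being $\mathcal{G}$-measurable, $d(\mathbf{Z},\mathbf{X})$ is by definition at least as large as either of these two distances. Hence the right-hand side is at most $4\,d(\mathbf{Z},\mathbf{X})^2$, and it follows that necessarily $d(\mathbf{Y},\mathbf{Y}')=0$, that is, $\mathbf{Y}=\mathbf{Y}'$ a.e. ∎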
Let us now verify some standard and non standard properties of the notion of conditional expectation introduced above. Keep in mind that the arithmetic operations with positive vectors are componentwise.
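Theorem 3.2.
Let $\mathbf{X}$ be strictly positive with square integrable logarithm, and let $\mathcal{G}_1\subseteq\mathcal{G}_2$ be two sub-$\sigma$-algebras of $\mathcal{F}$. Write $E_l[\,\cdot\mid\mathcal{G}]=\exp\big(E[\ln\,\cdot\mid\mathcal{G}]\big)$ for the logarithmic conditional expectation of Theorem 3.1. Then, up to a set of measure zero, the following hold:
1)
2) $E_l\big[E_l[\mathbf{X}\mid\mathcal{G}_2]\,\big|\,\mathcal{G}_1\big]=E_l[\mathbf{X}\mid\mathcal{G}_1]$.
3) Let $\mathbf{Y}$ satisfy the same assumptions as $\mathbf{X}$, and let $a,b\in\mathbb{R}$. The analogue of the linearity property of the standard conditional expectation is the following multiplicative property:
$$E_l\big[\mathbf{X}^{a}\mathbf{Y}^{b}\mid\mathcal{G}_1\big]=\big(E_l[\mathbf{X}\mid\mathcal{G}_1]\big)^{a}\big(E_l[\mathbf{Y}\mid\mathcal{G}_1]\big)^{b}.$$
4) If $\mathbf{X}$ is independent of $\mathcal{G}_1$ in the standard sense, then $E_l[\mathbf{X}\mid\mathcal{G}_1]=E_l[\mathbf{X}]$.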
Proof.
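The first assertion is a simple consequence of the definition. To verify the second, we start from the definition and proceed:
$$E_l\big[E_l[\mathbf{X}\mid\mathcal{G}_2]\,\big|\,\mathcal{G}_1\big]=\exp\Big(E\big[\ln E_l[\mathbf{X}\mid\mathcal{G}_2]\,\big|\,\mathcal{G}_1\big]\Big)=\exp\Big(E\big[E[\ln\mathbf{X}\mid\mathcal{G}_2]\,\big|\,\mathcal{G}_1\big]\Big),$$
and now apply the standard tower property of conditional expectations to complete the proof of the assertion.
It is in the third property that the logarithmic distance plays a curious role. The proof of the assertion is a simple computation starting from the definition:
$$E_l\big[\mathbf{X}^{a}\mathbf{Y}^{b}\mid\mathcal{G}_1\big]=\exp\big(a\,E[\ln\mathbf{X}\mid\mathcal{G}_1]+b\,E[\ln\mathbf{Y}\mid\mathcal{G}_1]\big)=\big(E_l[\mathbf{X}\mid\mathcal{G}_1]\big)^{a}\big(E_l[\mathbf{Y}\mid\mathcal{G}_1]\big)^{b}.$$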
The fourth property is also simple to establish using the definition and the standard notion of independence. ∎
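On a finite probability space with equally likely outcomes, conditional expectations given a $\sigma$-algebra generated by a partition are block averages, which makes the tower and multiplicative properties of the logarithmic conditional expectation $\exp(E[\ln X\mid\mathcal{G}])$ easy to check numerically. In the Python sketch below the space, the partitions and the values of $X$ and $Y$ are hypothetical, and the helper names are ours.

```python
import math

def cond_exp(values, partition):
    """E[V | G] on a finite space with equally likely points; G is generated by a partition."""
    out = [0.0] * len(values)
    for block in partition:
        avg = sum(values[i] for i in block) / len(block)
        for i in block:
            out[i] = avg
    return out

def log_cond_exp(values, partition):
    """Logarithmic conditional expectation exp(E[ln X | G])."""
    return [math.exp(v) for v in cond_exp([math.log(x) for x in values], partition)]

X = [1.0, 2.0, 0.5, 4.0, 3.0, 9.0]      # hypothetical values of X on six outcomes
Y = [2.0, 2.0, 1.0, 5.0, 0.1, 1.0]      # hypothetical values of Y
G2 = [[0, 1], [2, 3], [4, 5]]           # finer partition
G1 = [[0, 1, 2, 3], [4, 5]]             # coarser partition, so G1 is contained in G2

# Tower property: E_l[E_l[X | G2] | G1] = E_l[X | G1].
tower_ok = all(math.isclose(p, q) for p, q in
               zip(log_cond_exp(log_cond_exp(X, G2), G1), log_cond_exp(X, G1)))

# Multiplicative property: E_l[X^a Y^b | G2] = E_l[X | G2]^a * E_l[Y | G2]^b.
a, b = 2.0, -0.5
left = log_cond_exp([x ** a * y ** b for x, y in zip(X, Y)], G2)
right = [p ** a * q ** b for p, q in zip(log_cond_exp(X, G2), log_cond_exp(Y, G2))]
mult_ok = all(math.isclose(p, q) for p, q in zip(left, right))
print(tower_ok, mult_ok)
```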
4 Estimators and limit theorems
In this section we shall consider the scalar case, that is, positive real-valued random variables, in which the notation is a bit simpler. That is, we shall forget about the symbols in boldface for a while.
Making use of Lemma 8.1, the following definition is clear:
Definition 4.1.
Let $X_1,\dots,X_n$ be positive random variables. We define their empirical logarithmic mean by
$$\hat{m}_n=\Big(\prod_{k=1}^{n}X_k\Big)^{1/n}=\exp\Big(\frac{1}{n}\sum_{k=1}^{n}\ln X_k\Big).$$
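And the standard law of large numbers becomes:
Theorem 4.1.
Let $\{X_k:k\geq1\}$ be a collection of i.i.d. positive random variables defined on $(\Omega,\mathcal{F},P)$, having finite logarithmic variance and logarithmic mean $m=\exp\big(E[\ln X_1]\big)$. Then $\hat{m}_n$ is an unbiased estimator of the logarithmic mean, in the sense that $\exp\big(E[\ln\hat{m}_n]\big)=m$, and
$$\hat{m}_n\to m$$
almost surely with respect to $P$ as $n\to\infty$.
The proof is clear. Since
$$\ln\hat{m}_n=\frac{1}{n}\sum_{k=1}^{n}\ln X_k,$$
we can invoke the strong law of large numbers (see Borkhar [2] or Jacod and Protter [4]), together with the continuity of the exponential function, to obtain our assertion. That $\exp\big(E[\ln\hat{m}_n]\big)=\exp\big(E[\ln X_1]\big)=m$ is clear.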
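A short simulation illustrating Theorem 4.1, under the illustrative assumption that $\ln X\sim N(\mu,\sigma^2)$ (generated with the standard library's `random.lognormvariate`); the parameter values are arbitrary. The empirical logarithmic mean, i.e. the geometric mean of the sample, should approach $m=e^{\mu}$, while the arithmetic mean approaches the larger value $E[X]=e^{\mu+\sigma^2/2}$.

```python
import math
import random

def empirical_log_mean(sample):
    """Geometric mean exp((1/n) * sum(ln x_k)) of a positive sample."""
    return math.exp(sum(math.log(v) for v in sample) / len(sample))

rng = random.Random(12345)
mu, sigma = 0.3, 0.8                      # illustrative parameters of ln X ~ N(mu, sigma^2)
m = math.exp(mu)                          # logarithmic mean exp(E[ln X])
mean_x = math.exp(mu + sigma ** 2 / 2)    # ordinary mean E[X] of a lognormal variable

estimates = {}
for n in (100, 10_000, 200_000):
    sample = [rng.lognormvariate(mu, sigma) for _ in range(n)]
    estimates[n] = (empirical_log_mean(sample), sum(sample) / n)
    print(n, estimates[n], (m, mean_x))
```

For large $n$ the first entry of each pair should settle near $e^{0.3}\approx1.35$ and the second near $e^{0.62}\approx1.86$.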
In analogy with the standard notion of empirical variance, we can introduce
Definition 4.2.
With the notations introduced above and under the assumptions in Theorem 4.1, the empirical estimator of the logarithmic variance is defined by
[TABLE]
And as in basic statistics we have
Theorem 4.2.
With the notation introduced above, and under the assumptions of Theorem 4.1, the estimator of Definition 4.2 is an unbiased estimator of the logarithmic variance and
[TABLE]
almost surely with respect to $P$ as $n\to\infty$.
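The unbiasedness in Theorem 4.2 can be seen in a small simulation. This sketch assumes that the estimator of Definition 4.2 is the usual $(n-1)$-normalized sample variance of the $\ln X_k$ (our reading, since that normalization is what unbiasedness requires), and that $\ln X\sim N(\mu,\sigma^2)$ with arbitrary parameters. It averages the estimator over many small samples and contrasts it with the $1/n$ version.

```python
import math
import random
import statistics

rng = random.Random(2024)
mu, sigma = 0.3, 0.8      # illustrative parameters: ln X ~ N(mu, sigma^2)
n, reps = 5, 20_000       # many small samples, so that any bias becomes visible

unbiased, biased = [], []
for _ in range(reps):
    logs = [math.log(rng.lognormvariate(mu, sigma)) for _ in range(n)]
    unbiased.append(statistics.variance(logs))   # divides by n - 1
    biased.append(statistics.pvariance(logs))    # divides by n
print(statistics.fmean(unbiased), statistics.fmean(biased), sigma ** 2)
```

With these settings the first average should be close to $\sigma^2=0.64$ and the second close to $(n-1)\sigma^2/n=0.512$.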
But perhaps more interesting is the following version of the central limit theorem. It brings to the fore the role of lognormal variables as the analogue to the Gaussian random variables in the class of positive variables.
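Theorem 4.3.
Suppose that $\{X_k:k\geq1\}$ is a collection of i.i.d. positive random variables defined on a probability space $(\Omega,\mathcal{F},P)$, with logarithmic mean $m$ and logarithmic variance $\sigma^2=\mathrm{Var}(\ln X_1)<\infty$. Then
$$\Big(\frac{\hat{m}_n}{m}\Big)^{\sqrt{n}}=\exp\Big(\frac{1}{\sqrt{n}}\sum_{k=1}^{n}\big(\ln X_k-\ln m\big)\Big)\to e^{Z}$$
in distribution as $n\to\infty$, where $Z\sim N(0,\sigma^2)$; that is, the limit is a lognormal random variable.
Proof.
Observe that
$$\ln\Big(\frac{\hat{m}_n}{m}\Big)^{\sqrt{n}}=\sqrt{n}\,\big(\ln\hat{m}_n-\ln m\big)=\frac{1}{\sqrt{n}}\sum_{k=1}^{n}\big(\ln X_k-E[\ln X_k]\big).$$
From the standard central limit theorem we know that this quantity converges in distribution to an $N(0,\sigma^2)$ random variable and therefore, since the exponential function is continuous, the same convergence holds for $(\hat{m}_n/m)^{\sqrt{n}}$ (continuous mapping theorem). This concludes the proof of our assertion. ∎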
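The limit in Theorem 4.3 does not require lognormal data. In the Python sketch below (parameters arbitrary, seeded for reproducibility) the $X_k$ are standard exponential, for which $E[\ln X]=-\gamma$, with $\gamma$ the Euler-Mascheroni constant, and $\mathrm{Var}(\ln X)=\pi^2/6$. Writing $\hat{m}_n$ for the geometric mean of $X_1,\dots,X_n$, the simulated values of $\ln\big((\hat{m}_n/m)^{\sqrt{n}}\big)$ should look approximately $N(0,\pi^2/6)$, so that $(\hat{m}_n/m)^{\sqrt{n}}$ is approximately lognormal, with median close to $1$.

```python
import math
import random
import statistics

rng = random.Random(7)
EULER_GAMMA = 0.5772156649015329
m = math.exp(-EULER_GAMMA)          # logarithmic mean of Exp(1): E[ln X] = -gamma
sigma = math.pi / math.sqrt(6.0)    # standard deviation of ln X for X ~ Exp(1)
n, reps = 200, 2_000

log_w = []                          # simulated values of ln((m_hat_n / m) ** sqrt(n))
for _ in range(reps):
    s = sum(math.log(rng.expovariate(1.0)) for _ in range(n))
    log_w.append(math.sqrt(n) * (s / n - math.log(m)))

frac_below_one = sum(1 for v in log_w if v <= 0.0) / reps   # P(W <= 1); about 1/2 in the limit
print(statistics.fmean(log_w), statistics.stdev(log_w), sigma, frac_below_one)
```

The sample standard deviation should come out close to $\pi/\sqrt{6}\approx1.28$.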
5 Martingales in discrete time
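As there is a notion of conditional expectation, there must be a corresponding notion of martingale. In this section we examine some of its very simple properties. As usual, the basic setup consists of the probability space $(\Omega,\mathcal{F},P)$ and a filtration $\{\mathcal{F}_n:n\geq0\}$.
Theorem 5.1.
The $M$-valued (that is, strictly positive) adapted process $\{X_n:n\geq0\}$, such that $X_n$ and $\ln X_n$ are square integrable, is a logarithmic martingale, that is, $E_l[X_{n+1}\mid\mathcal{F}_n]=\exp\big(E[\ln X_{n+1}\mid\mathcal{F}_n]\big)=X_n$ (resp. sub-martingale, super-martingale, with $\geq$, $\leq$ in place of $=$), if and only if $\{\ln X_n\}$ is an ordinary martingale (resp. sub-martingale, super-martingale).
Also, if $\{X_n\}$ is a logarithmic martingale, it is an ordinary sub-martingale.
Proof.
For $n\geq0$,
$$E_l[X_{n+1}\mid\mathcal{F}_n]=\exp\big(E[\ln X_{n+1}\mid\mathcal{F}_n]\big),$$
from which, since the exponential function is strictly increasing, the assertion of the theorem drops out. For the second assertion note that
$$E[X_{n+1}\mid\mathcal{F}_n]=E\big[e^{\ln X_{n+1}}\mid\mathcal{F}_n\big]\geq e^{E[\ln X_{n+1}\mid\mathcal{F}_n]}=X_n.$$
The middle step follows from the conditional Jensen inequality. ∎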
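A minimal worked example of Theorem 5.1, assuming a hypothetical process $X_{n+1}=X_n e^{\xi_{n+1}}$ whose log-increments $\xi_{n+1}=\pm1$ are fair coin flips independent of $\mathcal{F}_n$. Then $\ln X_n$ is an ordinary martingale, so $\{X_n\}$ is a logarithmic martingale, while $E[X_{n+1}\mid\mathcal{F}_n]=X_n\cosh(1)>X_n$ makes it a strict ordinary sub-martingale. The Python sketch computes both one-step conditional expectations exactly by enumerating the two outcomes.

```python
import math

# Hypothetical one-step model: X_{n+1} = X_n * exp(xi), xi = +1 or -1 with probability 1/2 each.
STEPS = [(+1.0, 0.5), (-1.0, 0.5)]

def one_step_expectations(x_n):
    """Return (E_l[X_{n+1} | F_n], E[X_{n+1} | F_n]) given X_n = x_n."""
    log_cond = math.exp(sum(p * (math.log(x_n) + s) for s, p in STEPS))
    ordinary = sum(p * x_n * math.exp(s) for s, p in STEPS)
    return log_cond, ordinary

for x_n in (0.2, 1.0, 7.5):
    log_cond, ordinary = one_step_expectations(x_n)
    print(x_n, log_cond, ordinary)   # log_cond equals x_n; ordinary equals x_n * cosh(1)
```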
The corresponding version of the Doob decomposition theorem, say for sub-martingales, goes as follows.
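Theorem 5.2.
With the notation introduced above, let $\{X_n:n\geq0\}$ be an $M$-valued (strictly positive) logarithmic sub-martingale. Then there exist an $M$-valued logarithmic martingale $\{N_n\}$ and an increasing, predictable, $M$-valued process $\{A_n\}$ such that $X_n=N_nA_n$.
Proof.
Just apply the Doob decomposition theorem to the ordinary sub-martingale $\{\ln X_n\}$, writing $\ln X_n=Y_n+B_n$ with $\{Y_n\}$ a martingale and $\{B_n\}$ increasing and predictable, and set $N_n=e^{Y_n}$, $A_n=e^{B_n}$. ∎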
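The multiplicative Doob decomposition can be seen in a hypothetical example where the log-increments are $+1$ with probability $p>1/2$ and $-1$ otherwise, so that $\ln X_n$ has drift $2p-1$ per step and $\{X_n\}$ is a logarithmic sub-martingale. The Python sketch simulates one path and builds $A_n=e^{n(2p-1)}$ and $N_n=X_n/A_n$; the names and parameters are illustrative.

```python
import math
import random

p = 0.7                                  # hypothetical probability of an up-step of ln X
drift = p * 1.0 + (1 - p) * (-1.0)       # E[xi] = 2p - 1 > 0, so ln X is a sub-martingale

rng = random.Random(3)
log_x = 0.0
X, N, A = [1.0], [1.0], [1.0]            # X_0 = N_0 = A_0 = 1
for n in range(1, 21):
    log_x += 1.0 if rng.random() < p else -1.0
    X.append(math.exp(log_x))
    A.append(math.exp(n * drift))            # A_n = exp(B_n), B_n = n * drift (predictable)
    N.append(math.exp(log_x - n * drift))    # N_n = exp(Y_n), Y_n = ln X_n - B_n a martingale

product_ok = all(math.isclose(x, nv * av) for x, nv, av in zip(X, N, A))
increasing_ok = all(a2 > a1 for a1, a2 in zip(A, A[1:]))
# One-step drift of ln N: E[xi] - drift, which vanishes, so N is a logarithmic martingale.
log_drift_of_N = p * (1.0 - drift) + (1 - p) * (-1.0 - drift)
print(product_ok, increasing_ok, log_drift_of_N)
```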
6 Logarithmic geometry and portfolio theory
Let us introduce a slight change of notation to conform with the notation standard in financial modeling. By the generic $R$ we shall denote the (gross) return of any asset or portfolio, that is, the quotient of its current value by its initial value.
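To begin with, recall from (8.3) that the curve $t\mapsto x^{1-t}y^{t}$, $0\leq t\leq1$, is a geodesic in the logarithmic distance between the points $x$ and $y$. That curve can be thought of as a weighted geometric mean of $x$ and $y$. This remark leads to a variation on the theme of the "return" of a portfolio. In our setup, a generic portfolio, characterized by the weights $w_1,\dots,w_k$, with $\sum_{i=1}^{k}w_i=1$, of assets with gross returns $R_1,\dots,R_k$, has a weighted return given by $R_w=\prod_{i=1}^{k}R_i^{w_i}$. To push the geodesic interpretation a bit further, that geometric mean can be thought of as a sequence of geodesic walks joining, say, $R_1$ to $R_k$. In any case, the logarithm of the mean,
$$\ln R_w=\sum_{i=1}^{k}w_i\ln R_i, \qquad (6.1)$$
is clearly the logarithmic rate of growth of the portfolio. Recall as well that the logarithmic distance of $R_w$ to its logarithmic mean $E_l[R_w]=\exp\big(\sum_{i=1}^{k}w_iE[\ln R_i]\big)$ is given by
$$d\big(R_w,E_l[R_w]\big)^2=E\Big[\Big(\sum_{i=1}^{k}w_i\big(\ln R_i-E[\ln R_i]\big)\Big)^2\Big]=\mathrm{Var}\Big(\sum_{i=1}^{k}w_i\ln R_i\Big). \qquad (6.2)$$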
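A small numerical illustration of the geometric portfolio return and of its logarithm, the portfolio's logarithmic rate of growth; the gross returns and weights below are made up, and the weights are taken non-negative so that the weighted arithmetic-geometric mean inequality applies.

```python
import math

def geometric_portfolio_return(returns, weights):
    """Weighted geometric return prod_i R_i ** w_i; the weights must sum to 1."""
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    return math.prod(r ** w for r, w in zip(returns, weights))

R = [1.10, 0.95, 1.30]   # hypothetical gross returns of three assets
w = [0.5, 0.3, 0.2]      # hypothetical non-negative weights
R_w = geometric_portfolio_return(R, w)
log_growth = sum(wi * math.log(ri) for wi, ri in zip(w, R))
arithmetic = sum(wi * ri for wi, ri in zip(w, R))
print(R_w, math.log(R_w), log_growth, arithmetic)
```

The logarithm of the geometric return coincides with the weighted sum of log-returns, and the geometric return never exceeds the usual weighted arithmetic return.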
Imitating Markowitz's portfolio theory, we assign to any portfolio its logarithmic mean $E_l[R_w]=\exp\big(E[\ln R_w]\big)$ and its logarithmic variance $d\big(R_w,E_l[R_w]\big)^2$. According to Markowitz's proposal, a portfolio is optimal when it minimizes the variance for a given expected value of its (rate of) return.
The content of the following proposition can be read in two ways. On the one hand, it provides a prescription for choosing a portfolio with a given average geometric rate of return and minimal logarithmic variance. On the other hand, it establishes a relationship between that choice of portfolio and the choice according to Markowitz's proposal based on the logarithmic rate of return.
Proposition 6.1.
With the notation introduced above, the weights $w_1,\dots,w_k$ that minimize the logarithmic variance $d\big(R_w,E_l[R_w]\big)^2$ subject to the constraints $\sum_{i=1}^{k}w_i=1$ and $E_l[R_w]=\rho$, for a given $\rho>0$, are the same as the weights that minimize $\mathrm{Var}\Big(\sum_{i=1}^{k}w_i\ln R_i\Big)$ subject to $\sum_{i=1}^{k}w_i=1$ and $\sum_{i=1}^{k}w_iE[\ln R_i]=\ln\rho$.
The proof is clear from (6.2), since $E_l[R_w]=\rho$ is equivalent to $\sum_{i=1}^{k}w_iE[\ln R_i]=\ln\rho$. We refer the interested reader to Luenberger [8] or to Shiryaev [11] for more details about the classical Markowitz portfolio optimization theory.
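Proposition 6.1 reduces the logarithmic problem to a classical mean-variance problem for the vector of log-returns. The Python sketch below solves that problem with only the two equality constraints (short sales allowed, our assumption), via the Lagrange (KKT) linear system, given a covariance matrix for the $\ln R_i$, a vector of expected log-returns and a target value for $\sum_i w_iE[\ln R_i]$. All numbers are hypothetical, and the small Gauss-Jordan solver is included only to keep the example dependency-free.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def portfolio_variance(cov, w):
    """Quadratic form w' cov w, i.e. Var(sum_i w_i ln R_i) when cov = Cov(ln R)."""
    return sum(w[i] * cov[i][j] * w[j] for i in range(len(w)) for j in range(len(w)))

def min_log_variance_weights(cov, mu, target):
    """Minimise w' cov w subject to sum(w) = 1 and sum(w_i * mu_i) = target.

    Stationarity of the Lagrangian gives 2 cov w + l1 * 1 + l2 * mu = 0, which is
    stacked with the two constraints into one (k + 2) x (k + 2) linear system.
    """
    k = len(mu)
    kkt = [[2.0 * cov[i][j] for j in range(k)] + [1.0, mu[i]] for i in range(k)]
    kkt.append([1.0] * k + [0.0, 0.0])
    kkt.append(list(mu) + [0.0, 0.0])
    sol = solve(kkt, [0.0] * k + [1.0, target])
    return sol[:k]

cov = [[0.04, 0.006, 0.0], [0.006, 0.09, 0.01], [0.0, 0.01, 0.16]]  # hypothetical Cov(ln R)
mu = [0.03, 0.05, 0.08]                                          # hypothetical E[ln R_i]
target = 0.05                                                    # required expected log-return
w = min_log_variance_weights(cov, mu, target)
print(w, portfolio_variance(cov, w))
```

By Proposition 6.1, exponentiating the target gives the corresponding constraint on the logarithmic mean of the portfolio, so the same weights solve the logarithmic problem.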
7 Concluding comments
In this note we proposed an alternative metric on the set of positive vectors, under which, when the distance between random variables is measured in this metric, the standard notions of best predictor and its estimation, as well as some classical convergence results, acquire a different but intuitively related form.
Also, as a simple application to finance, when assets are characterized by their gross returns (which by definition are positive random variables), the return of a portfolio becomes a weighted geometric average, and the standard portfolio choice methodology appears in a slightly different guise. Readers familiar with the basics of the methodology will find it clear that the efficient frontier, the market portfolio, the market line and the CAPM have counterparts within the formalism developed above, but this is not the place to pursue these matters.
8 Appendix: The logarithmic distance between positive vectors
We shall think of the vectors in $\mathbb{R}^n$ as functions $\xi:\{1,\dots,n\}\to\mathbb{R}$, and of all standard arithmetical operations either as componentwise operations among vectors or as pointwise operations among functions. Let us denote by $M=\{x\in\mathbb{R}^n: x_i>0,\ i=1,\dots,n\}$ the set of all positive vectors. $M$ is an open set in $\mathbb{R}^n$, and therefore trivially a manifold over $\mathbb{R}^n$ having $\mathbb{R}^n$ itself as tangent space at each point. We shall use the standard notation $T_xM=\mathbb{R}^n$ to stress this point.
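Here $M$ plays the role that the positive definite matrices play in the works by Lang, Lawson and Lim, and Mohaker mentioned in the introduction. The role of the group of invertible matrices in the same references is played here by $G=\{g\in\mathbb{R}^n: g_i\neq0,\ i=1,\dots,n\}$, which is clearly an Abelian group with respect to the componentwise product, with identity $\mathbf{e}$, the vector with all components equal to $1$. We shall make use of the action of $G$ on $M$ defined by $\tau_g(x)=g\,x\,g=g^2x$, the commutative counterpart of $x\mapsto gxg^{t}$ for matrices. This action is clearly transitive on $M$, and it can be defined in the obvious way as an action on $\mathbb{R}^n$.
The transitivity of the action allows us to transport the scalar product on $T_{\mathbf{e}}M$ to any $T_xM$ as follows. The scalar product between $\xi,\eta\in T_{\mathbf{e}}M$ is defined to be the standard Euclidean product $\langle\xi,\eta\rangle=\sum_{i=1}^{n}\xi_i\eta_i$, where we shall switch between $T_{\mathbf{e}}M$ and $\mathbb{R}^n$ as need be. Since $x=\tau_g(\mathbf{e})$ with $g=x^{1/2}$, we define the scalar product transported to $T_xM$ by
$$\langle\xi,\eta\rangle_x=\big\langle\tau_{g^{-1}}(\xi),\tau_{g^{-1}}(\eta)\big\rangle=\sum_{i=1}^{n}\frac{\xi_i\eta_i}{x_i^2}.$$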
This scalar product allows us to define the length of a differentiable curve as follows:
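Let $x:[0,1]\to M$ be a differentiable curve in $M$; its length is given by
$$\ell(x)=\int_0^1\big\langle\dot{x}(t),\dot{x}(t)\big\rangle_{x(t)}^{1/2}\,dt.$$
With this definition, the distance between $x,y\in M$ is defined, as expected, by
$$d(x,y)=\inf\big\{\ell(\gamma):\ \gamma:[0,1]\to M\ \text{differentiable},\ \gamma(0)=x,\ \gamma(1)=y\big\}.$$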
An application of the Euler-Lagrange equations shows that the equation of the geodesics in this metric is
$$x_i(t)\,\ddot{x}_i(t)-\dot{x}_i(t)^2=0,\qquad i=1,\dots,n, \qquad (8.2)$$
the solution to which, for given endpoints $x(0)$ and $x(1)$, is
$$x(t)=x(0)^{1-t}\,x(1)^{t},\qquad 0\leq t\leq1. \qquad (8.3)$$
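This allows us to compute the distance between $x$ and $y$: along the geodesic (8.3) joining them, $\dot{x}_i(t)/x_i(t)=\ln y_i-\ln x_i$ is constant, so that
$$d(x,y)=\int_0^1\Big(\sum_{i=1}^{n}\big(\ln y_i-\ln x_i\big)^2\Big)^{1/2}dt=\Big(\sum_{i=1}^{n}\big(\ln x_i-\ln y_i\big)^2\Big)^{1/2}.$$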
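The following Python sketch compares, numerically, the length of the geodesic (8.3) with that of the straight segment between the same two points, assuming the transported scalar product $\langle\xi,\eta\rangle_x=\sum_i\xi_i\eta_i/x_i^2$ (our reading of the construction above, the commutative analogue of the one in Lang [6]). The points and the discretization are arbitrary; the geodesic length should match the closed-form distance up to discretization error, and the segment should be longer.

```python
import math

def curve_length(curve, steps=20_000):
    """Approximate the length of t -> curve(t), 0 <= t <= 1, in the (assumed) metric
    <u, v>_x = sum_i u_i * v_i / x_i ** 2, by the midpoint rule with central differences."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        x = curve(t)
        before, after = curve(t - h / 2), curve(t + h / 2)
        velocity = [(b - a) / h for a, b in zip(before, after)]
        total += math.sqrt(sum((v / xi) ** 2 for v, xi in zip(velocity, x))) * h
    return total

x0, x1 = [1.0, 2.0, 0.5], [4.0, 1.0, 3.0]   # arbitrary endpoints with positive entries

def geodesic(t):
    """Curve (8.3): componentwise x0 ** (1 - t) * x1 ** t."""
    return [a ** (1 - t) * b ** t for a, b in zip(x0, x1)]

def segment(t):
    """Straight Euclidean segment from x0 to x1, for comparison."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]

closed_form = math.sqrt(sum((math.log(a) - math.log(b)) ** 2 for a, b in zip(x0, x1)))
geodesic_length = curve_length(geodesic)
segment_length = curve_length(segment)
print(geodesic_length, closed_form, segment_length)
```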
Similarly, the solution to (8.2) subject to $x(0)=\mathbf{e}$ and $\dot{x}(0)=\xi$ is $x(t)=e^{t\xi}$, which gives the (exponential) mapping $\xi\mapsto e^{\xi}$ from $T_{\mathbf{e}}M=\mathbb{R}^n$ onto $M$. With this notation we recall some results (in this simpler setup) from Chapter 5 of Lang (1995).
Theorem 8.1.
With the notation introduced above we have:
1) The exponential mapping is metric preserving through the origin.
2) The derivative of the exponential mapping is measure preserving, that is, as a mapping $T_{\xi}\mathbb{R}^n\to T_{e^{\xi}}M$ it satisfies
[TABLE]
3) With the metric given by (1.1), $M$ is a Bruhat-Tits space, that is, a complete metric space in which the semi-parallelogram law holds. This means that, given any $x,y\in M$, there exists a unique $z\in M$ such that for any $w\in M$ the following holds:
$$d(x,y)^2+4\,d(z,w)^2\leq 2\,d(x,w)^2+2\,d(y,w)^2.$$
Comments
1) The action defined a few paragraphs above coincides with parallel transport along geodesics.
2) The proofs take some space but are systematic and computational. In our case, commutativity makes things considerably simpler. The completeness of $M$ is transferred from $\mathbb{R}^n$ via the exponential mapping.
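3) The point $z$ mentioned in item 3) is given by $z=(xy)^{1/2}$, the midpoint of the geodesic (8.3) joining $x$ and $y$. Actually, a simple calculation provides the proof of the following slightly more general statement.
Lemma 8.1.
Let $x_1,\dots,x_N$ be points in $M$. The point that minimizes the sum of squared logarithmic distances (1.1) to the given points is their geometric mean, that is
$$\operatorname*{arg\,min}_{z\in M}\ \sum_{k=1}^{N}d(z,x_k)^2=\Big(\prod_{k=1}^{N}x_k\Big)^{1/N}.$$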
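A quick numerical check of Lemma 8.1 in Python, on hypothetical points with positive entries in $\mathbb{R}^2$. It also illustrates why the distances must be squared: for the unsquared sum of logarithmic distances, already in dimension one the median of the points does better than their geometric mean.

```python
import math

def log_distance(x, y):
    """Logarithmic distance (1.1) between positive vectors."""
    return math.sqrt(sum((math.log(a) - math.log(b)) ** 2 for a, b in zip(x, y)))

def geometric_mean(points):
    """Componentwise geometric mean of a list of positive vectors of equal length."""
    k = len(points)
    return [math.exp(sum(math.log(p[i]) for p in points) / k) for i in range(len(points[0]))]

def sum_sq_dist(z, points):
    return sum(log_distance(z, p) ** 2 for p in points)

def sum_dist(z, points):
    return sum(log_distance(z, p) for p in points)

points = [[1.0, 3.0], [2.0, 0.5], [100.0, 1.0]]   # hypothetical points
g = geometric_mean(points)

# Moving away from g in several directions increases the sum of squared distances.
for factors in ([1.05, 1.0], [1.0, 0.95], [0.9, 1.1]):
    z = [gi * f for gi, f in zip(g, factors)]
    assert sum_sq_dist(z, points) > sum_sq_dist(g, points)

# Unsquared version, first coordinates only: the median 2 beats 200 ** (1/3).
first = [[p[0]] for p in points]
print(sum_dist([2.0], first), sum_dist(geometric_mean(first), first))
```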
- [1] Arsigny, V., Fillard, P., Pennec, X. and Ayache, N. (2007). Geometric means in a novel vector space structure on symmetric positive-definite matrices, SIAM J. Matrix Anal. Appl., 29, 328-347.
- [2] Borkhar, V. (1995). Probability Theory, Springer, New York.
- [3] Hsu, E.P. (2002). Stochastic Analysis on Manifolds, Amer. Math. Soc., Providence.
- [4] Jacod, J. and Protter, P. (2000). Probability Essentials, Springer, New York.
- [5] Kunita, H. and Watanabe, S. (1989). Stochastic Differential Equations and Diffusion Processes, North Holland Pub. Co., Amsterdam.
- [6] Lang, S. (1999). Math Talks for Undergraduates, Springer, New York.
- [7] Lawson, J.D. and Lim, Y. (2001). The geometric mean, matrices, metrics and more, Amer. Math. Monthly, 108, 797-812.
- [8] Luenberger, D.G. (1980). Investment Science, Princeton Univ. Press, Princeton.
