A forgotten Theorem of Schoenberg on one-sided integral averages
Stefan Steinerberger

Department of Mathematics, Yale University, New Haven, CT 06511, USA
Abstract.
Let $f:\mathbb{R} \to \mathbb{R}$ be a function for which we want to take local averages. Assuming we cannot look into the future, the 'average' at time $t$ can only use $f(s)$ for $s \leq t$. A natural way to do so is via a weight $\phi$ and
$$ g(t) = \int_0^{\infty} f(t-s)\,\phi(s)\,ds. $$
We would like that (1) constant functions, $f(t) \equiv \text{const}$, are mapped to themselves and (2) $\phi$ to be monotonically decreasing (the more recent past should weigh more heavily than the distant past). Moreover, we want that (3) if $f(t)$ crosses a certain threshold $n$ times, then $g(t)$ should not cross the same threshold more than $n$ times (if $f(t)$ is the outside wind speed and crosses the Tornado threshold at two points in time, we would like the averaged wind speed $g(t)$ to cross the Tornado threshold at most twice). A Theorem implicit in the work of Schönberg is that these three conditions characterize a unique weight that is given by the exponential distribution
$$ \phi(s) = \lambda e^{-\lambda s} \qquad \text{for some } \lambda > 0. $$
Key words and phrases:
Integral averages, aggregate function, exponential distribution.
2010 Mathematics Subject Classification:
44A35, 62M10
S.S. is supported by the NSF (DMS-1763179) and the Alfred P. Sloan Foundation.
1. Introduction and Result
The purpose of this paper is to discuss how one would go about averaging continuous functions. Let $f:\mathbb{R} \to \mathbb{R}$ be a continuous function and suppose that we are interested in, at a given time $t$, finding a local average of $f$ using only function values $f(s)$ for $s \leq t$. This is the canonical setting for many applications where we cannot look into the future (one only needs to think of sports or finance, where this is a constant problem). A natural way of constructing an average is via
$$ g(t) = \int_0^{\infty} f(t-s)\,\phi(s)\,ds, $$
where $\phi:[0,\infty) \to \mathbb{R}$ is a (not necessarily continuous) weighting function. Many different weighting functions are conceivable; the one that is presumably used most often in practice is
$$ \phi(s) = \frac{1}{T}\,\chi_{[0,T]}(s), $$
the average taken over the last $T$ units of time. A natural question is whether there is a 'best' weight and, as usual, this depends on how one defines things. We will proceed in an axiomatic fashion and state a list of desirable properties.
Property 1. Invariance of Constants. Averaging should leave constant functions, $f(t) \equiv \text{const}$, invariant.
Property 2. Monotonicity. $\phi$ is (not necessarily strictly) monotonically decreasing.
Property 3. Variation-diminishing property. For any $c \in \mathbb{R}$, if the set $\{t : f(t) > c\}$ is a union of $n$ (not necessarily bounded) intervals, then the set $\{t : g(t) > c\}$ is the union of at most $n$ intervals. If $\{t : f(t) < c\}$ is the union of at most $n$ (not necessarily bounded) intervals, then so is $\{t : g(t) < c\}$.
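Properties 1 and 2 are easy to probe numerically for a candidate weight. The following is a minimal sketch, assuming a midpoint rule on a truncated interval; the helper names `one_sided_average`, `boxcar` and `exponential`, as well as the parameters `horizon` and `n`, are illustrative choices and do not come from the paper.

```python
import math

def one_sided_average(f, phi, t, horizon=50.0, n=20000):
    """Midpoint-rule approximation of the integral of f(t - s) * phi(s) over 0 <= s <= horizon."""
    h = horizon / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h
        total += f(t - s) * phi(s)
    return total * h

def boxcar(T):
    """Weight of the plain moving average over the last T units of time."""
    return lambda s: 1.0 / T if 0.0 <= s <= T else 0.0

def exponential(lam):
    """Exponential weight lam * exp(-lam * s) with an illustrative rate lam > 0."""
    return lambda s: lam * math.exp(-lam * s)

weights = {"boxcar": boxcar(2.0), "exponential": exponential(1.0)}

# Property 1: a constant function should be reproduced (up to quadrature error).
constant_results = {name: one_sided_average(lambda u: 3.0, phi, t=0.0)
                    for name, phi in weights.items()}

# Property 2: the weight, sampled on a grid, should be non-increasing.
grid = [0.1 * k for k in range(100)]
monotone = {name: all(phi(a) >= phi(b) for a, b in zip(grid, grid[1:]))
            for name, phi in weights.items()}
```

Both weights pass these two checks, so Properties 1 and 2 alone do not distinguish the moving average from the exponential weight; the distinction has to come from Property 3.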
The first condition is completely unambiguous: an average of constant values needs to return the same value in order to be meaningful. This translates easily into
$$ \int_0^{\infty} \phi(s)\,ds = 1. $$
We observe that this condition also implies that any function $\phi$ satisfying Properties (1) and (2) is nonnegative: if it assumes negative values anywhere, then monotonicity would imply that it is not integrable, which violates Property (1). In particular, $\phi$ is a probability distribution. As a consequence of that, we have that
$$ \inf_{s \leq t} f(s) \leq g(t) \leq \sup_{s \leq t} f(s), $$
which is also exceedingly natural: the average value at any point cannot exceed the previously attained maximal value or be smaller than all previous values. The second condition, monotonicity, is natural insofar as we would like the recent past to be more representative than the distant past. Property (3) is a smoothing property: the averaged function should not venture into 'extreme' territory more often than the function does itself. Requiring property (3) to be satisfied for all $c \in \mathbb{R}$ therefore corresponds to a uniform smoothing at all scales: extreme events can be represented in the average but they should not be over-represented. A simple example is as follows: suppose $f(t)$ is the ELO strength of a chess player measured at time $t$. This indicator is discontinuous and changes after each game. However, if a chess player has their ELO exceed 2800 for the entirety of the year 2016 and then once more, briefly, in 2018, then it would be desirable for the averaged function to exceed the value 2800 at most two times and not, say, three times. It would be perfectly reasonable, however, if the averaged function exceeds the value 2800 only once or never at all (for example if the value in 2016 hovers very close to 2800 all the time and was much lower before or, conversely, if in 2018 the value is exceeded only for a brief period of time).
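Property (3) can be illustrated on sampled data. The sketch below works with a discrete analogue of the setting (sequences instead of functions, runs of consecutive samples instead of intervals), which is an assumption made purely for illustration; the signal `f`, the window length and the smoothing factor are hypothetical. It counts how often the raw signal, its plain moving average and its exponentially smoothed version exceed a threshold.

```python
def count_runs_above(values, c):
    """Number of maximal runs of consecutive samples strictly above the threshold c."""
    runs, inside = 0, False
    for v in values:
        above = v > c
        if above and not inside:
            runs += 1
        inside = above
    return runs

def moving_average(f, window):
    """Mean of the last `window` samples; the history before f[0] is assumed equal to f[0]."""
    padded = [f[0]] * (window - 1) + list(f)
    return [sum(padded[i:i + window]) / window for i in range(len(f))]

def exponential_smoothing(f, alpha):
    """Geometric weights alpha * (1 - alpha)**k on the past, with the same assumed history."""
    g, out = f[0], []
    for v in f:
        g = alpha * v + (1.0 - alpha) * g
        out.append(g)
    return out

# Hypothetical signal: a single spike above 0, flanked by two dips.
f = [-0.1] * 4 + [-6.0, 10.0, -6.0] + [-0.1] * 6
threshold = 0.0

runs_f = count_runs_above(f, threshold)
runs_box = count_runs_above(moving_average(f, 4), threshold)
runs_exp = count_runs_above(exponential_smoothing(f, 0.5), threshold)
```

In this example the signal exceeds the threshold on one run, the moving average on two separate runs (once when the spike enters the window and again when the first dip leaves it), and the exponentially smoothed sequence on one run. A single example of course proves nothing about the exponential weight in general; it only shows how the moving average can fail the property.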
These three conditions uniquely characterize a weight (up to dilation symmetries).
Theorem (Schönberg).
If a function $\phi:[0,\infty) \to \mathbb{R}$ satisfies properties (1), (2) and (3), then
$$ \phi(s) = \lambda e^{-\lambda s} \qquad \text{for some } \lambda > 0. $$
To the best of our knowledge, this Theorem has never been stated or proved. I. J. Schönberg mentions in passing in his 1948 paper [9] that, as a consequence of his classification theorem, 'All Polya frequency functions turn out to be continuous everywhere with the single exception of the truncated exponential', and this is exactly what is needed to prove the Theorem, which should be attributed to Schönberg. The use of exponential distributions to compute one-sided averages is completely classical in time series analysis ('exponential smoothing') and is usually ascribed to the work of Brown [1] or Holt [4] in the 1950s, but the fact that properties (1)–(3) uniquely characterize exponential smoothing does not seem to be known.
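The connection to exponential smoothing can be made concrete. For the weight $\lambda e^{-\lambda s}$, if $f$ is constant on a sampling interval $(t, t+h]$, then splitting the integral at $s = h$ gives $g(t+h) = e^{-\lambda h} g(t) + (1 - e^{-\lambda h}) f(t+h)$, which is the familiar smoothing recursion with factor $\alpha = 1 - e^{-\lambda h}$. The sketch below, with illustrative function names and sample values that are not from the paper, checks that this recursion agrees with the explicit geometric weighting of the past.

```python
import math

def smoothing_factor(lam, h):
    """alpha = 1 - exp(-lam * h) for rate lam and sampling step h."""
    return 1.0 - math.exp(-lam * h)

def smooth_recursive(samples, alpha, start):
    """g_n = alpha * f_n + (1 - alpha) * g_{n-1}, starting from g_{-1} = start."""
    g, out = start, []
    for v in samples:
        g = alpha * v + (1.0 - alpha) * g
        out.append(g)
    return out

def smooth_explicit(samples, alpha, start):
    """Same quantity written as a weighted sum with weights alpha * (1 - alpha)**k."""
    out = []
    for n in range(len(samples)):
        total = (1.0 - alpha) ** (n + 1) * start
        for k in range(n + 1):
            total += alpha * (1.0 - alpha) ** k * samples[n - k]
        out.append(total)
    return out

samples = [1.0, 4.0, 2.0, 8.0]          # hypothetical data
alpha = smoothing_factor(lam=0.7, h=1.0)
recursive = smooth_recursive(samples, alpha, start=1.0)
explicit = smooth_explicit(samples, alpha, start=1.0)
```

The recursive form is what makes the exponential weight cheap in practice: each update needs only the previous average and the newest sample, not the whole history.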
We emphasize that, as one often encounters in axiomatic approaches, the result is only as good as one's faith in the axioms. This is the second purpose of this paper: to perhaps motivate a study of axiomatic approaches towards integral averages. What properties should an integral averaging operator have and which types of averages possess these properties? We believe all three properties to be fairly natural (with (3) being a particularly subtle way of defining smoothing). As is customary in axiomatic approaches, there are presumably other axioms that might also be of interest and will generally lead to different results.
2. The Proof
Proof.
As discussed above, properties (1) and (2) imply that $\phi$ is a probability density function. This implies, by linearity, that the averaging commutes with adding constants. This allows us to replace the study of
$$ \{t : f(t) > c\} \qquad \text{and} \qquad \{t : g(t) > c\} $$
with the study of when $f$ and $g$ become positive (by replacing $f$ with $f - c$, which leads to $g$ being replaced by $g - c$). Property (3) is then equivalent to asking that the number of sign changes of $g$ is at most the number of sign changes of $f$ or, equivalently, that convolution with $\phi$ has the variation-diminishing property in the sense of Schönberg. Phrased differently, we learn that $\phi$ is a Polya frequency function [2, 3, 5, 6, 7]. Schönberg's theory [8, 9, 10, 11] implies
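$$ \int_{-\infty}^{\infty} e^{-sx}\,\phi(x)\,dx = \frac{1}{\Psi(s)} \qquad \text{for } -\varepsilon < \operatorname{Re} s < \varepsilon $$
for some $\varepsilon > 0$, where the function $\Psi$ is an entire function of the form
$$ \Psi(s) = C\, e^{-\gamma s^2 + \delta s} \prod_{k=1}^{\infty} (1 + \delta_k s)\, e^{-\delta_k s}, $$
where $C > 0$, $\gamma \geq 0$, $\delta$ and $\delta_k$ are real numbers and $0 < \gamma + \sum_{k=1}^{\infty} \delta_k^2 < \infty$. It remains to find all functions of that type satisfying all our properties. We analyze the behavior for purely imaginary $s = it$ where $t \in \mathbb{R}$. Since $\delta$ and $\delta_k$ are real,
$$ |\Psi(it)| = C\, e^{\gamma t^2} \prod_{k=1}^{\infty} \sqrt{1 + \delta_k^2 t^2}. $$
The product is well defined since, using $\log(1+x) \leq x$ for $x \geq 0$,
$$ \prod_{k=1}^{\infty} \sqrt{1 + \delta_k^2 t^2} = \exp\left( \frac{1}{2} \sum_{k=1}^{\infty} \log\left(1 + \delta_k^2 t^2\right) \right) \leq \exp\left( \frac{t^2}{2} \sum_{k=1}^{\infty} \delta_k^2 \right) < \infty. $$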
Moreover, we have for all . We distinguish two cases: either all but one are zero or at least two are nonzero. In the second case, we observe
[TABLE]
for some fixed . Applying the inverse Fourier transform shows that
[TABLE]
We note that by assumption (2) as well as for all by assumption. Let be so large that
[TABLE]
We then observe that
[TABLE]
The first term can be made sufficiently small by further increasing . This contradiction shows that all but exactly one $\delta_k$ must be 0. The same argument can be run for $\gamma$. Thus
[TABLE]
This shows that
[TABLE]
We define a function
[TABLE]
We note that
[TABLE]
An application of the inverse Fourier transform shows that
[TABLE]
The normalization
[TABLE]
then implies
[TABLE]
∎
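As a sanity check on the conclusion, the Laplace transform of $\lambda e^{-\lambda x}$ on $[0, \infty)$ equals $\lambda/(\lambda + s)$ for real $s > -\lambda$, which is the reciprocal of the linear (hence entire) function $1 + s/\lambda$. The following minimal sketch verifies this numerically with a midpoint rule on a truncated interval; the function name, the truncation `horizon`, the grid size `n` and the chosen values of `lam` and `s` are illustrative assumptions.

```python
import math

def laplace_of_exponential_weight(lam, s, horizon=40.0, n=40000):
    """Midpoint-rule approximation of the integral of exp(-s*x) * lam * exp(-lam*x) over [0, horizon]."""
    h = horizon / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += math.exp(-s * x) * lam * math.exp(-lam * x)
    return total * h

lam = 2.0
# Each entry: (s, numerical transform, closed form 1 / (1 + s / lam)).
checks = [(s, laplace_of_exponential_weight(lam, s), 1.0 / (1.0 + s / lam))
          for s in (0.0, 0.5, 3.0)]
```

The entry for $s = 0$ is the normalization of the weight, so it doubles as a check of Property 1.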
References
- [1] R. Brown, Exponential Smoothing for Predicting Demand. Cambridge, Massachusetts: Arthur D. Little Inc. (1956).
- [2] C. de Boor, A Practical Guide to Splines, Springer, 1978.
- [3] J. M. Lane and R. F. Riesenfeld, A geometric proof for the variation diminishing property of B-spline approximation, Journal of Approximation Theory 37, p. 1–4 (1983).
- [4] C. Holt, Forecasting Trends and Seasonal by Exponentially Weighted Averages. Office of Naval Research Memorandum 52 (1957), reprinted in International Journal of Forecasting 20 (1), p. 5–10 (2004).
- [5] M. Marsden and I. Schönberg, On Variation Diminishing Spline Approximation Methods, in: I. J. Schoenberg Selected Papers, p. 247–268, Springer, 1988.
- [6] G. Polya, Qualitatives über Wärmeausgleich, Z. angew. Math. u. Mech. 13, p. 125–128 (1933).
- [7] G. Polya, Sur un théorème de Laguerre, Compt. Rend. 156, p. 996–999 (1913).
- [8] I. Schönberg, On Totally Positive Functions, Laplace Integrals and Entire Functions of the Laguerre-Polya-Schur Type, Proc. Natl. Acad. Sci. U.S.A. 33, p. 11–17 (1947).
