On linear weak predictability with single point spectrum degeneracy
Nikolai Dokuchaev

Abstract
The paper studies properties of continuous time processes with spectrum degeneracy at a single point where their Fourier transforms vanish with a certain rate. It appears that these processes are linearly predictable in some weak sense, meaning that convolution integrals over future times can be approximated by causal convolutions over past times. The corresponding predicting kernels are time invariant, and they are presented explicitly in the frequency domain via their transfer functions. These predictors are "universal" meaning that they do not require to know details of the spectrum of the underlying processes; the same predictor can be used for the entire class of processes with a single point spectrum degeneracy. The predictors feature some robustness with respect to noise contamination.
(Submitted: May 8, 2017. Revised: January 9, 2020)
Keywords: Fourier transform, spectrum degeneracy, pathwise setting, linear predictors.
I Introduction
The paper studies properties of continuous time processes with spectrum degeneracy in a pathwise deterministic setting, i.e., without probabilistic assumptions on the ensemble, where an underlying process is deemed to be unique and such that one cannot rely on statistics collected from observations of other similar paths. A decision (a prediction, an estimate, etc.) has to be based on the intrinsic properties of this single observed path.
There are some opportunities for prediction and interpolation of continuous time processes in the pathwise setting when their spectrum has a certain degeneracy.
- In the stochastic setting for continuous time stationary Gaussian processes, there exist optimal predictors represented by causal linear integral operators; see the review of these results in [10, 27]. The predictors are optimal in the sense of minimizing the mean square error; their selection is determined by the spectral density of the underlying process. By the Kolmogorov-Krein Theorem, this error can be zero if and only if
[TABLE]
see, e.g., [11], p. 57.
- The classical Nyquist-Shannon-Kotelnikov interpolation theorem states that a band-limited function can be uniquely recovered without error from an infinite equidistant sampling sequence. The sampling rate must be at least twice the maximum frequency present in the signal (the critical Nyquist rate).
- Functions are uniquely defined by samples taken at a rate determined only by the measure of the spectrum support; see [14], p.39.
- Functions with certain periodicity of the location of gaps in the spectrum, and with some restrictions on the measure or on accumulation at infinity of the spectrum gap, are uniquely defined by sparse samples below the Nyquist rate at sampling points deviating slightly from arithmetic progressions [12, 13, 18, 19, 23, 24].
- Band-limited functions are analytic and are uniquely defined by their values on an arbitrarily small time interval. In particular, band-limited functions are uniquely defined by their past values, i.e., they are predictable.
- Functions with exponential decrease of energy on higher frequencies are uniquely defined by their past values. Moreover, there exist linear predictors that do not require knowledge of the spectrum, with the prediction horizon defined by the rate of the energy decrease [4].
- Functions with the Fourier transform vanishing on an arbitrarily small interval for some are uniquely defined by their past values. There are linear predictors defined by only that allow prediction of anti-causal convolutions involving the future values [3].
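The sampling theorem cited in the list above can be illustrated numerically. The following is a minimal sketch that is not taken from the paper: it applies the truncated Whittaker-Shannon interpolation series to a test signal chosen here for illustration (a shifted squared sinc, band-limited to 1 Hz), sampled slightly above its Nyquist rate. The function names, the sampling step, and the truncation length are all assumptions.

```python
import math


def sinc(u: float) -> float:
    """Normalized sinc, sin(pi*u)/(pi*u), with the removable singularity filled in."""
    return 1.0 if u == 0.0 else math.sin(math.pi * u) / (math.pi * u)


def signal(t: float) -> float:
    """Illustrative test signal; its spectrum is supported on frequencies |f| <= 1 Hz."""
    return sinc(t - 0.3) ** 2


def reconstruct(t: float, step: float = 0.4, n_terms: int = 500) -> float:
    """Truncated Whittaker-Shannon series built from the samples signal(n * step).

    The sampling rate 1/step = 2.5 Hz exceeds the Nyquist rate of 2 Hz.
    Truncating the infinite series to |n| <= n_terms leaves a small residual error.
    """
    return sum(
        signal(n * step) * sinc((t - n * step) / step)
        for n in range(-n_terms, n_terms + 1)
    )


if __name__ == "__main__":
    for t in (0.123, 1.7, -2.45):
        print(f"t={t:+.3f}  exact={signal(t):.6f}  interpolated={reconstruct(t):.6f}")
```

Note that this interpolation reads samples from both the past and the future of the evaluation point, whereas the predictors studied in this paper use past values only.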
The present paper shows that a degeneracy of the Fourier transform for continuous processes at a single point only still ensures some linear extrapolation opportunities for continuous time processes in the pathwise deterministic setting. It shows that processes featuring this degeneracy are linearly predictable in some weak sense, meaning that anti-causal convolution integrals over future time can be approximated by causal convolution integrals over past time (Theorem 1). This result sheds some new light on the impact of spectrum degeneracy on predictability and extrapolation.
To prove the predictability of the anti-causal convolutions, we obtained a family of new linear predictors represented by causal convolutions (Theorem 2). The predictors are given explicitly in the frequency domain.
The predictors suggested in the paper are not error free; however, the prediction error can be made arbitrarily small, and there is some robustness with respect to the noise contamination. The predictors suggested here are constructed using the approach developed in [3, 4, 5, 6] but are quite different.
We emphasize that this result is not a straightforward rewording of linear extrapolation results known for stochastic Gaussian processes with spectral densities. One reason for this is that the properties of these stationary processes are quite special and cannot be mechanically transferred to deterministic functions and their spectra. For example, it appears that the criterion of recoverability of a single value for a discrete time stationary Gaussian process is different from that in the pathwise deterministic setting ([8], p.86). Furthermore, the optimal extrapolating operators known for Gaussian stationary processes have to be constructed for a particular shape of the spectral density (see e.g. [10, 26, 17, 25, 16, 27]). On the other hand, unlike the linear predictors known for Gaussian processes, the predictors introduced below are "universal", meaning that they do not require knowledge of the shape of the spectrum (i.e. the Fourier transform) of the underlying processes; the same predictor can be used for a large class of different processes.
The paper is organized in the following manner. In Section II, we formulate the definitions and background facts related to the linear weak predictability. In Section III, we formulate the main theorems on predictability and predictors (Theorem 1 and Theorem 2). In Section IV, we discuss the robustness of the predictors. Section V contains the proofs. Finally, in Section VI, we discuss our results.
II Definitions and background
Let denote the indicator function, , , .
For complex valued functions or , we denote by the function defined on as the Fourier transform of :
[TABLE]
If , then is defined as an element of (meaning ).
For such that for , we denote by the Laplace transform
[TABLE]
Let be the Hardy space of holomorphic on functions with finite norm , ; see, e.g., [9], Chapter 11.
By the Paley-Wiener Theorem, if and only if for some such that for ; see e.g. Theorem 19.2 in [20], p.372.
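As a simple worked illustration of this correspondence (added here as a sketch, not taken from the paper), consider the causal kernel equal to e^{-t} for t ≥ 0 and zero for t < 0. The computation below assumes the usual one-sided Laplace transform convention and the usual Hardy space norm on the right half-plane (the supremum over s > 0 of the L2 norm along the vertical line with real part s); the symbols k, K, and H^2 are notation assumed for this example.

```latex
K(p)=\int_0^{\infty} e^{-pt}\,e^{-t}\,dt=\frac{1}{1+p},\qquad \operatorname{Re}p>0,
\\[4pt]
\sup_{s>0}\int_{-\infty}^{\infty}\bigl|K(s+i\omega)\bigr|^{2}\,d\omega
=\sup_{s>0}\int_{-\infty}^{\infty}\frac{d\omega}{(1+s)^{2}+\omega^{2}}
=\sup_{s>0}\frac{\pi}{1+s}=\pi<\infty .
```

So K is holomorphic in the right half-plane with a finite Hardy norm, as expected for the transform of a square-integrable kernel vanishing on the negative half-line.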
The definitions below in this section are similar to the definitions introduced in [3].
Definition 1
Let be the class of functions such that for and such that, for any , there exists an integer , a set , and a polynomial such that and is represented as
[TABLE]
where .
In particular, the class includes all linear combinations of functions , where .
Definition 2
Let be the class of functions such that for and .
We will use the notation “” for the convolution in .
We are going to study linear predictors for anti-causal convolutions with . More precisely, we will study the possibility of their approximation by causal convolutions with . By the choice of and , it follows that
[TABLE]
The corresponding predictors are linear; they are represented by causal time-invariant convolutions and allow frequency representations via transfer functions, which is preferable in electronic engineering, systems, and control. This makes them convenient for applications, in particular because linear time-invariant systems can be realised via fixed electronic hardware schemes.
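To make the distinction between anti-causal and causal convolutions concrete, here is a minimal numerical sketch. It assumes the convention that an anti-causal convolution at time t is the integral of k(t − s) x(s) over s ≥ t with k vanishing on positive arguments, and that a causal convolution is the integral over s ≤ t with a kernel vanishing on negative arguments. The exponential kernels, the step size, and all function names are illustrative assumptions; they are not the predicting kernels constructed in this paper.

```python
import math

DT = 0.01        # step of the Riemann sum (illustrative)
HORIZON = 20.0   # truncation of the infinite integration range


def causal_conv(kernel, x, t):
    """Approximate int_{-inf}^{t} kernel(t - s) x(s) ds; only values x(s), s <= t, are read."""
    steps = int(HORIZON / DT)
    return sum(kernel(j * DT) * x(t - j * DT) for j in range(steps)) * DT


def anticausal_conv(kernel, x, t):
    """Approximate int_{t}^{+inf} kernel(t - s) x(s) ds; only values x(s), s >= t, are read."""
    steps = int(HORIZON / DT)
    return sum(kernel(-j * DT) * x(t + j * DT) for j in range(steps)) * DT


def past_kernel(u):
    """Causal kernel: supported on u >= 0."""
    return math.exp(-u) if u >= 0 else 0.0


def future_kernel(u):
    """Anti-causal kernel: supported on u <= 0."""
    return math.exp(u) if u <= 0 else 0.0


def x(s):
    return math.cos(s)


def x_alt(s):
    """Same past as x (s <= 0), different future."""
    return math.cos(s) if s <= 0 else math.cos(s) + 1.0


if __name__ == "__main__":
    print("causal, x     :", causal_conv(past_kernel, x, 0.0))
    print("causal, x_alt :", causal_conv(past_kernel, x_alt, 0.0))
    print("anti-causal, x     :", anticausal_conv(future_kernel, x, 0.0))
    print("anti-causal, x_alt :", anticausal_conv(future_kernel, x_alt, 0.0))
```

The two paths share their past up to time 0, so the causal output at time 0 is identical for both, while the anti-causal output differs. Approximating the latter by the former is therefore only possible for restricted classes of processes.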
For , we define linear normed spaces of complex valued functions such that and for .
Definition 3
Let be given. Let be a given set of functions .
- (i)
We say that the set is predictable at time if, for any , if for a.e. then for a.e. .
- (ii)
We say that the set is linearly -predictable in the weak sense if, for any , there exists a sequence such that
[TABLE]
where and .
- (iii)
Let be a set of processes which is also a linear normed space provided with a norm . We say that the set is linearly -predictable in the weak sense uniformly with respect to the norm , if, for any and , there exists such that
[TABLE]
where and .
We call functions and in Definition 3 predicting kernels.
Proposition 1
Let be such as in Definition 3(ii) with . Then is predictable in the sense of Definition 3(i).
The proof of Proposition 1 given below is based on the completeness of the system in . In fact, even a smaller set of finite linear combinations of exponents , with such that is everywhere dense in ; see e.g. Crum [2], Sedletskij [21]. In theory, this may provide an approximate linear prediction method for the entire paths of processes being predictable in the sense of Definition 3(ii). For example, assume that the path is observable. Then, for , a prediction of can be approximated as , where is an orthonormal basis in constructed from the sequence by the Gram-Schmidt orthonormalization procedure, and where the values are the predictions of the integrals that can be found under the assumptions of Definition 3(ii). This would be numerically challenging since the predictors have to be constructed for each individually. In this paper, we focus on the prediction of single anti-causal convolutions.
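The Gram-Schmidt step mentioned above can be sketched numerically. The example below is a rough illustration under assumptions that are not taken from the paper: the interval is [0, 1], the system consists of the five real decaying exponentials exp(-k t) with k = 1, ..., 5, the inner product is approximated by a midpoint rule, and the expansion coefficients are computed directly from a known target function. In the scheme described in the text, those coefficients would instead be supplied by predictions of the corresponding integrals.

```python
import math

T = 1.0      # length of the interval (illustrative)
N = 2000     # number of midpoint-rule nodes
GRID = [(i + 0.5) * T / N for i in range(N)]
WEIGHT = T / N


def inner(u, v):
    """Midpoint-rule approximation of the L2(0, T) inner product of sampled real functions."""
    return sum(a * b for a, b in zip(u, v)) * WEIGHT


def gram_schmidt(vectors):
    """Modified Gram-Schmidt: returns vectors orthonormal with respect to `inner`."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            c = inner(w, e)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        norm = math.sqrt(inner(w, w))
        basis.append([wi / norm for wi in w])
    return basis


def project(target, basis):
    """Coefficients and values of the orthogonal projection of `target` onto span(basis)."""
    coeffs = [inner(target, e) for e in basis]
    approx = [sum(c * e[i] for c, e in zip(coeffs, basis)) for i in range(len(target))]
    return coeffs, approx


# Real decaying exponentials exp(-k t), k = 1..5, sampled on the grid.
exponentials = [[math.exp(-k * t) for t in GRID] for k in range(1, 6)]
basis = gram_schmidt(exponentials)

# Hypothetical target path on (0, T), used only to exercise the projection.
target = [math.cos(3.0 * t) for t in GRID]
coeffs, approx = project(target, basis)
error = [a - b for a, b in zip(target, approx)]
residual = math.sqrt(inner(error, error))

if __name__ == "__main__":
    print("norm of target:", math.sqrt(inner(target, target)))
    print("residual after projection:", residual)
```

Gram-Schmidt on exponentials becomes ill-conditioned quickly as the number of functions grows, which is consistent with the remark that this route is numerically challenging.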
The following examples illustrate the difference between different types of predictability in Definition 3.
Example 1
- (i)
Any singleton set is predictable at any time in the sense of Definition 3(i).
- (ii)
Let , and let . The singleton set is predictable at any time in the sense of Definition 3(i) but is not linearly predictable in the sense of Definition 3(ii) or Definition 3(iii).
- (iii)
Let be given, and let be the set of all band-limited processes such that if , where . Then is linearly predictable in the sense of Definition 3(ii).
- (iv)
Let be given, and let be the set of all high-frequency processes such that if , where . Then is linearly predictable in the sense of Definition 3(ii).
- (v)
Let , and let . Let a domain be given such that . Let . Let , , where
[TABLE]
Then the twin set is predictable at any time in the sense of Definition 3(i) but is not linearly predictable in the sense of Definition 3(ii) with .
Example 1(v) implies that any larger class containing is not linearly predictable in the sense of Definition 3(ii).
It can be noted that processes from with an interval spectrum gap at zero feature frequent oscillations (see [1]), and yet Example 1(iv) states that this set is linearly predictable in the sense of Definition 3(ii).
III The main result
For , , and , set
[TABLE]
Let be the class of all processes such that
[TABLE]
This class includes processes from such that their Fourier transforms vanish at with the rate defined by .
We consider as a linear normed space with the corresponding norm.
Note that as and that (5) holds for processes with spectrum degeneracy such that is approaching zero as with a sufficient rate of decay. In particular, the class includes all band-limited processes such that there exists such that for , where . However, the spectrum degeneracy for functions from is mild compared with band-limitedness; in particular, these functions are not necessarily analytic, and their Fourier transform can be non-zero for all .
Example 2
- (i)
For any and , the class is predictable in the sense of Definition 3(i).
- (ii)
If either or , then the class is not predictable in the sense of Definition 3(i).
Theorems 1 and 2 below give, for the case where , a constructive method of predicting future averages of the processes described via convolutions; for example, the values can be predicted for using the observations and these predictors. Moreover, it is shown in Section IV below that this prediction is robust with respect to noise contamination. These results represent an extension of the result of [3] to the case of processes with a single point spectrum degeneracy.
Let ,
Theorem 1
Let either or .
- (i)
The class is linearly -predictable in the weak sense such as described in Definition 3(ii).
- (ii)
For any and , the class is linearly -predictable in the weak sense uniformly with respect to the norm such as described in Definition 3(iii).
The predictability stated in Theorem 1 is equivalent to the existence of certain predicting kernels. The required kernels are presented explicitly in the following theorem.
Theorem 2
Let be given and represented as (3) for some given and . Let be given. For and , set
[TABLE]
Then , and, for any sequence , the corresponding sequence of kernels ensures prediction required in Theorem 1 (i)-(ii).
In particular, by the Paley-Wiener Theorem, it follows that for , where . Also, we have that is real valued, since is real valued and , .
Predicting kernels in Theorem 2 represent a modification of the construction introduced in [3] for continuous time processes with the spectrum vanishing on an interval.
Remark 1
Since predicting kernels in Theorem 2 are real valued, it follows that the corresponding processes are real valued if is real valued. This implies that Theorems 1-2 hold with a modification of Definition 3 involving real valued processes , and .
Any particular predictor described in Theorem 2 is not error-free and ensures predictability in an approximate sense only. However, the error can be made arbitrarily small by selecting a large enough .
The predictors in Theorem 2 do not depend on the polynomial in (3); however, they depend on and in (3).
The rate of spectrum vanishing for predictable processes considered in Theorem 1 is characterized by the pairs . The following proposition shows that the choice of the critical values here is sharp.
IV On robustness of the predictors with respect to noise contamination
Let us show that the predictors introduced in Theorem 2 and designed for processes from feature some robustness with respect to noise contamination. Suppose that these predictors are applied to a process with a small noise contamination such that , where , and where represents the noise. Let , , and . We assume that and ; we can write this as and and that . The parameter represents the intensity of the noise.
By the assumptions, the predictors are constructed as in Theorem 2 under the hypothesis that , i.e. that and . By Theorems 1-2, for an arbitrarily small , there exists such that, if the hypothesis that is correct, then
[TABLE]
where and are such as in Definition 3. Let us estimate the prediction error for the case where . We have that
[TABLE]
where
[TABLE]
The value represents the additional error caused by the presence of unexpected high-frequency noise (when ). It follows that
[TABLE]
where .
Therefore, it can be concluded that the prediction is robust with respect to noise contamination for any given . On the other hand, if then and . In this case, error (6) is increasing for any given . Therefore, the error in the presence of noise will be large for a predictor targeting too small a size of the error for the noiseless processes from .
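The trade-off described above can be illustrated with a standard fact: by Parseval's identity, the energy norm of filtered noise is at most the supremum of the modulus of the transfer function times the energy norm of the noise. The sketch below checks a discrete, periodic analogue of this bound. It is not the estimate derived in this section, and the transfer function `toy_transfer` is a hypothetical stand-in rather than the predictor from Theorem 2; the sequence length, the seed, and the noise level are likewise assumptions.

```python
import cmath
import math
import random


def dft(x):
    """Naive discrete Fourier transform (O(n^2), adequate for a small illustration)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of `dft`."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def l2_norm(x):
    return math.sqrt(sum(abs(v) ** 2 for v in x))


def apply_filter(transfer, x):
    """Circular convolution of x with the kernel whose DFT is `transfer`."""
    return idft([h * s for h, s in zip(transfer, dft(x))])


def toy_transfer(gain, n):
    """Hypothetical transfer function with sup-norm equal to `gain` (attained at k = 0)."""
    return [gain * cmath.exp(-0.3j * k) / (1.0 + 0.1 * min(k, n - k)) for k in range(n)]


random.seed(0)
N = 64
noise = [random.gauss(0.0, 0.1) for _ in range(N)]

# A filter with a ten times larger sup-norm admits a ten times larger noise bound.
mild = apply_filter(toy_transfer(1.0, N), noise)
aggressive = apply_filter(toy_transfer(10.0, N), noise)

if __name__ == "__main__":
    print("noise norm           :", l2_norm(noise))
    print("output norm, gain 1  :", l2_norm(mild))
    print("output norm, gain 10 :", l2_norm(aggressive))
```

In this toy setting, a filter whose transfer function has a larger sup-norm passes proportionally more noise, which mirrors the remark that targeting too small an error for noiseless processes increases the error in the presence of noise.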
The equations describing the dependence of on could be derived similarly to estimates in [5], Section 6, where it was done for different predicting kernels and for band-limited processes. We leave it for future research.
V Proofs
Proof of Proposition 1. Let us prove statement (i) first. It suffices to show that if and are such that , then .
Suppose that there exists and such that . For , let , , and , be such as described in Definition 3 (i). Since , it follows that for any and any . On the other hand, (4) holds by the assumption on in statement (i). Hence for any . Furthermore, the class contains functions for all ; it follows for these functions that
[TABLE]
The Müntz-Szász Theorem implies that there exists a set such that the set of finite linear combinations of exponents is complete in , meaning that the set of finite linear combinations of these exponents is everywhere dense in ; see e.g. Crum [2], Sedletskij [21]. It follows that . This completes the proof of Proposition 1(i).
Proof for Example 1. The proof for Examples 1(i-ii) is obvious. The proof for Examples 1(iii-iv) is given in [3].
Let us prove Example 1(v). We have that for a.e. , and that the process is band-limited and hence continuous.
Suppose that for a.e. for some . It would imply that for . This is impossible: since and is a band-limited process, it follows that cannot vanish on an open interval; otherwise, its unique analytic extension would be zero. Therefore, we have proved that the set is predictable at any time in the sense of Definition 3(i).
Let us show that the set is not predictable at any time in the sense of Definition 3(ii).
Let be fixed, and let , . Suppose that there exist kernels required in Definition 3(ii) for . Let and .
We have that
[TABLE]
for . Hence
[TABLE]
Let , and let be the set of functions defined on such that ; the inverse Fourier transforms of these functions are such that for .
[TABLE]
where
[TABLE]
where .
Assume that is such that where is a polynomial of order with the roots contained in the set , and where for a non-zero polynomial such that . By the choice of , we have that ; this implies that . By the orthogonality in of the traces of functions from Hardy spaces and respectively, we obtain that
[TABLE]
[TABLE]
It follows from (10) that any choice of cannot ensure that simultaneously for and , which is inconsistent with the supposition that conditions in Definition 3 are satisfied for the set . This completes the proof of Example 1.
It can be noted that both singletons and defined in Example 1(v) are linearly predictable in the sense of Definition 3(ii) with , and yet the twin set is not linearly predictable in this sense.
Proof of Example 2. It is known that if and a.e. then and ; see, e.g. Theorems 11.6 and 11.7 from [9]. This implies that, for any and , . Hence it cannot happen simultaneously that , , and (i.e. for ). This implies that the class is predictable in the sense of Definition 3(i).
Let us prove Example 2(ii). For any , by the definitions, is the class of such that ; obviously, this class is too wide and cannot be predictable in the sense of Definition 3(i-iii). Therefore, it suffices to consider and only.
Assume that and are given. Consider a filter with the transfer function , where is such that , . Since , we have that . Hence such exists; see, e.g. Theorem 11.6 in [9], p. 193. By the choice of , this filter is causal. Let
[TABLE]
Suppose that the class is linearly predictable in the sense of Definition 3(ii). By the definitions, , hence the class should also be linearly predictable in the sense of Definition 3(ii). On the other hand, consists of processes from transformed by a causal filter. As was mentioned above, the class cannot be linearly predictable. Therefore, the class is also not linearly predictable in the sense of Definition 3. Hence the supposition is incorrect and the class cannot be linearly predictable in this sense for . This completes the proof of Example 2.
To proceed further, we need to establish some properties of the function .
Let and the corresponding set be given. Let , , let , and let .
Lemma 1
- (i)
and for any .
- (ii)
and for any , , and .
- (iii)
as for all .
- (iv)
For any and , there exists such that for any and .
Proof of Lemma 1. Clearly,
[TABLE]
Hence
[TABLE]
Hence . It also follows that , since each pole at of is compensated by multiplication by . Then statement (i) follows from the Paley-Wiener theorem.
Further, we have for and that
[TABLE]
Hence
[TABLE]
By the definitions, it follows that
[TABLE]
Hence
[TABLE]
This implies statement (ii).
Further, and as . Hence, by (11), there exists such that, for any , there exists such that
[TABLE]
This and (12) imply statement (iii).
Let us prove statement (iv). Let be selected, and let . For and , we have that
[TABLE]
and
[TABLE]
By the assumptions on , we have that . Hence, for any and , there exists such that for any
[TABLE]
By the choice of , it follows that
[TABLE]
Hence
[TABLE]
This completes the proof of statement (iv) and Lemma 1.
Proof of Theorem 1. Theorem 1 follows immediately from Theorem 2, whose proof is given below.
Proof of Theorem 2. Let be given, . Let , and let be the corresponding functions.
Let and . For , let and
[TABLE]
Let . By the definitions, it follows that .
Further, let if and if .
We have that where
[TABLE]
By the assumptions, there exists such that . Hence
[TABLE]
where
[TABLE]
Clearly, as . Further, the measure of the set is . By Lemma 1 (iv),
[TABLE]
as for any and any . It follows that
[TABLE]
Therefore, as .
Let us estimate . We have that
[TABLE]
where
[TABLE]
Here denotes the indicator function.
By Lemma 1(iii), a.e. as . By Lemma 1(ii), for all . Hence
[TABLE]
By the Lebesgue Dominated Convergence Theorem, it follows that as . It follows that for any and , . By the definition of , we have that . Hence as for any . It follows that the predicting kernels are such as required in statement (i) of Theorem 1. This completes the proof of statement (i).
Let us show that these kernels are such as required in statement (ii) of Theorem 1. Let
[TABLE]
We have that
[TABLE]
for any . It follows from the proofs above that as . Hence (4) holds for the corresponding and . In addition, it follows that the predicting kernels are such as required in statement (ii) of Theorem 1.
Since , and , it follows that and . For this and , the norms in are the same as the norms in . This completes the proof of Theorem 2.
VI Discussion and future research
The present paper is focused on the impact of spectrum degeneracy at a single point for continuous time processes in the pathwise deterministic setting. The paper suggests frequency criteria of linear predictability of anti-causal convolutions and linear predictors described explicitly in the frequency domain. The predictability is feasible for classes of processes with a single point spectrum degeneracy.
- (i)
The family of predictors suggested in Theorem 2 does not depend on the shape of the spectrum of the underlying process. This could be useful for applications.
- (ii)
The predictors from Theorem 2 are not error-free; however, the error can be made arbitrarily small with a choice of large . In addition, these predictors feature robustness with respect to noise contamination. If the predictor is targeting too small a size of the error, the norm of the transfer function will be large; this could lead to a larger error caused by the presence of noise.
- (iii)
There is some similarity with a result obtained in [5] for discrete time processes (sequences): they are predictable if their Z-transforms vanish at a point of the unit circle . However, the result of [5] was less unexpected since a sequence is band-limited and predictable if its Z-transform vanishes on any arbitrarily small arc on .
- (iv)
It is still unclear if the linear predictability is feasible for the class with some .
- (v)
The processes with an interval spectrum gap at zero feature frequent oscillations (sign changes) [1]; it would be interesting to see if the processes from have some similar properties.
- [1] Blank, N., Ulanovskii, A. (2011). Paley–Wiener functions with a generalized spectral gap. J. Fourier Anal. Appl. 17, 899–915.
- [2] Crum, M.M. (1956). On the theorems of Müntz and Szász. Journal of the London Mathematical Society s1-31 (4), 433–437.
- [3] Dokuchaev, N. (2008). The predictability of band-limited, high-frequency, and mixed processes in the presence of ideal low-pass filters. Journal of Physics A: Mathematical and Theoretical 41 (38), 382002 (7pp).
- [4] Dokuchaev, N. (2010). Predictability on finite horizon for processes with exponential decrease of energy on higher frequencies. Signal Processing 90 (2), 696–701.
- [5] Dokuchaev, N. (2012). Predictors for discrete time processes with energy decay on higher frequencies. IEEE Transactions on Signal Processing 60 (11), 6027–6030.
- [6] Dokuchaev, N. (2012). On predictors for band-limited and high-frequency time series. Signal Processing 92 (10), 2571–2575.
- [7] Dokuchaev, N. (2016). Near-ideal causal smoothing filters for the real sequences. Signal Processing 118 (1), 285–293.
- [8] Dokuchaev, N. (2017). On exact and optimal recovering of missing values for sequences. Signal Processing 135, 81–86.
