Near-ideal predictors and causal filters for discrete time signals
Nikolai Dokuchaev

Abstract
The paper presents linear predictors and causal filters for discrete time signals featuring several kinds of spectrum degeneracy. These predictors and filters are based on approximation of ideal non-causal transfer functions by causal transfer functions represented by polynomials of the Z-transform of the unit step signal.
(Submitted: 25.02.2023; Submitted: 21.02.2023; revised: 28.03.2023)
Accepted to Problems of Information Transmission
Key words: discrete time signals, forecasting, predictors, filters, causal transfer functions, causal approximation, high frequency signals, low frequency signals.
1 Introduction
It is well known that certain degeneracy of the spectrum creates opportunities for prediction and interpolation of signals; see, e.g., [1]-[9]. The present paper considers discrete time signals in the deterministic setting, where only a single trajectory of the signal is observed, rather than a set of sample trajectories that would allow statistical methods to be applied. The method that we use is based on frequency analysis. It is known in principle that these signals are predictable, i.e., they allow unique extrapolation from their past observations, if they have a finite spectrum gap, i.e., a segment of the unit circle where their Z-transform vanishes; see, e.g., [10]. This gap can be arbitrarily small and can even be reduced to a point, under certain conditions on the rate of spectrum degeneracy in a neighbourhood of this point. Accordingly, an ideal low-pass or high-pass filter would convert a non-predictable signal into a predictable one. This is why these ideal filters cannot be causal.
For discrete time signals, some predictors based on irrational causal transfer functions were obtained in [10, 11]. The corresponding transfer functions were presented via exponentials of rational functions or power functions. In [12], some low-pass filters were also constructed based on a similar principle.
The paper addresses again the prediction and filtering problems for discrete time signals; it offers new predictors and causal filters approximating ideal filters. The causal transfer functions for these predictors and filters are represented as polynomials of Z-transform of the unit step signal, i.e., polynomials of . For the predictors, the corresponding transfer functions approximate the function on , where represents the frequency, and where an integer represents a preselected prediction horizon. For the filters, the corresponding transfer functions approximate the real valued step function representing the trace on of Z-transform of an ideal filter. The approximation is possible for signals with some arbitrarily small spectrum gap; the resulting signal could have a wider preselected spectrum gap. This polynomial approximation method is based on the approach developed in [13, 14] for prediction of continuous time signals.
The results are applicable to high frequency signals as well as to signals with a spectrum gap located anywhere on , for example, low frequency signals. Moreover, the paper shows that some signals with a non-degenerate spectrum can also be predicted on a half of the timeline, given some conditions on certain spectrum-type characteristics of their trace on this half of the timeline.
These new predictors and filters allow an explicit representation in the time domain and in the frequency domain; in addition, they are independent of the spectral characteristics of the input signals with a given type of spectrum degeneracy. A computational approach based on model fitting is suggested.
The paper is organized in the following manner. In Section 2, we formulate the definitions. In Section 3, we formulate the main theorems on predictability and predictors (Theorem 1 and Theorem 2). In Section 4, we discuss representation of transfer functions in the time domain. In Section 5, we discuss some implementation problems. In Section 6, a method of computing approximating functions for exponentials is suggested. In Section 7, we suggest an extension of the results to low frequency and other signals. Section 8 contains the proofs.
2 Problem setting
Some notations
Let be the set of all integers.
We denote by the set of all functions (signals) , such that for .
For or , we denote by the Z-transform
[TABLE]
Respectively, the inverse Z-transform is defined as
[TABLE]
If , then is defined as an element of .
We denote by the indicator function.
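The notation above can be illustrated numerically. The following is a minimal sketch, assuming the usual convention X(z) = sum_k x(k) z^{-k} and the inversion formula x(k) = (1/2π) ∫ X(e^{iω}) e^{iωk} dω over (-π, π]; the function names and the test signal are hypothetical and are not taken from the paper.

```python
import numpy as np

def z_transform(x, times, z):
    """X(z) = sum_k x(k) z^{-k} for a signal that is zero outside `times`."""
    x = np.asarray(x, dtype=complex)
    times = np.asarray(times, dtype=float)
    return np.sum(x * complex(z) ** (-times))

def inverse_z_transform(X, k, num_points=256):
    """x(k) = (1/2pi) * integral over (-pi, pi] of X(e^{iw}) e^{iwk} dw.

    The integral is replaced by a uniform Riemann sum. For a finitely
    supported signal this is exact up to rounding as long as num_points
    exceeds the largest |k - t| over the support times t.
    """
    omega = -np.pi + 2.0 * np.pi * np.arange(num_points) / num_points
    values = np.array([X(np.exp(1j * w)) for w in omega])
    return np.mean(values * np.exp(1j * omega * k))

# Hypothetical test signal: x(-1) = 1, x(0) = -2, x(2) = 0.5, zero elsewhere.
times = [-1, 0, 2]
x = [1.0, -2.0, 0.5]
X = lambda z: z_transform(x, times, z)
recovered = [inverse_z_transform(X, k) for k in range(-2, 4)]
```

Here `recovered` lists the reconstructed values x(-2), ..., x(3), so the round trip through the unit circle can be compared with the original samples.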
Some definitions
Let either or .
Let be a set of currently observable discrete time signals with values in .
Let be the set of all continuous mappings such that, for any and , we have that for all if for all . In other words, this is the set of "causal" mappings; we will look for predictors and filters in this class.
Let us consider first a prediction problem. Let an integer be given. The goal is to estimate, at current times , the values , using historical values of the observable process . Therefore, is the prediction horizon in this setting.
Definition 1
Let and .
- (i)
We say that the class is predictable with the prediction horizon up to time if there exists a sequence such that
[TABLE]
where
[TABLE]
- (ii)
We say that the class is uniformly predictable with the prediction horizon up to time if there exists a sequence such that
[TABLE]
where is as in part (i) above.
Functions in the definition above can be considered as approximate predictions of the process .
Let us consider now the filtering problem.
Let be given. Let a function be defined such that .
We consider an ideal high-pass filter such that the trace of its transfer function on is , , i.e., filters with the suppression interval .
The goal is to find an arbitrarily close approximation of this non-causal transfer function by causal transfer functions.
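The non-causality of the ideal filter can be seen directly from its impulse response. The sketch below assumes that the ideal high-pass transfer function equals 1 for Ω < |ω| ≤ π and 0 for |ω| ≤ Ω; the symbol Ω, the value π/4 and the function names are illustrative assumptions. Under this assumption the kernel is h(k) = δ(k) - sin(Ωk)/(πk), which does not vanish for k < 0.

```python
import numpy as np

def ideal_highpass_kernel(k, Omega):
    """Closed-form impulse response of the assumed ideal high-pass filter."""
    k = np.asarray(k, dtype=float)
    # np.sinc(x) = sin(pi x) / (pi x), hence the next line is sin(Omega k) / (pi k),
    # with the value Omega / pi at k = 0.
    lowpass = (Omega / np.pi) * np.sinc(Omega * k / np.pi)
    delta = (k == 0).astype(float)
    return delta - lowpass

def kernel_by_quadrature(k, Omega, num_points=200_000):
    """Same kernel from the inverse transform, using a midpoint rule in omega."""
    omega = -np.pi + 2.0 * np.pi * (np.arange(num_points) + 0.5) / num_points
    passband = np.abs(omega) > Omega
    return np.mean(passband * np.exp(1j * omega * k)).real

Omega = np.pi / 4                      # hypothetical suppression half-width
ks = np.arange(-3, 4)
h = ideal_highpass_kernel(ks, Omega)
h_numeric = np.array([kernel_by_quadrature(k, Omega) for k in ks])
```

Since `h` is symmetric in k and nonzero at negative k, the output at time t depends on future inputs, which is why only approximations by causal transfer functions are sought.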
Definition 2
Let .
- (i)
We say that a class allows causal high-pass filtering with the suppression interval if there is a sequence such that
[TABLE]
where
[TABLE]
- (ii)
We say that the class allows uniform causal high-pass filtering with the suppression interval if there exists a sequence such that
[TABLE]
where and are as in part (i) above.
In the last definition, operators represent causal near-ideal high pass filters; they ensure, for the class , an arbitrarily close approximation of the non-causal ideal high-pass filter defined by its transfer function .
3 The main result
For , let be the set of all functions represented as
[TABLE]
where can be any. Let .
Lemma 1
For , let the function be defined either as or as . Then, for any , there exists an integer and such that
[TABLE]
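A numerical illustration of this kind of approximation is sketched below. It assumes that the approximating class consists of polynomials in u(ω) = 1/(1 - e^{-iω}), the Z-transform of the unit step on the unit circle, and that the target is e^{iωn} on the arc Ω ≤ |ω| ≤ π. The coefficients are found by ordinary least squares on a grid, which is a stand-in chosen for the illustration and not a procedure stated in the lemma; all names and parameter values are hypothetical.

```python
import numpy as np

def step_basis(omega, degree):
    """Columns u(omega)^k, k = 0..degree, with u = 1 / (1 - exp(-i*omega))."""
    u = 1.0 / (1.0 - np.exp(-1j * omega))
    return np.vander(u, degree + 1, increasing=True)

def fit_predictor_transfer(omega, horizon, degree):
    """Least-squares coefficients a_k with sum_k a_k u^k close to exp(i*omega*horizon)."""
    A = step_basis(omega, degree)
    target = np.exp(1j * omega * horizon)
    coeffs, *_ = np.linalg.lstsq(A, target, rcond=None)
    max_error = np.max(np.abs(A @ coeffs - target))
    return coeffs, max_error

Omega = np.pi / 2                                   # assumed half-width of the gap
omega = np.linspace(Omega, 2 * np.pi - Omega, 400)  # arc of the circle away from omega = 0
_, err_low = fit_predictor_transfer(omega, horizon=1, degree=3)
_, err_high = fit_predictor_transfer(omega, horizon=1, degree=10)
print(err_low, err_high)
```

In this setup the grid error decreases as the degree grows, in line with the existence statement of the lemma; a narrower gap can be expected to require a higher degree and larger coefficients.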
For , let be the set of all signals such that and for and .
Further, for , let be the set of all real signals such that and the following holds:
- •
If , then
[TABLE]
- •
If , then
[TABLE]
We call the feature of the processes from , , described above a left-sided spectrum degeneracy. In addition, we define as .
Theorem 1
Let be given, . The predictability up to time for considered in Definition 1(i), as well as the uniform predictability up to time for considered in Definition 1(ii), can be ensured with the sequence of the predictors , defined by their transfer functions selected as in Lemma 1 with . More precisely, for any and , the estimate
[TABLE]
holds if and are such that (2) holds with for sufficiently small .
Theorem 2
For and any , for , the causal filtering considered in Definition 2(i), as well as the uniform causal filtering for considered in Definition 2(ii), can be ensured with the sequence of the causal filters , defined by their transfer functions selected as in Lemma 1 with . More precisely, for any and and , the estimate
[TABLE]
holds if and are such that (2) holds with for sufficiently small .
According to this theorem, a process with an arbitrarily small spectrum gap can be converted, using causal operations, into a process with larger spectrum gap up to .
It can be noted that:
- •
The transfer functions are analytic in the domain . If we apply their traces on for calculation of the outputs for inputs , then we obtain the same outputs as for the functions .
- •
For real valued inputs , the outputs of these predictors and filters are real valued.
- •
depends on and via the coefficients in the setting of Theorem 1, and depends on and via the coefficients in the setting of Theorem 2.
4 Representation of operators in the time domain
Let either or .
Consider operators defined on by their transfer functions , . In other words, if for , then for and . Clearly,
[TABLE]
Hence for all , . Therefore, the Z-transforms of processes vanish on , and the operators are continuous, assuming that is a subspace of provided with -norm.
Let be defined such that for and for , i.e. .
Let and .
Let us show that, in the time domain, the operator can be represented via causal convolution with the kernel , i.e. if then .
Let . Clearly, . Let
[TABLE]
Clearly, for any . Hence
[TABLE]
for each . It follows that if then
[TABLE]
and the series converges for each .
This implies that
[TABLE]
Therefore, the operators in Theorems 1-2 can be represented as
[TABLE]
where
[TABLE]
All series here converge as described above for .
It can be noted that then the series converges absolutely; however, for general type , there is no guarantee that or .
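The time-domain picture can be checked on a toy example. The sketch below assumes that the k-th operator has transfer function (1 - 1/z)^{-k}, so that it acts as a k-fold causal running sum with kernel C(m+k-1, k-1), m ≥ 0; it also assumes an input supported on finitely many times t ≥ 0, which avoids the semi-infinite sums of the general case. The function names are hypothetical.

```python
import numpy as np
from math import comb

def step_power_kernel(k, length):
    """Coefficients of (1 - 1/z)^{-k} in powers of 1/z: C(m + k - 1, k - 1), m >= 0."""
    return np.array([comb(m + k - 1, k - 1) for m in range(length)], dtype=float)

def causal_convolution(kernel, x):
    """y(t) = sum_{s=0}^{t} kernel(t - s) x(s) for a signal supported on t >= 0."""
    return np.convolve(kernel, x)[: len(x)]

def repeated_running_sum(x, k):
    """Apply the running sum y(t) = sum_{s<=t} x(s) k times."""
    y = np.asarray(x, dtype=float)
    for _ in range(k):
        y = np.cumsum(y)
    return y

rng = np.random.default_rng(1)
x = rng.standard_normal(30)            # hypothetical finitely supported input
y_conv = causal_convolution(step_power_kernel(2, len(x)), x)
y_sum = repeated_running_sum(x, 2)     # agrees with y_conv up to rounding
```

Changing the input at later times leaves the earlier outputs unchanged, which is the causality property required of the predictors and filters.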
5 On numerical implementation of Theorems 1-2
The direct implementation of the predictors and filters introduced in Theorems 1-2 requires evaluating sums of semi-infinite series, which is not practically feasible. However, these theorems could lead to prediction methods that bypass this calculation. Let us discuss these possibilities.
Let be given such that , where in the setting of Theorem 1 is such as described therein, and in the setting of Theorem 2. Let for , , and let
[TABLE]
Lemma 2
In the notation of Theorems 1-2, for any such that , we have that can be represented as
[TABLE]
Here are the coefficients for from Theorems 1-2,
[TABLE]
[TABLE]
and
[TABLE]
[TABLE]
This lemma shows that calculation of is easy for if we know all , , and observe .
Let us discuss some ways to evaluate bypassing summation of infinite series.
First, let us observe that (6) implies a useful property given below.
Corollary 1
For any , there exist an integer and such that, for any , there exist such that for all , where
[TABLE]
In this corollary, and are such as defined in Theorems 1-2.
The case of prediction problem: Theorem 1 setting
Let us discuss using (6) and (7) for evaluation of in Theorem 1 setting.
Let and . Assume first that the goal is to forecast the value given observations at times , in the setting of Theorem 1. It appears that if then Corollary 1 gives an opportunity to construct predictors via fitting parameters using past observations available for : we can match the values with the past observations . From now on, we assume that .
Let be large enough such that is approximated by as described in Theorem 1, i.e., for some sufficiently small , for some choice of .
As an approximation of the true , we can accept a set such that
[TABLE]
(Recall that, at time , the values and are observable for these .) If (8) holds, we can conclude that delivers an acceptable prediction of for these . Clearly, Theorem 1 implies that a set ensuring (8) exists since this inequality holds with .
The corresponding value would give an estimate for and, respectively, for .
Furthermore, finding a set that ensures (8) could still be difficult. Instead, one can consider fitting predictions and observations at a finite number of points .
Let an integer and a set be selected such that . We suggest using observations at times . Consider a system of equations
[TABLE]
Consider first the case where . In this case, we can select ; these values are directly observable, without calculation of semi-infinite series required for . The corresponding choice of ensures zero prediction error for , .
Taking more observations into account, i.e., selecting larger and a larger set , would improve estimation of . If we consider , then, in the general case, it would not be feasible to achieve that for all , since it cannot be guaranteed that system (9) is solvable for : the system will be overdetermined. Nevertheless, the estimate presented in (8) can still be achieved for any arbitrarily large , since (8) holds. A solution could be found using methods for fitting linear models.
So far, the consistency of these procedures is unclear since a choice of smaller leads to larger . We leave analysis of these methods for future research.
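The fitting idea can be sketched as follows, under assumptions that go beyond the text: the operators are taken to be iterated causal running sums S_k(t) = S_k(t-1) + S_{k-1}(t) with S_0 = x, the unknown parameters are the values of these sums just before the observation window, and the coefficients are real. With these assumptions the predictor output is affine in the unknowns, so an overdetermined system can be solved by least squares. The names `run_states`, `fit_initial_states` and all numerical values are hypothetical.

```python
import numpy as np

def run_states(x_window, eta):
    """Iterated running sums on a window, started from eta[k-1] = S_k(-1)."""
    d = len(eta)
    state = np.array(eta, dtype=float)        # S_1..S_d at the previous time
    history = np.empty((len(x_window), d + 1))
    for t, x_t in enumerate(x_window):
        prev = x_t                            # S_0(t) = x(t)
        history[t, 0] = prev
        for k in range(d):
            state[k] = state[k] + prev        # S_{k+1}(t) = S_{k+1}(t-1) + S_k(t)
            prev = state[k]
            history[t, k + 1] = prev
    return history

def fit_initial_states(x_window, coeffs, targets):
    """Least-squares estimate of the unknown initial states from target outputs."""
    d = len(coeffs) - 1
    base = run_states(x_window, np.zeros(d)) @ coeffs   # part driven by observed x
    no_input = np.zeros(len(x_window))
    B = np.column_stack(
        [run_states(no_input, np.eye(d)[j]) @ coeffs for j in range(d)]
    )                                                   # response to each unit state
    eta, *_ = np.linalg.lstsq(B, targets - base, rcond=None)
    return eta

rng = np.random.default_rng(0)
x_window = rng.standard_normal(20)                 # hypothetical observed window
coeffs = np.array([0.5, -1.0, 0.3, 0.2])           # hypothetical a_0..a_3
eta_true = np.array([1.5, -0.7, 0.4])
targets = run_states(x_window, eta_true) @ coeffs  # synthetic, noise-free targets
eta_hat = fit_initial_states(x_window, coeffs, targets)
```

In the prediction setting the targets would be past observations shifted by the prediction horizon rather than synthetic outputs, so the fit would be approximate and its consistency is subject to the caveat stated above.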
The case of causal filtering problem: Theorem 2 setting
In the setting of Theorem 2, the past values of the true unknown process are not observable and hence cannot be used for fitting the values . However, we can use the fact that the values in (6)-(7) are still the same as in the setting of Theorem 2, where . Since past are observable, we can use the fitting procedure based on Theorem 1 to estimate using (6)-(7) with the coefficients defined for approximation of and with observations , , as described above. After that, we can estimate using equation (6) again with the new coefficients defined for approximation of .
6 A possible choice of for predictors in Theorem 1 setting
The coefficients for functions could be found using numerical methods from classical analysis such as the Gram-Schmidt procedure. In the case of Theorem 1 for predictors, finding these coefficients can be simplified, especially for .
Let us demonstrate this.
Assume that . For real , define on a function
[TABLE]
This function is a modification of the transfer function introduced in [10] for prediction of signals with a single point spectrum degeneracy. Clearly,
[TABLE]
uniformly on the set . Hence
[TABLE]
uniformly on the set .
Further, for , let be selected such that
[TABLE]
The function is analytic in , and is bounded on for any . Clearly, we have that
[TABLE]
where
[TABLE]
and where convergence is uniform on the set .
It can be observed that the functions belong to , since
[TABLE]
For example,
[TABLE]
Clearly, we can select such that
[TABLE]
For this and , we have that
[TABLE]
The coefficients can be computed from the representation of as an element of .
For the case of , one can use functions .
7 Low frequency and other signals
Let us show that the results obtained above for high frequency signals can be applied to signals of more general type described as follows.
Let , , and be given, and let be the set of all signals such that for , and .
For example, ; this set includes high frequency signals such that if . Respectively, the set includes low frequency signals (band limited signals) such that if .
To predict a signal , one can convert it into a signal as . Then one can use for the predictors introduced in Theorem 1. The implied prediction for can be obtained as , where is the corresponding prediction for .
Similarly, one can construct a causal filter that, for , produces an approximation of such that , where , and is Z-transform of an ideal filter such that . Again, one can convert it into a signal as . Then one can use for the causal filter introduced in Theorem 2. The implied filtered signal for can be obtained as , where is the corresponding filtered signal for .
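The conversion step can be illustrated with a finite DFT. The sketch below assumes that the conversion is a modulation x(t) → x(t) e^{-iθt}, which rotates the spectrum by θ; the sign convention, the choice θ = π, and the periodic test signal are assumptions made for the illustration.

```python
import numpy as np

def modulate(x, theta):
    """Return x(t) * exp(-i*theta*t) for t = 0, ..., len(x) - 1."""
    t = np.arange(len(x))
    return np.asarray(x) * np.exp(-1j * theta * t)

def active_bins(signal):
    """Indices of DFT bins that carry non-negligible energy."""
    return set(np.flatnonzero(np.abs(np.fft.fft(signal)) > 1.0).tolist())

N = 64
t = np.arange(N)
low_freq = np.cos(2 * np.pi * 3 * t / N)   # energy at DFT bins 3 and N - 3
high_freq = modulate(low_freq, np.pi)      # multiplication by (-1)^t
restored = modulate(high_freq, -np.pi)     # inverse modulation

print(sorted(active_bins(low_freq)), sorted(active_bins(high_freq)))
```

After modulation by θ = π the energy sits near bin N/2, i.e., the low frequency signal has become a high frequency one, and the inverse modulation recovers the original samples.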
Alternatively, we can construct predictors and filters directly for signals from similarly to the ones introduced in Theorems 1-2 and with the transfer functions
[TABLE]
approximating and on .
In the setting where , and where is unknown, we can use the approach from Section 5 to fit from past observations as a new unknown parameter.
8 Proofs
Proof of Lemma 1. For a set , let (or ) be the set of functions constructed as for some from (or from , respectively).
Let .
Clearly, for all , . Hence
[TABLE]
It follows that, if then , for both and . This implies that if .
Since the function is strictly monotone on the intervals and , and has different signs on these two intervals, it follows that for all , . It follows that the set of functions separates points on the compact set . By the Stone-Weierstrass Theorem for complex valued continuous functions on compact sets of real numbers, it follows that the set is complete in the space of continuous complex-valued functions defined on with the supremum norm; see, e.g., Theorem 10 in [15], p. 238. It follows that, for any , there exists and represented as defined for , where , such that
[TABLE]
For this follows directly from Theorem 10 in [15], p. 238, mentioned above. For this follows from the fact that the set is everywhere dense in , and convergence in implies convergence in .
Let us show that the same estimate holds for defined as , where .
Suppose that for some . Clearly, the real and the imaginary parts of are even and odd, respectively. On the other hand, the functions and are odd and even, respectively, on . Therefore, the replacement of by cannot spoil the estimate. Hence the transfer function satisfies the required estimate. This completes the proof of Lemma 1.
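The monotonicity and sign properties used in the proof above can be checked by hand. The following identity is a sketch under the assumption that the basis function is the trace of the unit step Z-transform on the unit circle, 1/(1 - e^{-iω}); the paper's own formula is not reproduced here.

```latex
\frac{1}{1-e^{-i\omega}}
  = \frac{e^{i\omega/2}}{e^{i\omega/2}-e^{-i\omega/2}}
  = \frac{\cos(\omega/2)+i\sin(\omega/2)}{2i\sin(\omega/2)}
  = \frac{1}{2}-\frac{i}{2}\cot\frac{\omega}{2},
  \qquad \omega\in(-\pi,\pi]\setminus\{0\}.
```

Under this assumption the real part is constant, and the imaginary part is determined by cot(ω/2), which is strictly decreasing on (-π, 0) and on (0, π], negative on the first interval and non-negative on the second. Distinct frequencies outside a neighbourhood of zero therefore give distinct values, which is the point-separation property needed for the Stone-Weierstrass argument.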
Proof of Theorems 1-2. Let us consider first the case where .
We continue with the proof for Theorem 1 (with ) and Theorem 2 simultaneously. For the proof of Theorem 1, we assume that and is defined as . For the proof of Theorem 2, we assume that as .
Assume that estimate (2) holds for selected . We have that
[TABLE]
where in the setting of Theorem 1, and is an ideal filtered process in the setting of Theorem 2. Clearly,
[TABLE]
We have that
[TABLE]
Hence . This completes the proof of Theorem 1 for the case where and the proof of Theorem 2.
Let us prove Theorem 1 for the case where . Let . Let us define an even function such that for , and for . Let . It can be shown that and for . This implies that . Furthermore, since the predictors are causal, it follows that for all . Then the proof for follows from the proof for the case of .
The case where can be considered similarly. For , we define an odd function such that for , , and for . Let . It can be shown that and for . It follows that . Again, since the predictors are causal, it follows that for all . Hence the proof for follows from the proof for the case of . This completes the proof of Theorem 1.
Proof of Lemma 2. We have that
[TABLE]
Further, we have that for any , i.e.,
[TABLE]
Here we assume that .
Furthermore, we have that
[TABLE]
and
[TABLE]
Similarly,
[TABLE]
Similarly, we obtain that, for ,
[TABLE]
It follows that
[TABLE]
Together with (10), this proves (7) and completes the proof of Lemma 2.
9 Concluding remarks
- i.
The approach suggested in this paper allows many modifications. In particular, other non-causal discrete time transfer functions can be approximated by causal transfer functions from . In fact, any transfer function can be approximated that way if .
- ii.
It can be shown that, by Theorem 10 in [15], p. 238 again, approximation of in Lemma 1 can in fact be achieved on the set of real valued functions represented as
[TABLE]
with . This may help to streamline calculations since this set is smaller than . If are found, then we can derive the coefficients needed for the fitting of via (6)-(7).
- iii.
The predictors introduced in [10, 11] do not allow the fitting procedure described in Section 5 since the kernels of the corresponding causal convolutions are heavily time dependent.
- iv.
In the present paper, we consider -approximation of non-causal transfer functions; this allowed us to approximate transfer functions that are discontinuous on , as used for the filtering problem. In addition, this would allow the Gram-Schmidt procedure to be used to construct the functions . This was not feasible in the continuous time setting [14], where uniform approximation on infinite intervals was required.
- v.
In general, it can be expected that the approximating functions take large values for large inside the interval , in the terms of Lemma 1. However, some robustness of the prediction and filtering with respect to noise contamination can be established similarly to [10]. We leave it for future research.
- vi.
The processes from do not necessarily have a spectrum degeneracy for and ; in fact, their Z-transforms can be separated from zero on . However, Theorem 1 shows that they are predictable on the left half of the timeline because of their left-sided spectrum degeneracy defined by (3),(4).
- [1] Butzer, P.L., Stens, R.L. (1993). Linear Prediction by Samples from the Past. In: Marks, R.J. (eds) Advanced Topics in Shannon Sampling and Interpolation Theory. Springer, New York, NY.
- [2] Higgins, J.R. (1996). Sampling Theory in Fourier and Signal Analysis. Oxford University Press, New York.
- [3] Li, Z., Han, J., Song, Yu. J. (2020). On the forecasting of high-frequency financial time series based on ARIMA model improved by deep learning. J. of Forecasting 39 (7), 1081–1097.
- [4] Luo, S., Tian, C. (2020). Financial high-frequency time series forecasting based on sub-step grid search long short-term memory network. IEEE Access, Vol. 8, 203183–203189.
- [5] Knab J.J. (1979). Interpolation of band-limited functions using the approximate prolate series. IEEE Transactions on Information Theory 25 (6), 717–720.
- [6] Lyman R.J., Edmonson, W.W., McCullough S., and Rao M. (2000). The predictability of continuous-time, bandlimited processes. IEEE Transactions on Signal Processing 48 (2), 311–316.
- [7] Lyman R.J. and Edmonson, W.W. (2001). Linear prediction of bandlimited processes with flat spectral densities. IEEE Transactions on Signal Processing 49 (7), 1564–1569.
- [8] Papoulis A. (1985). A note on the predictability of band-limited processes. Proceedings of the IEEE 73 (8), 1332–1333.
