Stochastic Approximation in a Markovian Framework Revisited: Lipschitz Continuity of the Poisson Equation
Algo Carè, Balázs Csanád Csáji, Balázs Gerencsér, László Gerencsér, Miklós Rásonyi

TL;DR
This paper revisits a key technical aspect of stochastic approximation in Markovian settings, providing simple conditions to ensure the Lipschitz continuity of solutions to the Poisson equation, which is crucial for analyzing algorithms in various applications.
Contribution
It introduces straightforward conditions to verify the existence, uniqueness, and Lipschitz continuity of solutions to the parameter-dependent Poisson equation in Markovian frameworks.
Findings
Conditions verified for a class of queuing systems with open-loop control.
Established Lipschitz continuity of the Poisson equation solutions.
Simplified technical verification in stochastic approximation analysis.
Abstract
In this paper we revisit a fundamental technical issue within the theory of stochastic approximation (SA) in a Markovian framework, first proposed in the book by Djereveckii and Fradkov (1981), and further developed in much detail in the book by Benveniste, Métivier, and Priouret (1990). This theory is instrumental in many application areas such as the statistical analysis of Hidden Markov Models arising in telecommunication, quantized linear stochastic systems, and more recently in active learning and reinforcement learning. The problem at hand is the verification of the existence, uniqueness and Lipschitz-continuity of the solution of a parameter-dependent Poisson equation, in an appropriate weighted sup-norm, associated with a collection of Markov chains on general state spaces. Verification of the above facts is vital in the analysis of SA processes presented in (Benveniste et…
Poisson Equations, Lipschitz Continuity and Controlled Queues
Algo Carè1
Balázs Csanád Csáji2,4
Balázs Gerencsér3,4
László Gerencsér2
Miklós Rásonyi3,4
∗A. Carè and B. Cs. Csáji were (partially) supported by the European Commission through the H2020 project Centre of Excellence in Production Informatics and Control (EPIC, 739592). B. Cs. Csáji and L. Gerencsér were supported by the European Union within the framework of the National Laboratory for Autonomous Systems (RRF-2.3.1-21-2022-00002). M. Rásonyi and B. Gerencsér were supported by NRDI (National Research, Development and Innovation Office) grant KKP 137490. B. Gerencsér was also supported by the János Bolyai Research Scholarship of the Hungarian Academy of Sciences.
1A. Carè is with Dipartimento di Ingegneria dell’Informazione, Università di Brescia, 25123, Brescia, Italy.
2L. Gerencsér and B. Cs. Csáji are with the Institute for Computer Science and Control (SZTAKI), Eötvös Loránd Research Network (ELKH), Kende utca 13-17, H-1111, Budapest, Hungary.
3B. Gerencsér and M. Rásonyi are with the Alfréd Rényi Institute of Mathematics, Eötvös Loránd Research Network (ELKH), Reáltanoda u. 13-15., H-1053, Budapest, Hungary.
4B. Cs. Csáji, B. Gerencsér and M. Rásonyi are also with the Institute of Mathematics, Eötvös Loránd University (ELTE), Budapest, Hungary.
Abstract
The objective of the paper is to revisit a key mathematical technology within the theory of stochastic approximation in a Markovian framework, elaborated in detail by Benveniste, Métivier, and Priouret (1990): the existence, uniqueness and Lipschitz continuity of the solutions of a parameter-dependent Poisson equation associated with a collection of Markov chains on general state spaces. The setup and the methodology of our investigation is based on an elegant stability theory for Markov chains, developed by Hairer and Mattingly (2011). The paper provides a transparent analysis of parameter-dependent Poisson equations with convenient conditions. The validity of the proposed conditions is verified for a class of controlled queues.
I Introduction
A beautiful area of systems and control theory is recursive identification and stochastic adaptive control of stochastic systems. In an abstract mathematical framework [2], [12], the key problem is to solve a non-linear algebraic equation
$$\mathbb{E}\, H(\theta, X_n(\theta)) = 0, \qquad (1)$$
where $\theta$ is an unknown, vector-valued parameter of a physical plant or controller, $(X_n(\theta))$ is a strictly stationary stochastic process, representing a physical signal affected by $\theta$, and $H$ is a computable function. The same mathematical framework is applied in other fields such as adaptive signal processing and machine learning.
Our objective is to find the root of (1), denoted by $\theta^*$, via a recursive algorithm based on computable approximations of $\mathbb{E}\, H(\theta, X_n(\theta))$. In the case when $H(\theta, X_n(\theta)) = h(\theta) + e_n$, where $(e_n)$ is an i.i.d. process, or a martingale difference sequence, we get a classical stochastic approximation process.
An early version of the above problem is presented in the celebrated paper by Ljung [11], in which $(X_n(\theta))$ was assumed to be defined via a linear stochastic system driven by a weakly dependent process.
A renewed interest in recursive estimation in a Markovian framework was sparked by the excellent book of Benveniste, Métivier and Priouret [2] elaborating an extensive mathematical technology for the analysis of these processes. A central tool in their analysis is a complex set of results concerning the parameter-dependent Poisson equation. This is carried out by a specific stability theory for a class of Markov processes, which is off the track of usual methodologies, e.g., Athreya and Ney [1], Nummelin [15], Meyn and Tweedie [14].
The enormous practical value of the estimation problem in a Markovian framework motivates our interest to revisit the theory of [2], and see if their analysis can be simplified or even extended in the light of recent progress in the theory of Markov processes. The starting point of our investigation is a relatively new, elegant stability theory for Markov processes developed by Hairer and Mattingly [7].
The focus of the present paper is the study of the parameter-dependent Poisson equation formulated as
$$(I - P_\theta)\, u_\theta(x) = f_\theta(x) - h_\theta,$$
where $P_\theta$ is the probability transition kernel of the Markov process $(X_n(\theta))$, with $P_\theta u_\theta$ denoting the action of $P_\theta$ on the unknown function $u_\theta(\cdot)$, and $f_\theta(\cdot)$ is an a priori given function defined on the state-space of the process; finally, $h_\theta$ denotes the mean value of $f_\theta(\cdot)$ under the assumed unique invariant measure, say $\mu^*_\theta$, corresponding to $P_\theta$.
The Poisson equation is a simple and effective tool to study additive functionals on Markov processes of the form
$$\sum_{n=1}^{N} \big( f_\theta(X_n(\theta)) - h_\theta \big)$$
via martingale techniques. Proving the Lipschitz continuity of $u_\theta(\cdot)$ w.r.t. $\theta$, and providing useful upper bounds for the Lipschitz constants are vital technical tools for an ODE analysis proposed in [2, Part II, Chapter 2]. In fact, the analysis of the Poisson equation takes up more than half of the efforts in proving the basic convergence results in [2], and the verification of their conditions, in particular a kind of Lipschitz continuity of the probability transition kernels formulated in terms of second order differences, see Theorem 6, condition (iv) on page 262 of [2], is far from being trivial.
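A minimal sketch of how the Poisson equation turns an additive functional into a martingale part plus a boundary term, assuming a hypothetical two-state chain. The kernel `P`, the function `f`, and every helper name below are illustrative assumptions and do not come from the paper. The identity being checked is $\sum_{n<N}(f(X_n)-h) = u(X_0) - u(X_N) + \sum_{n<N}\big(u(X_{n+1}) - (Pu)(X_n)\big)$, which follows directly from $(I-P)u = f - h$.

```python
import random

# Hypothetical two-state chain; P, f and all names are illustrative assumptions.
P = [[0.9, 0.1],
     [0.2, 0.8]]
f = [1.0, 4.0]
pi = [2.0 / 3.0, 1.0 / 3.0]               # invariant measure: pi P = pi
h = sum(p * v for p, v in zip(pi, f))     # mean of f under pi
u = [-10.0 / 3.0, 20.0 / 3.0]             # solves (I - P) u = f - h with pi(u) = 0


def apply_kernel(P, phi):
    """Return the function P phi, i.e. x -> sum_y P(x, y) phi(y)."""
    return [sum(pxy * v for pxy, v in zip(row, phi)) for row in P]


def simulate(P, x0, n_steps, rng):
    """Sample a path X_0, ..., X_{n_steps} of the chain started at x0."""
    path = [x0]
    for _ in range(n_steps):
        row = P[path[-1]]
        path.append(rng.choices(range(len(row)), weights=row)[0])
    return path


def decomposition(path, f, h, u, Pu):
    """Return (additive functional, martingale part, boundary term)."""
    steps = range(len(path) - 1)
    additive = sum(f[path[n]] - h for n in steps)
    martingale = sum(u[path[n + 1]] - Pu[path[n]] for n in steps)
    boundary = u[path[0]] - u[path[-1]]
    return additive, martingale, boundary


Pu = apply_kernel(P, u)
path = simulate(P, 0, 1000, random.Random(0))
additive, martingale, boundary = decomposition(path, f, h, u, Pu)
print(abs(additive - (martingale + boundary)))  # zero up to rounding
```

The martingale increments $u(X_{n+1}) - (Pu)(X_n)$ have zero conditional mean, which is what makes martingale limit theory applicable to the additive functional.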
The objective of our project is to revisit the relevant mathematical technologies and outline a transparent and flexible analysis within the setup of [7]. The present paper is devoted to the first half of this project, the analysis of a parameter-dependent Poisson equation. The application of our results for stochastic approximation within a Markovian framework is the subject of a forthcoming paper, in which a combination of the ODE analysis developed in [2] and [6] is to be extended using the results of the current paper. In the end we get the expected rate of convergence for the moments of the estimation error under a convenient set of conditions.
The significance of the topic of the paper is reinforced by the current intense interest in the minimization of functions computed via MCMC [4]. To complement the above historical perspective we should note that the problem goes back to [16], providing results for finite state Markov chains. The extension to more general state spaces is far from trivial, posing the challenge to choose an appropriate distance of measures.
The structure of the paper is as follows: in Section II we provide an introduction to the stability theory for Markov chains developed in [7]. The main results of the paper are stated in Section III, culminating in Theorem 2, proving the Lipschitz continuity of the solutions of a parameter-dependent Poisson equation. These results are extended in Section IV, in particular, the uniform drift condition, stated as Assumption 1, is significantly relaxed. The applicability of our results for controlled queuing systems is presented in Section V. The paper is concluded with a brief discussion.
Given the highly technical nature of the paper we think that the present structure, introducing the relevant concepts, theorems and their proofs incrementally, enhances clarity.
II A Brief Summary of a New Stability Theory for Markov Chains
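Let $(\mathcal{X}, \mathcal{A})$ be a measurable space and $\Theta \subseteq \mathbb{R}^k$ be a domain (i.e., a connected open set). Consider a class of Markov transition kernels $P_\theta(x, A)$, that is, for each $\theta \in \Theta$ and $x \in \mathcal{X}$, $P_\theta(x, \cdot)$ is a probability measure over $\mathcal{X}$, and for each $A \in \mathcal{A}$, $P_\theta(\cdot, A)$ is $\mathcal{A}$-measurable. Let $(X_n(\theta))$, $n \ge 0$, be a Markov chain with transition kernel $P_\theta$. For any probability measure $\mu$ and measurable $\varphi$: $\mathcal{X} \to \mathbb{R}$ define
$$(\mu P_\theta)(A) = \int_{\mathcal{X}} P_\theta(x, A)\, \mu(dx), \qquad (P_\theta \varphi)(x) = \int_{\mathcal{X}} \varphi(y)\, P_\theta(x, dy),$$
assuming the integral exists. The next condition is motivated by [7], stated there for single Markov chains.
Assumption 1 (Uniform Drift Condition for $P_\theta$).
There exists a measurable function $V$: $\mathcal{X} \to [0, \infty)$ and constants $\gamma \in (0,1)$ and $K \ge 0$ such that
$$(P_\theta V)(x) \le \gamma V(x) + K \qquad (4)$$
for all $x \in \mathcal{X}$ and $\theta \in \Theta$. $V$ is often called a Lyapunov function in the literature. Note that $V$ is not $\theta$-dependent.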
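As an illustration of a uniform drift condition of the form $(P_\theta V)(x) \le \gamma V(x) + K$ with a single Lyapunov function for a whole family of kernels, the sketch below uses a hypothetical reflected random walk on a finite set, with the upward probability playing the role of the parameter. The chain, the choice $V(x) = 1.5^x$, and the function names are assumptions made for this example only.

```python
def reflected_walk(p, n_states):
    """Kernel on {0, ..., n_states - 1}: up w.p. p, down w.p. 1 - p,
    staying put whenever a move would leave the state space."""
    P = [[0.0] * n_states for _ in range(n_states)]
    for x in range(n_states):
        P[x][min(x + 1, n_states - 1)] += p
        P[x][max(x - 1, 0)] += 1.0 - p
    return P


def apply_kernel(P, phi):
    """Return the function P phi, i.e. x -> sum_y P(x, y) phi(y)."""
    return [sum(pxy * v for pxy, v in zip(row, phi)) for row in P]


def smallest_drift_constant(kernels, V, gamma):
    """Smallest K >= 0 with (P V)(x) <= gamma * V(x) + K for all kernels and x."""
    K = 0.0
    for P in kernels:
        PV = apply_kernel(P, V)
        K = max(K, max(pv - gamma * v for pv, v in zip(PV, V)))
    return K


n_states = 21
V = [1.5 ** x for x in range(n_states)]   # one Lyapunov function for every parameter
kernels = [reflected_walk(p, n_states) for p in (0.2, 0.25, 0.3)]
gamma = 0.92                              # common contraction factor, chosen by hand
K = smallest_drift_constant(kernels, V, gamma)
print(gamma, K)
```

In the interior the walk satisfies $(PV)(x) = (1.5\,p + (1-p)/1.5)\,V(x)$, so any `gamma` above the largest such factor over the parameter range works, and the constant `K` is only needed near the reflecting boundary at zero.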
Remark 1.
Inequality (4) without requiring will be called uniform one-step growth condition. This condition implies that for any measure such that and we have for all
[TABLE]
Indeed, integrating (4) with respect to we get (5). Moreover, for any signed measure with , we have
[TABLE]
for all due to the inequality .
Example. The prime targeted area of application is controlled queuing systems, where both the arrival and service processes may be subject to control, such as Call Admission Control, see e.g., Chapter 11 of [10]. A possible objective of control actions, such as call rejection, may be to optimize a criterion such as the total waiting time. In order to demonstrate the applicability of the results of the present paper, we restrict attention to a single server, for which both the arrival and service processes may be subject to control.
Let us first describe the dynamics of a queue without control. Let the arrival process be identified by a simple point process with and let be the (finite) time elapsed between the arrivals of customers and i.e. Let be the service time of customer It is assumed that the sequences and , , are i.i.d. sequences of -valued random variables, respectively, independent of each other. We define , . The waiting time of the -th customer will be denoted by It is readily seen that it satisfies the recursion, with as initial value, and with :
[TABLE]
In the case of controlled queues both the service time and the arrival time may depend on a control parameter . Let be a connected, open set as above, and let be a compact set such that The choice of the control parameter may determine the law of the service times and that of the arrival times via actions such as call rejection. Then the dynamics of the queue can be described, with and by
[TABLE]
If the initial condition is , then the waiting time at will be denoted by To guarantee stability of the queue we have to assume
[TABLE]
for all This is a standard condition implying stability of the queue for any fixed in a variety of interpretations, see e.g., [13, 3] and [5]
The validity of the drift condition, given as Assumption 1, with no parameter-dependence, has been established under appropriate technical conditions, using the Lyapunov function , with small enough, in Section 16.4 of [14]. A uniform version of this result will be established in Section V.
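The waiting-time dynamics of a single-server queue can be sketched as follows. The recursion used here is the standard Lindley form $W_{n+1} = \max(W_n + S_n - T_{n+1}, 0)$; the indexing convention, the exponential distributions, and all function names are assumptions of this sketch rather than the paper's exact setup.

```python
import random


def waiting_times(w0, service_times, interarrival_times):
    """Iterate W_{n+1} = max(W_n + S_n - T_{n+1}, 0) starting from W_0 = w0."""
    waits = [w0]
    for s, t in zip(service_times, interarrival_times):
        waits.append(max(waits[-1] + s - t, 0.0))
    return waits


def simulate_queue(mean_service, mean_interarrival, n_customers, seed=0):
    """Simulate with exponential service and inter-arrival times
    (an illustrative distributional assumption)."""
    rng = random.Random(seed)
    services = [rng.expovariate(1.0 / mean_service) for _ in range(n_customers)]
    gaps = [rng.expovariate(1.0 / mean_interarrival) for _ in range(n_customers)]
    return waiting_times(0.0, services, gaps)


# Deterministic toy input: the queue empties when a long gap follows.
print(waiting_times(0.0, [2.0, 1.0, 3.0], [1.0, 3.0, 1.0]))  # [0.0, 1.0, 0.0, 2.0]

# Mean service time below mean inter-arrival time: the stable regime.
stable = simulate_queue(mean_service=1.0, mean_interarrival=2.0, n_customers=10_000)
print(sum(stable) / len(stable))
```

In a controlled queue the two means passed to `simulate_queue` would depend on the control parameter, and the stability condition requires the mean service time to stay below the mean inter-arrival time for every admissible parameter value.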
The next condition is a natural extension of the corresponding assumption of [7] for a parametric family of Markov chains, which itself is a modification of a standard condition in the stability theory of Markov chains [14].
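Assumption 2 (Local Minorization).
Let $R > 2K/(1-\gamma)$, where $\gamma$ and $K$ are the constants from Assumption 1, and set $C = \{x \in \mathcal{X} : V(x) \le R\}$. There exist a probability measure $\nu$ on $\mathcal{X}$ and a constant $\alpha \in (0,1)$ such that, for all $\theta \in \Theta$, all $x \in C$, and all measurable $A$,
$$P_\theta(x, A) \ge \alpha\, \nu(A).$$
Remark 2 (Interpretation of $R$).
If there exists an invariant measure $\mu^*_\theta$ such that $\mu^*_\theta(V) < \infty$, then integrating both sides of inequality (4) with respect to $\mu^*_\theta$, we get
$$\mu^*_\theta(V) \le \gamma\, \mu^*_\theta(V) + K, \quad \text{that is,} \quad \mu^*_\theta(V) \le \frac{K}{1-\gamma}. \qquad (11)$$
Thus, $R$ in Assumption 2 exceeds twice the mean of $V$ w.r.t. any of the invariant measures.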
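On a finite state space, a minorization of the form $P_\theta(x, A) \ge \alpha\,\nu(A)$ on a set $C$ can be found by taking the pointwise minimum of the transition probabilities over the kernels and over $x \in C$. The sketch below does this for two hypothetical three-state kernels; the matrices, the choice of the set, and the function name are illustrative assumptions.

```python
def minorization(kernels, small_set):
    """Return the largest alpha and a probability measure nu such that
    P(x, {y}) >= alpha * nu({y}) for every kernel and every x in small_set."""
    n_states = len(kernels[0])
    floor = [min(P[x][y] for P in kernels for x in small_set)
             for y in range(n_states)]
    alpha = sum(floor)
    if alpha == 0.0:
        raise ValueError("no common component: minorization fails on this set")
    nu = [m / alpha for m in floor]
    return alpha, nu


# Two hypothetical kernels standing in for two parameter values.
P_a = [[0.5, 0.3, 0.2],
       [0.2, 0.5, 0.3],
       [0.3, 0.2, 0.5]]
P_b = [[0.4, 0.4, 0.2],
       [0.2, 0.4, 0.4],
       [0.4, 0.2, 0.4]]

alpha, nu = minorization([P_a, P_b], small_set=[0, 1])
print(alpha, nu)
```

The same `alpha` and `nu` serve both kernels, which mirrors the uniformity in the parameter required by the assumption.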
Remark 3 (Constant Shifts of $V$).
We can and will assume that without loss of generality in view of the following reasoning. If a function such that satisfies Assumption 1, then for where is a constant with , it holds:
[TABLE]
hence, also satisfies Assumption 1 with the same and with replacing constant .
To assess the effect of replacing by on Assumption 2 note that the condition becomes
[TABLE]
On the other hand, the set on which the local minorization is required can be written as
[TABLE]
which is exactly if , that is . It remains to show that this choice of satisfies In other words , which is trivially satisfied if , in particular with .
We now introduce a weighted total variation distance between two probability measures $\mu_1, \mu_2$, where the weighting is of the form $1 + \beta V$, where $\beta > 0$, for which a fine-tuned choice will be needed for the results of [7] to hold.
Definition 1.
Let $\mu_1$ and $\mu_2$ be two probability measures on $\mathcal{X}$. Then, define the weighted total variation distance as
$$\rho_\beta(\mu_1, \mu_2) = \int_{\mathcal{X}} (1 + \beta V(x))\, |\mu_1 - \mu_2|(dx),$$
where $|\mu_1 - \mu_2|$ is the total variation measure of $\mu_1 - \mu_2$.
The definition naturally extends to any pair of possibly signed measures such that for assuming that Writing we can define the weighted total variation norm (as a single variable function)
[TABLE]
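An equivalent definition of $\rho_\beta$ can be given by introducing the following norm in the space of $\mathbb{R}$-valued functions on $\mathcal{X}$:
Definition 2.
For any measurable function $\varphi$: $\mathcal{X} \to \mathbb{R}$, set
$$\|\varphi\|_\beta = \sup_{x \in \mathcal{X}} \frac{|\varphi(x)|}{1 + \beta V(x)}.$$
The linear space of functions $\varphi$ such that $\|\varphi\|_\beta < \infty$ will be denoted by $\mathcal{L}_V$. Note that $\mathcal{L}_V$ is neither affected by constant shifts of $V$, nor by the choice of $\beta$; moreover, with the norm $\|\cdot\|_\beta$, $\mathcal{L}_V$ becomes a Banach space for any $\beta > 0$. An equivalent definition of $\rho_\beta$ is:
$$\rho_\beta(\mu_1, \mu_2) = \sup_{\varphi:\ \|\varphi\|_\beta \le 1} \int_{\mathcal{X}} \varphi(x)\, (\mu_1 - \mu_2)(dx).$$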
Equivalently, we can write with
[TABLE]
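We will also need a concept mimicking weighted Lipschitz continuity of a function $\varphi \in \mathcal{L}_V$. Since in our setup $\mathcal{X}$ is not a metric space, first of all we introduce a metric as follows. Denoting by $\delta_x$ the Dirac measure at $x$, note that, for $x \neq y$, it holds that $\rho_\beta(\delta_x, \delta_y) = 2 + \beta V(x) + \beta V(y)$. This leads to the definition of the following metric on $\mathcal{X}$:
$$d_\beta(x, y) = \begin{cases} 0, & x = y, \\ 2 + \beta V(x) + \beta V(y), & x \neq y. \end{cases}$$
This may seem to be an unusual metric, assigning a distance at least $2$ between any pair of distinct points, but it turns out to be quite useful. Having a metric on $\mathcal{X}$, we will introduce a measure of oscillation for functions $\varphi \in \mathcal{L}_V$ as follows:
Definition 3.
For any measurable function $\varphi$: $\mathcal{X} \to \mathbb{R}$, set
$$|||\varphi|||_\beta = \sup_{x \neq y} \frac{|\varphi(x) - \varphi(y)|}{d_\beta(x, y)}.$$
It is easily seen that $|||\varphi|||_\beta$ is finite on $\mathcal{L}_V$ and in fact we have the following inequality: $|||\varphi|||_\beta \le \|\varphi\|_\beta$. Indeed, for $x \neq y$,
$$|\varphi(x) - \varphi(y)| \le |\varphi(x)| + |\varphi(y)| \le \|\varphi\|_\beta\, \big(2 + \beta V(x) + \beta V(y)\big) = \|\varphi\|_\beta\, d_\beta(x, y).$$
Since $|||\cdot|||_\beta$ is invariant w.r.t. translation by any constant $c$, we also get $|||\varphi|||_\beta \le \|\varphi + c\|_\beta$. Surprisingly, the minimum of these upper bounds reproduces $|||\varphi|||_\beta$, as stated in the following lemma, given in [7] in a slightly weaker form with “$\inf$” replacing “$\min$”. However, the proof in [7] explicitly confirms the stronger statement below:
Lemma 1.
$|||\varphi|||_\beta = \min_{c \in \mathbb{R}} \|\varphi + c\|_\beta$.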
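The identity between the oscillation semi-norm and the best shifted weighted sup-norm, as in [7], can be checked numerically on a finite state space. In the sketch below the weighted sup-norm is $\max_x |\varphi(x)|/(1+\beta V(x))$ and the semi-norm is the largest ratio $|\varphi(x)-\varphi(y)|/(2+\beta V(x)+\beta V(y))$ over distinct points. The test function, the values of $V$ and $\beta$, and the ternary-search minimisation are assumptions of this example.

```python
def weights(V, beta):
    """Weight function x -> 1 + beta * V(x)."""
    return [1.0 + beta * v for v in V]


def sup_norm(phi, w):
    """Weighted sup-norm: max_x |phi(x)| / w(x)."""
    return max(abs(p) / wx for p, wx in zip(phi, w))


def oscillation(phi, w):
    """Semi-norm: max over x != y of |phi(x) - phi(y)| / (w(x) + w(y))."""
    n = len(phi)
    return max(abs(phi[x] - phi[y]) / (w[x] + w[y])
               for x in range(n) for y in range(x + 1, n))


def best_shift_norm(phi, w, iterations=200):
    """Minimise c -> sup_norm(phi + c) by ternary search.
    The map is convex and its minimiser lies in [-max(phi), -min(phi)]."""
    lo, hi = -max(phi), -min(phi)
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if sup_norm([p + m1 for p in phi], w) <= sup_norm([p + m2 for p in phi], w):
            hi = m2
        else:
            lo = m1
    return sup_norm([p + lo for p in phi], w)


phi = [0.0, 3.0, -1.0]
w = weights(V=[0.0, 1.0, 4.0], beta=0.5)
print(oscillation(phi, w), best_shift_norm(phi, w))
```

For this input both quantities equal 1.2, attained by the pair of the first two states and by the shift $c = -1.2$.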
It is readily seen that is a semi-norm on and if and only if is a constant function. Letting denote the linear vector-space of constant functions on it follows that is a norm on the linear factor-space, It is also easily seen that becomes a Banach space with the norm In what follows, will denote the latter Banach space.
A useful linear subspace of the dual space is obtained by considering the linear space of signed measures such that
[TABLE]
which will be denoted by It is easily seen that
[TABLE]
is a continuous linear functional the dual norm of which is
[TABLE]
This observation motivates the following definition:
Definition 4.
Let be two possibly signed measures on such that for moreover we have Then, we define the distance
[TABLE]
A simple corollary of Lemma 1 is the following:
Corollary 1.
Let be two possibly signed measures on as in Definition 4. Then
[TABLE]
Thus we have the following equivalent expressions for :
[TABLE]
Proof.
Indeed, we immediately get On the other hand, take and let be such that and
[TABLE]
By Lemma 1 there exists a constant such that . Thus, , therefore
[TABLE]
Since is arbitrary, we get that Combining with the opposite inequality, we get the claim. ∎
Remark 4.
A useful reformulation of the above result is that for any signed measure , we have
[TABLE]
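A fundamental result of [7, Theorem 3.1] is as follows:
Proposition 1.
Under Assumptions 1 and 2, there exist $\bar\alpha \in (0,1)$ and $\beta > 0$ such that for all $\theta \in \Theta$ and measurable $\varphi$: $\mathcal{X} \to \mathbb{R}$,
$$|||P_\theta \varphi|||_\beta \le \bar\alpha\, |||\varphi|||_\beta.$$
The pairs $(\bar\alpha, \beta)$ can be chosen as follows: take $\alpha_0 \in (0, \alpha)$ and $\gamma_0 \in (\gamma + 2K/R,\, 1)$, and then set
$$\beta = \frac{\alpha_0}{K}, \qquad \bar\alpha = \max\left\{ 1 - (\alpha - \alpha_0),\ \frac{2 + R\beta\gamma_0}{2 + R\beta} \right\}.$$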
Remark 5.
Although there is a freedom in choosing $\alpha_0$ and $\gamma_0$, the resulting contraction coefficient is bounded from below: it holds that $\bar\alpha > \gamma$. In other words, the contraction coefficient ensured by Proposition 1 is strictly larger than the contraction coefficient in the drift condition, cf. Assumption 1. In fact, we have, using $\gamma_0 < 1$,
$$\bar\alpha \ge \frac{2 + R\beta\gamma_0}{2 + R\beta} > \frac{2\gamma_0 + R\beta\gamma_0}{2 + R\beta} = \gamma_0.$$
Since $\gamma_0 > \gamma$ by construction, the statement follows.
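The choice of the weighting parameter and the contraction coefficient can be written as a small helper. The formulas used are those of [7, Theorem 3.1] as usually stated, namely $\beta = \alpha_0/K$ and $\bar\alpha = \max\{1-(\alpha-\alpha_0),\ (2+R\beta\gamma_0)/(2+R\beta)\}$ for $\alpha_0 \in (0,\alpha)$ and $\gamma_0 \in (\gamma + 2K/R, 1)$; the function name, argument names, and the numerical values are assumptions of this sketch.

```python
def contraction_constants(gamma, K, alpha, R, alpha0, gamma0):
    """Return (beta, alpha_bar) from drift constants (gamma, K), minorization
    constant alpha, level R, and the free parameters alpha0 and gamma0."""
    if not (0.0 < gamma < 1.0 and K > 0.0 and 0.0 < alpha < 1.0):
        raise ValueError("need 0 < gamma < 1, K > 0 and 0 < alpha < 1")
    if not R > 2.0 * K / (1.0 - gamma):
        raise ValueError("need R > 2K / (1 - gamma)")
    if not 0.0 < alpha0 < alpha:
        raise ValueError("need 0 < alpha0 < alpha")
    if not gamma + 2.0 * K / R < gamma0 < 1.0:
        raise ValueError("need gamma + 2K/R < gamma0 < 1")
    beta = alpha0 / K
    alpha_bar = max(1.0 - (alpha - alpha0),
                    (2.0 + R * beta * gamma0) / (2.0 + R * beta))
    return beta, alpha_bar


# Hypothetical constants, chosen only to satisfy the constraints above.
beta, alpha_bar = contraction_constants(
    gamma=0.5, K=1.0, alpha=0.4, R=10.0, alpha0=0.2, gamma0=0.8)
print(beta, alpha_bar)
```

With these inputs the second term of the maximum dominates, and the resulting coefficient exceeds both `gamma0` and `gamma`, in line with the lower bound on the contraction coefficient discussed above.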
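Proposition 1 can be restated as saying that $P_\theta$ is a contraction on the Banach space of functions in $\mathcal{L}_V$ modulo constants, equipped with the norm $|||\cdot|||_\beta$. But then its adjoint operator $P_\theta^*$, having the same norm, is also a contraction. Thus we immediately get the following result, stated essentially in [7, Theorem 1.3]:
Proposition 2.
Under the assumptions of Proposition 1 there exist $\bar\alpha \in (0,1)$ and $\beta > 0$, such that for all $\theta \in \Theta$, and any signed measure $\eta$ with $\int (1 + \beta V)\, d|\eta| < \infty$ and $\eta(\mathcal{X}) = 0$, we have
$$\int_{\mathcal{X}} (1 + \beta V)\, d|\eta P_\theta| \le \bar\alpha \int_{\mathcal{X}} (1 + \beta V)\, d|\eta|.$$
Alternatively, let $\mu_1, \mu_2$ be two possibly signed measures on $\mathcal{X}$ as in Definition 4. Then, we have
$$\rho_\beta(\mu_1 P_\theta, \mu_2 P_\theta) \le \bar\alpha\, \rho_\beta(\mu_1, \mu_2).$$
In what follows, $\bar\alpha$ and $\beta$ are chosen as indicated in Proposition 1. Using standard arguments one can easily show the following proposition, also stated in [7] as Theorem 3.2:
Proposition 3.
Under Assumptions 1 and 2, for all $\theta \in \Theta$ there is a unique probability measure $\mu^*_\theta$ on $\mathcal{X}$ such that $\mu^*_\theta(V) < \infty$ and $\mu^*_\theta P_\theta = \mu^*_\theta$.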
Similar results to those of Propositions 2 and 3 are stated in [14, Theorem 14.0.1] under slightly different conditions. In particular, the special choice of the parameter $\beta$ in the weighting function is not part of the conditions in [14], at the price that the contraction of the one-step kernel is not stated. In addition, in [14] it is a priori assumed that the Markov chain is $\psi$-irreducible and aperiodic, while in [7] these conditions are circumvented by assuming that the minorization condition holds on a fairly large set, defined in terms of a sublevel set of $V$, see Assumption 2. The analysis of the causal connection between these two sets of conditions is certainly of interest for future research.
III Lipschitz Continuity w.r.t. $\theta$ of the Solution of a $\theta$-Dependent Poisson Equation
In this section we shall consider the Poisson equations
$$(I - P_\theta)\, u_\theta(x) = f_\theta(x) - h_\theta$$
for $\theta \in \Theta$, where $f_\theta$, $P_\theta$ are the input data, and $u_\theta$ is a solution. First, we prove the existence and the uniqueness (up to an additive constant) of the solution for a fixed $\theta$, adapting standard arguments, then we formulate smoothness conditions on the kernel $P_\theta$, and the right hand side, $f_\theta$. Using these conditions we prove Lipschitz continuity w.r.t. $\theta$ in the norm $\|\cdot\|_\beta$ of the particular solution for which $\mu^*_\theta(u_\theta) = 0$. For a start let $\theta$ be fixed.
Theorem 1.
Let Assumptions 1 and 2 hold. Let $f$: $\mathcal{X} \to \mathbb{R}$ be a measurable function such that $\|f\|_\beta < \infty$ and let $P = P_\theta$ for some fixed $\theta \in \Theta$, with invariant measure $\mu^*$. Let $h = \mu^*(f)$. Then, the Poisson equation
$$(I - P)\, u(x) = f(x) - h \qquad (35)$$
has a unique solution $u$ up to an additive constant. Henceforth, we shall consider the particular solution
$$u(x) = \sum_{n=0}^{\infty} \big( P^n f(x) - h \big). \qquad (36)$$
This is well-defined, in fact the right hand side is absolutely convergent, and in addition $\mu^*(u) = 0$. Furthermore,
[TABLE]
implying the inequality
[TABLE]
where is given by
[TABLE]
Proof.
It is immediate to check that (35) is formally satisfied by . We show that is well-defined. First, consider any function such that . By the definition of the metric , see (25), the inequality
[TABLE]
holds true for any pair of probability measures or even for any pair of signed measures as in Definition 4. On the other hand, any generic function can be rescaled by , so that we also have
[TABLE]
To estimate the th term of the right hand side of (36), consider the equalities
[TABLE]
Using (41), we can bound the right hand side by . Now applying Proposition 2 and taking into account Corollary 1, we can further bound it by
[TABLE]
Taking into account the trivial estimate
[TABLE]
and noting that implies for all that , we conclude that
[TABLE]
It follows that the series is absolutely convergent, so is well-defined and satisfies the desired upper bound. Indeed, can be written as
[TABLE]
where the integration and the summation can be interchanged due to the Lebesgue dominated convergence theorem, the conditions of which are ensured by (45). Thus, we get
[TABLE]
which implies the claim. Using similar arguments we get that
[TABLE]
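To prove uniqueness, assume that there are two solutions $u_1$ and $u_2$, and define $\Delta u = u_1 - u_2$. Then, $(I - P)\Delta u = 0$, implying $P \Delta u = \Delta u$, from which $|||P \Delta u|||_\beta = |||\Delta u|||_\beta$. But, by Proposition 1, $P$ is a strict contraction in $|||\cdot|||_\beta$, so that $|||P \Delta u|||_\beta < |||\Delta u|||_\beta$ unless $|||\Delta u|||_\beta = 0$, and hence $|||\Delta u|||_\beta = 0$. Therefore, $\Delta u$ is a constant.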
Summing the inequalities (45) over and using (11) we get the upper-bound
[TABLE]
from which the claim of the theorem follows. ∎
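For a finite-state chain the particular solution given by the series $\sum_{n \ge 0} (P^n f - h)$ can be computed directly by summing terms until they become negligible. The sketch below does this for a hypothetical two-state kernel and checks both the Poisson equation and the centering under the invariant measure; the kernel, the function `f`, the tolerances, and all names are illustrative assumptions.

```python
def apply_kernel(P, phi):
    """Return the function P phi, i.e. x -> sum_y P(x, y) phi(y)."""
    return [sum(pxy * v for pxy, v in zip(row, phi)) for row in P]


def invariant_measure(P, iterations=10_000):
    """Approximate the invariant measure by iterating mu -> mu P."""
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(iterations):
        mu = [sum(mu[x] * P[x][y] for x in range(n)) for y in range(n)]
    return mu


def poisson_solution(P, f, tol=1e-12, max_terms=100_000):
    """Sum the series u = sum_{n >= 0} (P^n f - h), with h the mean of f."""
    pi = invariant_measure(P)
    h = sum(p * v for p, v in zip(pi, f))
    term = [v - h for v in f]      # P^n f - h = P^n (f - h), since P 1 = 1
    u = [0.0] * len(f)
    for _ in range(max_terms):
        u = [a + b for a, b in zip(u, term)]
        term = apply_kernel(P, term)
        if max(abs(t) for t in term) < tol:
            break
    return u, h, pi


P = [[0.9, 0.1],
     [0.2, 0.8]]
f = [1.0, 4.0]
u, h, pi = poisson_solution(P, f)
residual = [ui - pui - (fi - h) for ui, pui, fi in zip(u, apply_kernel(P, u), f)]
print(u, h, residual)
```

Here the centered function is an eigenvector of the kernel with eigenvalue 0.7, so the series is geometric and the terms decay at that rate.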
Now we consider a parametric family of kernels and that of functions for A critical point in the discussion to follow is to define appropriate smoothness conditions for them in the context of [7].
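Assumption 3 (Lipschitz Continuity of $P_\theta$).
There exists a constant $L_P$ such that for every $\theta, \theta' \in \Theta$ and all $x \in \mathcal{X}$ it holds that
$$\rho_\beta(\delta_x P_\theta, \delta_x P_{\theta'}) \le L_P\, |\theta - \theta'|\, \big(1 + \beta V(x)\big). \qquad (50)$$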
It is easily seen that the validity of Assumption 3 implies an inequality similar to (50) for general (non-negative) measures, even under a relaxed drift condition:
Lemma 2.
Let be a measure such that and assume that the one-step growth condition, defined by Assumption 1 without requiring holds. Then under Assumption 3 we have for every
[TABLE]
Proof.
Assumption 3 implies that for all such that , and hence also ,
[TABLE]
Integrating this inequality with respect to $\mu$, on the right hand side of (52) we get the right hand side of (51). For the integral of the left hand side we apply Fubini’s theorem to get
[TABLE]
where the measure is defined as usual by . The measure is well-defined, since . The applicability of Fubini’s theorem is justified by the inequality
[TABLE]
and noting that the right hand side has a finite integral with respect to . Using the same argument for , altogether for the integral of (52) we obtain
[TABLE]
Since is arbitrary subject to , we conclude that is bounded by the right hand side of (52), and we get the statement of the Lemma. ∎
The above lemma is easily extended from measures to signed measures:
Lemma 3.
Let be a signed measure such that and assume that the one-step growth condition, defined by Assumption 1 without requiring holds. Then under Assumption 3 we have for every
[TABLE]
Proof.
We consider the Hahn-Jordan decomposition , where and are non-negative measures. Then
[TABLE]
Using Lemma 2 for both terms we get the desired upper bound:
[TABLE]
which is the right hand side of (54). ∎
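The reduction from signed measures to non-negative measures can be illustrated on a finite state space, where the Hahn-Jordan decomposition is simply the split into positive and negative parts. The sketch below checks the triangle-inequality step for the weighted total variation norm $\sum_x (1+\beta V(x))\,|\eta(x)|$; the signed measure, the two kernels, the weights, and the function names are assumptions made for this example.

```python
def hahn_jordan(eta):
    """Split a finite signed measure into its positive and negative parts."""
    plus = [max(m, 0.0) for m in eta]
    minus = [max(-m, 0.0) for m in eta]
    return plus, minus


def push_forward(mu, P):
    """Return the measure mu P, i.e. y -> sum_x mu(x) P(x, y)."""
    return [sum(mu[x] * P[x][y] for x in range(len(P))) for y in range(len(P[0]))]


def weighted_norm(eta, w):
    """Weighted total variation norm: sum_x w(x) |eta(x)|."""
    return sum(wx * abs(m) for wx, m in zip(w, eta))


def kernel_gap(mu, P_a, P_b, w):
    """Weighted distance between mu P_a and mu P_b."""
    a, b = push_forward(mu, P_a), push_forward(mu, P_b)
    return weighted_norm([x - y for x, y in zip(a, b)], w)


P_a = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.3, 0.2, 0.5]]
P_b = [[0.4, 0.4, 0.2], [0.2, 0.4, 0.4], [0.4, 0.2, 0.4]]
w = [1.0, 1.5, 3.0]                 # weights 1 + beta * V(x), chosen by hand
eta = [0.5, -0.2, -0.3]             # signed measure with zero total mass

plus, minus = hahn_jordan(eta)
lhs = kernel_gap(eta, P_a, P_b, w)
rhs = kernel_gap(plus, P_a, P_b, w) + kernel_gap(minus, P_a, P_b, w)
print(lhs, rhs)
```

The bound for the signed measure then follows by applying the non-negative case to each part separately.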
The class of measurable functions is determined by the following assumption:
Assumption 4.
We have , and there exists a constant such that, for all , it holds that
[TABLE]
The main result of the present paper is as follows, with the remainder of this section being devoted to its proof:
Theorem 2.
Let Assumptions 1, 2, 3 and 4 hold, and consider the parameter-dependent Poisson equation
[TABLE]
where . Then, is Lipschitz continuous in :
[TABLE]
and the family of solutions , ensured by Theorem 1, is Lipschitz continuous in :
[TABLE]
where is independent of . It follows that
[TABLE]
Briefly speaking we can say that as an operator mapping the space of functions satisfying Assumption 4, such that into is Lipschitz continuous.
Proof.
Consider the extended parametric family of Poisson equations, where and are independently parametrized, with the notation
[TABLE]
First, we prove that is Lipschitz continuous in and Since , the Lipschitz continuity of stated in (58) then follows. We can write
[TABLE]
Note that the limits of the right hand side are finite by Assumption 4 and the drift condition Assumption 1.
We can bound the right hand side of (62) as follows:
[TABLE]
[TABLE]
Using the Lipschitz continuity of $f_\theta$, as given by Assumption 4, the right hand side can be bounded from above by
[TABLE]
Letting , for the limit we get, using Remark 2,
[TABLE]
To continue the proof of the theorem we will have to establish the Lipschitz continuity of the powers of the kernel together with an upper bound for the Lipschitz constants.
Lemma 4.
Assume that Assumptions 1, 2, and 3 hold. Then for all and signed measure with ,
[TABLE]
where is independent of , and , given by
[TABLE]
Proof.
We can estimate from above, using a kind of telescopic sequence of triangular inequalities, leading to the upper bound
[TABLE]
Note that the measures and are as in Definition 4, in particular . Then, using the contraction property of the kernels , see Proposition 2, we obtain the upper bound
[TABLE]
For the -th term apply Lemma 3 with to get the following upper bound for (70):
[TABLE]
Note now that by the consequence of the drift condition given in Remark 1 we can bound for a general by
[TABLE]
Noting that and iterating the above inequality, we get
[TABLE]
By plugging (73) into the sum in (71), we get the upper bound
[TABLE]
We can write the latter expression as
[TABLE]
Summarizing the inequalities (III) to (74), taking into account (see Remark 5), and bounding the geometric sums in (74) with their limit values we get the upper bound
[TABLE]
from which the claim follows by setting . ∎
Applying Lemma 4 for , we get
[TABLE]
from which we get the Lipschitz continuity of the invariant measure as a function of :
Corollary 2.
Under the assumptions of Lemma 4, we have
[TABLE]
where
[TABLE]
Proof.
Note that for any initial probability measure , we have by the triangle inequality
[TABLE]
The first and the last terms converge to zero by Proposition 2. Taking , the middle term is upper bounded by
[TABLE]
based on (67). Taking into account the definition of based on (75), this time taking , we get . Finally, the claim follows by recalling that by Remark 3. ∎
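Lipschitz dependence of the invariant measure on the parameter can be observed numerically in a toy case. The sketch below uses a hypothetical two-state family in which the parameter is the probability of leaving the first state, computes the invariant measures in closed form, and reports the largest observed ratio between the weighted total variation distance and the parameter distance. The family, the weights, and the parameter grid are assumptions of this example.

```python
def kernel(theta):
    """Hypothetical two-state family: theta is the probability of leaving state 0."""
    return [[1.0 - theta, theta],
            [0.5, 0.5]]


def invariant_measure(P):
    """Closed form for a two-state chain with a = P(0, 1) and b = P(1, 0)."""
    a, b = P[0][1], P[1][0]
    return [b / (a + b), a / (a + b)]


def weighted_tv(mu1, mu2, V, beta):
    """Weighted total variation distance: sum_x (1 + beta V(x)) |mu1(x) - mu2(x)|."""
    return sum((1.0 + beta * v) * abs(m1 - m2) for v, m1, m2 in zip(V, mu1, mu2))


def max_lipschitz_ratio(thetas, V, beta):
    """Largest observed ratio distance(pi_s, pi_t) / |s - t| over the grid."""
    best = 0.0
    for i, s in enumerate(thetas):
        for t in thetas[i + 1:]:
            dist = weighted_tv(invariant_measure(kernel(s)),
                               invariant_measure(kernel(t)), V, beta)
            best = max(best, dist / abs(t - s))
    return best


thetas = [0.1 + 0.01 * k for k in range(81)]     # grid on [0.1, 0.9]
ratio = max_lipschitz_ratio(thetas, V=[1.0, 2.0], beta=1.0)
print(ratio)
```

For this family the second coordinate of the invariant measure is $\theta/(0.5+\theta)$, whose derivative is at most $0.5/0.36$ on the grid, so the observed ratio stays below $5 \cdot 0.5/0.36 \approx 6.94$.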
Returning to the proof of Theorem 2, the left hand side of (63) can be written as
[TABLE]
Here by Assumption 4 and can be upper bounded by Corollary 2. Setting and combining the bounds given by (66) and (80) we get the desired inequality (58), with
[TABLE]
Next, we consider the Lipschitz continuity of the doubly-parametrized particular solution
[TABLE]
We will need an alternative of Lemma 4 for signed measures (i.e., in addition to ):
Lemma 5.
Assume that Assumptions 1, 2, and 3 hold. Then for every and signed measure , we have
[TABLE]
Proof.
The starting point of the proof is the inequality, obtained by combining the inequalities (III) – (70), applicable also for signed measures such that :
[TABLE]
A key point is the observation that since , converges exponentially fast to the zero measure, see Proposition 2. To estimate the th term of (84), we apply Lemma 3 and Remark 4, (30),
[TABLE]
Now applying Proposition 2 and Remark 4, (30), again, we get the upper bound:
[TABLE]
Inserting this into (84), we get the desired upper bound. ∎
Step 1. First we show that is Lipschitz continuous in Indeed, we have
[TABLE]
Here the -th term can be written, using (41), as
[TABLE]
Taking into account Proposition 2 and Assumption 4 the right hand side can be bounded from above by
[TABLE]
Inserting this into (III) gives
[TABLE]
Step 2. The critical point is to show that is Lipschitz continuous in Let us write
[TABLE]
The -th term can be written as
[TABLE]
Write the measure in the bracket as
[TABLE]
Then for we get by Lemma 5 with the upper bound
[TABLE]
On the other hand, for we have by Proposition 2 the upper bound and this can be bounded from above by Corollary 2, yielding
[TABLE]
Thus the -th term of (III), rewritten in (91), can be rewritten and bounded from above, using inequality (41), as
[TABLE]
Summation over in view of (III), yields the upper bound
[TABLE]
The right hand side can be simplified to
[TABLE]
Combining this with (89), and setting we get the upper bound for :
[TABLE]
The latter can be simplified to
[TABLE]
where
[TABLE]
To get a compact upper bound for (97), we replace by noting that , and multiply the second term by . Then, we get the upper bound
[TABLE]
proving the second claim of the theorem:
[TABLE]
where the constant can be chosen as
[TABLE]
∎
Remark 6.
Assume that the assumptions of Lemma 4 are satisfied. Then, for any measurable functions such that , is upper-bounded by
[TABLE]
for all , where is given in Lemma 4, see (68).
IV Relaxations of the Conditions
A delicate condition of Propositions 1-3 is Assumption 1, requiring the existence of a common Lyapunov function. This requirement may be too restrictive even in the case of linear stochastic systems, to be discussed in the example below.
Example. Consider a family of linear stochastic systems with state vectors defined by
[TABLE]
where , the matrix is stable for all , is an i.i.d. sequence of random vectors such that and exists and is finite, and is a matrix with appropriate dimensions. Setting , where is a symmetric positive definite matrix, we have
[TABLE]
thus the drift condition is equivalent to with for all , in the sense of semi-definite ordering. Hence, the matrix induces a metric with respect to which is a contraction, simultaneously for all , with the same contraction factor. It follows that the family is jointly stable.
Let us now assume only that , with is a compact set of stable matrices. Then we can find a positive integer such that for all hence the family of matrices is jointly stable. This example motivates the following relaxation of the drift condition given as Assumption 1 in analogy with Assumption A’.5, (i) and (i’) on page 290 of [2]:
Assumption 5 (Uniform Drift Condition for ).
There exists a positive integer , a measurable function and constants and such that for all and we have
[TABLE]
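The motivation for a multi-step drift condition can be seen in the linear example: a stable matrix need not be a contraction in one step, but a suitable power of it is. The sketch below finds the smallest common power at which every matrix in a hypothetical family has induced infinity norm below one. The matrices, the choice of norm, and the function names are assumptions of this example, not taken from the paper.

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def inf_norm(A):
    """Induced infinity norm: the largest absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)


def common_contraction_power(matrices, max_power=1000):
    """Smallest m such that every A^m has infinity norm below one, or None."""
    powers = list(matrices)                       # holds A^m for the current m
    for m in range(1, max_power + 1):
        if max(inf_norm(Am) for Am in powers) < 1.0:
            return m
        powers = [mat_mul(Am, A) for Am, A in zip(powers, matrices)]
    return None


# Both matrices have spectral radius below one, but the first has norm 2.5.
family = [
    [[0.5, 2.0], [0.0, 0.5]],
    [[0.9, 0.0], [0.5, 0.05]],
]
m = common_contraction_power(family)
print(m)
```

For a compact set of stable matrices such a common power exists, which is why the drift inequality is imposed on the multi-step kernel rather than on the one-step kernel.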
Assumption 6 (Uniform One Step Growth Condition for ).
With the same measurable function as above we have for all and all
[TABLE]
where we can and will assume that and
Lemma 6.
The uniform one-step growth condition given above implies that for any for all functions and all with we have
[TABLE]
Proof.
In order to simplify the notations we write We have from which we get
[TABLE]
by Assumption 6. The last term on the right hand side is majorized by with proving the first half of (108). To prove the second half of (108) recall that for any we have Hence for any constant we have
[TABLE]
Apply the first inequality of (108) with replacing
[TABLE]
Choosing so that yields the claim. ∎
Lemma 6 is a relaxed version of Proposition 1. Now, repeating the arguments leading to Proposition 2, we get:
Lemma 7.
Under Assumption 6, for any for all , the kernel is a bounded linear operator on , more exactly, for any , we have, with as in Lemma 6,
[TABLE]
Alternatively, let be two possibly signed measures on as in Definition 4. Then we have
[TABLE]
Assumption 7 (Uniform Local Minorization for ).
Let where and are the constants from Assumption 5, and let . There exist a probability measure and a constant such that for all , and measurable it holds:
[TABLE]
The main results cited in Section II can be extended, with minor modifications, assuming the above relaxed conditions. Proposition 1 can be restated as follows:
Theorem 3.
Under Assumptions 5, 6 and 7 there exist constants , and such that for any all and all we have:
[TABLE]
Here we can choose , given by Proposition 1 applied to , with provided by Proposition 1 applied to and with as in Lemma 6.
Proof.
Let us fix a and write . By Proposition 1 there exist and such that implying for any positive integer
[TABLE]
For a general positive integer write with . Then, we get
[TABLE]
To complete the proof apply the second inequality of (108) times to obtain
[TABLE]
Now hence thus the claim of the theorem follows. ∎
Proposition 2 takes now the following modified form:
Theorem 4**.**
Under Assumptions 5, 6 and 7 there exist constants and such that for any signed measure , all and we have
[TABLE]
The constants and are the same as in Theorem 3. Alternatively, let be two possibly signed measures on as in Definition 4. Then we have
[TABLE]
Finally, we have the following extension of Proposition 3:
Theorem 5**.**
Under Assumptions 5, 6 and 7 for all there exists a unique probability measure on such that and Denoting the unique invariant probability measure for by we have
Proof.
Let us fix any and write , and Thus is the unique invariant probability measure for the existence of which is ensured by Proposition 3. Now, we show that for any we have . Indeed, write:
[TABLE]
Here the r.h.s. can be bounded from above, using the definition of and the first half of (108), by
[TABLE]
which is finite since It follows that the probability measure defined by
[TABLE]
also satisfies and it is readily seen to be invariant for Since any probability measure invariant for is also invariant for there cannot be measures that are invariant for besides , and thus we have . ∎
The main results of Section III can now be extended, with minor modifications, assuming the above relaxed conditions.
Theorem 6**.**
Assume that the kernels satisfy Assumptions 5, 6 and 7. Let be as given in Theorem 3, let us fix any and write Let be a measurable function such that Let denote the unique invariant probability measure of and let Then, the Poisson equation
[TABLE]
has a unique solution up to additive constants. The particular solution for which can be written as
[TABLE]
where the right hand side is absolutely convergent, and
[TABLE]
for some constant depending only on the constants appearing in Assumptions 5, 6 and 7. It also follows:
[TABLE]
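The series representation of the particular solution in Theorem 6 can be checked on a finite-state chain, where every weighted sup-norm is finite and convergence is geometric. The following is a minimal sketch, not the paper's construction. The kernel `P`, the function `f`, and the helper names are hypothetical, and the Poisson equation is assumed to have the standard form (I - P)u = f - pi(f) with the normalization pi(u) = 0.

```python
import numpy as np


def stationary_distribution(P):
    """Solve pi P = pi, sum(pi) = 1 by least squares (finite irreducible chain assumed)."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def poisson_solution_by_series(P, f, n_terms=500):
    """Truncated series u = sum_{n>=0} P^n (f - pi(f)), the solution with pi(u) = 0."""
    pi = stationary_distribution(P)
    centered = f - pi @ f
    u = np.zeros_like(centered)
    term = centered.copy()
    for _ in range(n_terms):
        u += term
        term = P @ term          # next term of the series: P^{n+1} (f - pi(f))
    return u, pi


# Hypothetical three-state kernel with strictly positive entries.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])
f = np.array([1.0, 0.0, 2.0])

u, pi = poisson_solution_by_series(P, f)
residual = (u - P @ u) - (f - pi @ f)    # vanishes if (I - P) u = f - pi(f)
print("max residual:", np.max(np.abs(residual)))
print("pi(u):", pi @ u)
```

Adding a constant to `u` leaves the residual unchanged, which is the finite-state counterpart of uniqueness up to additive constants.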
Proof.
Consider the Poisson equation
[TABLE]
where recalling that In view of Theorem 1 it has a unique solution, up to an additive constant. The particular solution with can be written as
[TABLE]
which is well-defined, the r.h.s. is absolutely convergent, and
[TABLE]
implying the inequality
[TABLE]
where is given by
[TABLE]
The solution of (127) is related to that of (123) by noting that
[TABLE]
It follows that
[TABLE]
is a solution of (123). Incidentally, this is also seen from the series expansion (124). To get an upper bound for , write
[TABLE]
Taking into account the upper bound for given in (129), it is seen that it is sufficient to derive upper bounds for for
Now recall that in view of the one-step growth condition and Remark 1, we have , for any probability measure with . By repeated application of this inequality, as in the derivation of (73), we obtain
[TABLE]
Choosing and summing over from [math] to we get
[TABLE]
The right hand side is bounded from above by
[TABLE]
Combining these inequalities with (129) and (133) we get
[TABLE]
implying the upper bound of the form given in (125).
As for uniqueness, assume that there are two solutions , and let . Then, for all , implying . Iterating this times we get and by Theorem 1 we conclude that is a constant function, thus completing the proof. ∎
The extension of Theorem 2, on the Lipschitz continuity of , seems to be straightforward. However, we should point out that we have to assume the Lipschitz continuity of the one-step kernels as given in Assumption 3.
Theorem 7**.**
Assume that the kernels satisfy Assumptions 5, 6 and 7. Let us fix as given in Theorem 3. In addition assume that the family of one-step kernels is Lipschitz continuous in the sense of Assumption 3. Let be a family of measurable functions such that Assumption 4 holds. Let denote the unique invariant probability measure of and let Consider the Poisson equations
[TABLE]
Then, is Lipschitz continuous in :
[TABLE]
and the particular solution
[TABLE]
is well-defined for all , and is Lipschitz continuous in :
[TABLE]
Alternatively, we can write
[TABLE]
Here the constants and depend only on the constants appearing in Assumptions 3, 4, 5, 6, and 7.
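The Lipschitz dependence of the normalized solution on the parameter can be observed in a toy example. The sketch below is an illustration under strong simplifying assumptions and not the setting of Theorem 7. It uses a hypothetical two-state kernel `kernel(theta)` on a compact parameter interval, a fixed function `f`, and the plain sup-norm in place of the weighted norm. The solution is computed by a direct linear solve, not by the series.

```python
import numpy as np


def kernel(theta):
    """Hypothetical two-state kernel depending smoothly on theta in (0, 1)."""
    return np.array([[1.0 - theta, theta],
                     [theta / 2.0, 1.0 - theta / 2.0]])


def centered_poisson_solution(P, f):
    """Solve (I - P) u = f - pi(f) with pi(u) = 0 via the matrix I - P + 1 pi^T."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)       # stationary distribution
    rhs = f - pi @ f
    return np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), rhs)


f = np.array([1.0, 0.0])
thetas = np.linspace(0.2, 0.8, 61)                   # compact parameter set
sols = np.array([centered_poisson_solution(kernel(t), f) for t in thetas])

# Largest sup-norm difference quotient of theta -> u_theta between grid neighbours.
quotients = np.max(np.abs(np.diff(sols, axis=0)), axis=1) / np.diff(thetas)
print("empirical Lipschitz estimate:", quotients.max())
```

For this particular kernel the solution can be computed by hand as u = (4/(9 theta), -2/(9 theta)). The difference quotients therefore stay bounded on any compact subset of (0, 1) and blow up as theta approaches 0, which shows why the parameter set is restricted.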
For the proof we need a simple extension of Lemma 4:
Lemma 8**.**
Assume that satisfies the uniform one-step growth condition, Assumption 6, and the Lipschitz continuity condition Assumption 3 with some Then for any pair for any signed measure satisfying , and for any
[TABLE]
for all , where depends only on the constants appearing in the conditions of the lemma and on .
Proof.
The proof is obtained by a simple modification of the proof of Lemma 4. We can estimate using a sequence of triangle inequalities to get
[TABLE]
Consider the th term and apply Lemma 7 repeatedly times setting and :
[TABLE]
Note that the conditions of Lemma 7 are satisfied for : obviously and , for due to the repeated application of the one-step growth condition.
Combining the last two inequalities we get:
[TABLE]
Consider the -th term, and apply the Lipschitz continuity of Assumption 3, implying Lemma 3. Applying the latter for the signed measure we get the upper bound
[TABLE]
To estimate , we invoke (73) with replacing :
[TABLE]
By plugging this into (146), we get the upper bound
[TABLE]
The first term in the above sum is bounded from above by The second term can be written as
[TABLE]
Recall that each of the terms of the convolution of the sequences and can be estimated from above by for any where depends only on , and Summarizing the inequalities (145) to (IV), and the arguments that follow, we get the claim. ∎
Proof of Theorem 7.
First, note that implies that
[TABLE]
Applying Theorem 2, for the Poisson equation
[TABLE]
we conclude that is Lipschitz continuous in :
[TABLE]
where is given, according to (81), by
[TABLE]
where is the Lipschitz-constant for the kernels as defined in Assumption 3, and and are chosen as in Theorem 3.
In order to prove the second part of Theorem 7, note that, in view of Theorem 2, the particular solution given by is Lipschitz continuous w.r.t. , and
[TABLE]
where is defined in (102), which now becomes
[TABLE]
It follows that the specific solution of Poisson equation (138):
[TABLE]
is also Lipschitz continuous in Indeed, for
[TABLE]
and the first term on the r.h.s. is bounded from above as
[TABLE]
for all . Applying (IV) with we get the upper bound
[TABLE]
For the second term on the right hand side of (155) first we note that by (130) in the proof of Theorem 6:
[TABLE]
for some constant depending only on the constants appearing in Assumptions 5, 6 and 7. Now we can write
[TABLE]
Applying Lemma 8 we get that the right hand side of (158) is bounded by
[TABLE]
where is defined by (131). Recall that by Assumption 4. Taking into account the representation of given in (154), and the decomposition given in (155), and adding the upper bounds (156) and (159) for , we get the second claim. ∎
Remark 7**.**
A nice corollary of Lemma 8 is the Lipschitz continuity of the probability transition kernel of the sampled process, with the first sample after time [math] taken at time with probability given by the resolvent
[TABLE]
It is easily seen that, using the notations and assumptions of Lemma 8, we have for any
[TABLE]
We conclude this section by giving a simple criterion for the verification of the uniform drift condition for the -frame process, as given in Assumption 5. Motivated by the example of linear stochastic systems, given in Section IV, requiring only individual stability of the matrices , we propose the following condition in terms of the one-step kernels :
Assumption 8** (Individual Drift Conditions).**
There exists a family of measurable functions and universal constants and such that
[TABLE]
for all and . Moreover, there exists a measurable and constants with such that
[TABLE]
for all and .
Lemma 9**.**
Under Assumption 8, for any sufficiently large , the uniform drift condition for and the one-step growth condition, Assumptions 5 and 6, are satisfied with
Proof.
Since , for any :
[TABLE]
Choosing so that the uniform drift condition holds for w.r.t. . In order to prove the uniform one-step growth condition, take in (IV) and set ∎
V Controlled Queues
In this section we return to our prime example given in Section II. We will show that, under reasonable additional conditions on a controlled queue, Assumptions 1, 3 and 7 are satisfied with , with small enough, when is restricted to a compact set . Thus the results of the paper, in particular Theorems 5, 6 and 7, imply the existence, uniqueness and Lipschitz continuity of the solution of the parameter-dependent Poisson equation
[TABLE]
where the normalizing constant is the expectation of under the (unique) invariant measure, when is restricted to an open set in place of .
The conditions below will be given in terms of the r.v. thus ensuring the generality of our results. Specific conditions in terms of and will be given at the end of the section. To guarantee stability of the system (8) we stipulate:
Assumption 9**.**
We have , for all
This is a standard condition implying stability of the queue for any fixed in a variety of interpretations, see e.g., [13, 3] and [5]. A further standard condition in queuing theory, and also in the area of risk processes [17], is the existence of a finite positive exponential moment of or equivalently that of A uniform version of this condition in terms of is given below:
Assumption 10**.**
We have , for some .
Observe that Assumption 10 is automatically satisfied if Finally, we will need the following continuity condition for :
Assumption 11**.**
The probability distribution function of is weakly continuous in for i.e. is continuous in for all bounded continuous functions .
We note in passing that these three assumptions imply that the stability condition, Assumption 9, is satisfied uniformly in for
[TABLE]
Uniform Drift Condition. The validity of the uniform drift condition, given as Assumption 1, with no parameter-dependence, has been established using the Lyapunov function , with small enough, see e.g., Section 16.4 of [14]. (We should note that the use of exponential moments is also a standard tool in the theory of risk processes, see [17]). For the sake of completeness and further reference we restate this result, and provide its proof.
Lemma 10**.**
Let us assume that is an -valued random variable such that and for some we have Then there exist such that for all there exists such that
[TABLE]
Proof.
Since the function is convex in the finite difference quotients are monotone non-increasing for with negative limit:
[TABLE]
Hence there exists a and some such that for all
[TABLE]
It follows that and thus we get
[TABLE]
∎
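The convexity argument in the proof of Lemma 10 can be made concrete with a closed-form moment generating function. The sketch below is an illustration and not taken from the paper. It assumes that the increment is Z = S - T with independent exponential variables S (rate `mu`) and T (rate `lam`), and that the lemma's conclusion is a bound of the form E exp(sZ) < 1 for all sufficiently small s > 0. The rates and function names are hypothetical.

```python
import math


def mgf(s, mu=2.0, lam=1.0):
    """E exp(s Z) for Z = S - T, S ~ Exp(rate mu), T ~ Exp(rate lam), 0 <= s < mu."""
    return (mu / (mu - s)) * (lam / (lam + s))


def difference_quotient(s, mu=2.0, lam=1.0):
    """(E exp(s Z) - 1) / s, which tends to E Z = 1/mu - 1/lam as s -> 0+."""
    return (mgf(s, mu, lam) - 1.0) / s


mean_z = 1.0 / 2.0 - 1.0 / 1.0           # E Z = -0.5 < 0 for mu = 2, lam = 1
grid = [0.9, 0.5, 0.1, 0.01, 0.001]      # decreasing values of s
quots = [difference_quotient(s) for s in grid]
for s, q in zip(grid, quots):
    print(f"s = {s:<6} E exp(sZ) = {mgf(s):.6f}  quotient = {q:.6f}")
```

In this model E exp(sZ) < 1 holds exactly when 0 < s < mu - lam. The range of admissible s is thus limited by how negative the mean increment is.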
A minor, but essential technical extension of the above arguments, in order to derive a uniform version of the drift condition, is stated in the lemma below:
Lemma 11**.**
Let be a family of -valued random variables, satisfying Assumptions 9, 10 and 11. Then there exist such that for all there exists such that for all
[TABLE]
Remark 8**.**
Taking into account the proof of Lemma 10 it is clear that all we need to show to prove Lemma 11 is that a uniform version of (169) is valid, i.e. that there exists a and some such that for all and for all
[TABLE]
Let us define the family of functions By Assumption 10 it is readily seen that the random variables are uniformly integrable for , and, therefore, by Assumption 11 it follows that is continuous in The desired claim (172) now follows from the following lemma formulated in the context of convex analysis:
Lemma 12**.**
Let be a family of convex functions in the variable with and with being a compact set, such that
- for all
- for all fixed we have
[TABLE]
- for all fixed the function is continuous in
Then there exists such that for we have
[TABLE]
It follows that for
Remark 9**.**
It also readily follows that
[TABLE]
Proof of Lemma 12.
Let us define the function
[TABLE]
for Obviously, is convex and The claim of the lemma can be then restated as saying that there exists such that for we have
[TABLE]
Assume that the claim is not true, and let be a monotone sequence such that we have
[TABLE]
Let be such that
[TABLE]
Due to the compactness of we can assume that has a limit in say Consider now the function and choose a such that
[TABLE]
The continuity of in implies for sufficiently large . On the other hand the convexity of the function and and imply that for we have a contradiction, proving the claim. ∎
Local Minorization for . As for the local minorization condition it will be verified in a form slightly stronger than our Assumption 7, by showing that for any fixed there exists some integer which may depend on such that
[TABLE]
Indeed, the above inequality implies Assumption 7. To see this, take an arbitrary , as in Assumption 7, satisfying Then the set
[TABLE]
is of the form with some . Letting denote the probability measure assigning unit mass to [math] inequality (179) implies with some for all and as postulated by (113).
To prove (179) we will use arguments familiar in queuing theory, but to establish uniform bounds extra care is needed. Consider first a fixed and let be any fixed real number, defining the bounded Let and consider the -step transition probability for some integer .
Lemma 13**.**
There is such that
[TABLE]
Proof.
By Lemma 12, it follows that for sufficiently small we have hence
[TABLE]
which is strictly smaller than for small enough. ∎
Corollary 3**.**
For each there is such that
[TABLE]
Proof.
Indeed, let be as in Lemma 13 and choose so large that . Then, for all and , is bounded from below by
[TABLE]
∎
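The lower bound in Corollary 3 concerns the probability that the queue is empty after a fixed number of steps, uniformly over starting points in a bounded set. A Monte Carlo sketch of this quantity is given below. It is an illustration under assumptions not stated in this section. The queue is taken to follow the Lindley-type recursion X_{k+1} = max(X_k + Z_{k+1}, 0) with Z = S - T for independent exponential S and T. The rates, the bound `c`, the step count, and the function name are hypothetical.

```python
import numpy as np


def prob_empty_after(x0, n_steps, mu=2.0, lam=1.0, n_paths=20000, seed=0):
    """Monte Carlo estimate of P(X_n = 0 | X_0 = x0) for X_{k+1} = max(X_k + Z_{k+1}, 0)."""
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, float(x0))
    for _ in range(n_steps):
        # Z = S - T with S ~ Exp(rate mu), T ~ Exp(rate lam); numpy takes the scale 1/rate.
        z = rng.exponential(1.0 / mu, n_paths) - rng.exponential(1.0 / lam, n_paths)
        x = np.maximum(x + z, 0.0)
    return float(np.mean(x == 0.0))


c = 3.0                                   # hypothetical bound defining the set [0, c]
estimates = {x0: prob_empty_after(x0, n_steps=20) for x0 in (0.0, 1.5, c)}
print(estimates)
```

Because the same seed is reused, the simulated paths are coupled through common increments and remain ordered by their starting points. The estimate at the largest starting point `c` is therefore the smallest of the three, so the bound at `c` serves as a lower bound over the whole interval in this simulation.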
A nice additional result is the following: let denote the stationary solution of the queue dynamics given by
[TABLE]
Then
[TABLE]
where tends to exponentially fast when tends to . Moreover,
[TABLE]
Lipschitz Continuity of . In order to verify Assumption 3 we will need to strengthen our assumptions on and . We may consider two scenarios, briefly discussed below, both of them implying a common condition on , as follows:
Assumption 12**.**
The probability distribution function of has a density function for all denoted by There exist and such that for all and all , and it holds that
[TABLE]
Assumption 12 can be conveniently verified by imposing the following assumption on and , when is assumed to be independent of : the probability distribution function of has a density function for all , denoted by , and there exist such that for all and all , it holds that Moreover, the probability distribution function of has a density function, denoted by , such that for all it holds that
The first part of the above auxiliary assumption can be conveniently checked by requiring the existence of a density function such that the mapping is measurable, and for each fixed continuously differentiable in , moreover there exist such that for all and all it holds that The proof is readily obtained by the mean-value theorem. Incidentally, it also follows that the law of is continuous in total variation, and, a fortiori, also weakly, implying Assumption 11.
A condition reciprocal to the above is obtained by interchanging the role of and i.e. assuming that does not depend on while does. We note that in this case the requirement that the density function of denoted by satisfies for all implies Assumption 10 with any .
In order to verify Assumption 3, let us consider the transition probabilities . For any Borel set we can write
[TABLE]
State being an atom for is reached with probability
[TABLE]
Lemma 14**.**
Let Assumption 12 hold, and let , with Then for all with and all we have with some
[TABLE]
Proof.
We can write
[TABLE]
First, for the regular part (first term on the r.h.s.) we have
[TABLE]
[TABLE]
with some constant On the other hand, for the atomic component (second term on the r.h.s.) we get
[TABLE]
Inequalities (191) and (V) imply the claim of the lemma. ∎
VI Discussion
Recursive identification of stochastic systems is a central problem of mathematical system theory, instrumental for a variety of fields such as adaptive control, adaptive signal processing and adaptive input design. A milestone in the development of a mathematically rigorous theory was the book [2], considering the abstract problem of solving an algebraic equation defined via a parameter-dependent Markov process, with an explicit link to stochastic approximation (SA). This paper is a significant addition to the theory developed in [2], inasmuch as an alternative methodology is presented for the analysis of Poisson equations associated with the Markov process under consideration. This apparently off-beat technical problem is in fact the fundamental tool in the ODE analysis of [2], taking up more than half of the effort in proving its basic convergence results.
Our approach utilizes a novel and elegant stability theory for Markov chains, developed by Hairer and Mattingly [7], which allowed us to establish simple, transparent conditions to be imposed on the probability transition kernel that ensure the existence, uniqueness and Lipschitz continuity of the solution of the Poisson equation for a parametrized family of Markov chains. Possible relaxations of our core assumptions were also discussed. The power of the suggested framework was demonstrated via its applicability to a controlled queuing system. To complete the loop, the technology presented here will be applied to the ODE analysis of recursive estimators along the lines of [2] in a forthcoming paper.
We now briefly discuss a limitation of our results, inherent in the approach of [7], together with a potential ramification, further context with respect to previous work, and applications.
Recalling that the conditions given in Section IV, formulated in terms of were motivated by properties of linear stochastic systems, defined under (104), it is somewhat surprising that the results of the paper are not directly applicable to this class of models. The reason is that Assumption 3, requiring the Lipschitz continuity of the one-step transition probability kernel w.r.t. an extended total variation distance, will not be satisfied in general, since the probability measures of will be concentrated on different proper subspaces, and hence will be singular. An appropriate tool for circumventing this difficulty could be to resort to the Wasserstein distance, and to revisit the current paper relying on the methodology developed in [8].
An alternative set of conditions under which the problems of the paper may be worth studying is provided by the theory developed already in the first edition of [14], extended in later works, such as [9]. In this latter, highly technical paper, large deviation principles are derived for geometrically ergodic Markov chains. However, it is not clear how these techniques and results could be extended to a parameter-dependent context. In particular, it is not clear what the analogue of the pair of function spaces given in Assumption 4 and would be, so that the Lipschitz continuity of acting on the subspace defined by is implied.
Regarding applications, the analysis of large telecommunication systems requires the study of queueing networks with several servers and customers getting consecutive services, allowing various control actions such as call admission control, [10]. The extension of the results of Section V to queueing networks is an attractive, but challenging problem.
Finally, the recent intense interest in online machine learning, particularly in reinforcement learning, where SA technologies are often adapted and for which the Markovian setup is the standard choice, provides additional motivation for the paper.
References
- [1] K. B. Athreya and P. Ney. A new approach to the limit theory of recurrent Markov chains. Transactions of the American Mathematical Society, 245:493–501, 1978.
- [2] A. Benveniste, M. Métivier, and P. Priouret. Adaptive Algorithms and Stochastic Approximations. Springer, 2nd edition, 1990.
- [3] A. A. Borovkov. Ergodicity and Stability of Stochastic Processes. Wiley & Sons, New York, 1998.
- [4] N. Brosse, A. Durmus, S. Meyn, É. Moulines, and A. Radhakrishnan. Diffusion approximations and control variates for MCMC. arXiv:1808.01665, 2018.
- [5] P. Diaconis and D. Freedman. Iterated random functions. SIAM Review, 41(1):45–76, 1999.
- [6] L. Gerencsér. Rate of convergence of recursive estimators. SIAM Journal on Control and Optimization, 30(5):1200–1227, 1992.
- [7] M. Hairer and J. C. Mattingly. Yet another look at Harris’ ergodic theorem for Markov chains. In Seminar on Stochastic Analysis, Random Fields and Applications VI, pages 109–117. Springer, 2011.
- [8] M. Hairer, J. C. Mattingly, and M. Scheutzow. Asymptotic coupling and a general form of Harris’ theorem with applications to stochastic delay equations. Probability Theory and Related Fields, 149:223–259, 2011.
