Stochastic Approximation in a Markovian Framework Revisited: Lipschitz Continuity of the Poisson Equation

Algo Carè; Balázs Csanád Csáji; Balázs Gerencsér; László Gerencsér; Miklós Rásonyi

arXiv:1906.09464·math.PR·October 2, 2025


TL;DR

This paper revisits a key technical aspect of stochastic approximation in Markovian settings, providing simple conditions to ensure the Lipschitz continuity of solutions to the Poisson equation, which is crucial for analyzing algorithms in various applications.

Contribution

It introduces straightforward conditions to verify the existence, uniqueness, and Lipschitz continuity of solutions to the parameter-dependent Poisson equation in Markovian frameworks.

Findings

1. Conditions verified for a class of queuing systems with open-loop control.

2. Established Lipschitz continuity of the Poisson equation solutions.

3. Simplified technical verification in stochastic approximation analysis.

Abstract

In this paper we revisit a fundamental technical issue within the theory of stochastic approximation (SA) in a Markovian framework, first proposed in the book by Djereveckii and Fradkov (1981), and further developed in much detail in the book by Benveniste, Métivier, and Priouret (1990). This theory is instrumental in many application areas such as the statistical analysis of Hidden Markov Models arising in telecommunication, quantized linear stochastic systems, and more recently in active learning and reinforcement learning. The problem at hand is the verification of the existence, uniqueness and Lipschitz-continuity of the solution of a parameter-dependent Poisson equation, in an appropriate weighted sup-norm, associated with a collection of Markov chains on general state spaces. Verification of the above facts is vital in the analysis of SA processes presented in (Benveniste et…

Equations

$$\mathbb{E}\big[H(X_{n}(\theta),\theta)\big]=0,$$

$$(I-P_{\theta}^{\ast})u_{\theta}(x)=f_{\theta}(x)-h_{\theta},$$

$$\sum_{n=1}^{N}\Big(H(X_{n}(\theta),\theta)-\mathbb{E}_{\mu_{\theta}^{\ast}}\big[H(X_{n}(\theta),\theta)\big]\Big),$$

$$(P_{\theta}\mu)(A)$$

$$(P_{\theta}^{\ast}\varphi)(x)$$

$$(P_{\theta}^{\ast}V)(x)\leq\gamma V(x)+K,$$

$$P_{\theta}\mu(V)\leq\gamma\mu(V)+K\mu(\mathbb{X}).$$

$$|P_{\theta}\eta|(V)\leq\gamma|\eta|(V)+K|\eta|(\mathbb{X}),$$

$$W_{n+1}=(W_{n}+U_{n+1})^{+}.$$

$$W_{\theta,n+1}=(W_{\theta,n}+U_{\theta,n+1})^{+}.$$

$$\mathbb{E}[U_{\theta,1}]<0,$$

$$P_{\theta}(x,A)\geq\bar{\alpha}\,\bar{\mu}(A).$$

$$\int_{\mathbb{X}}V(x)\,\mu_{\theta}^{\ast}(\mathrm{d}x)\leq\frac{K}{1-\gamma}.$$

$$(P_{\theta}^{\ast}V^{\prime})(x)\leq\gamma(V(x)+c)+K+(1-\gamma)c,$$

$$R^{\prime}>2K^{\prime}/(1-\gamma)\geq 2K/(1-\gamma)+2c=R_{0}+2c=:R_{0}^{\prime}.$$

$$\{x\in\mathbb{X}:V^{\prime}(x)\leq R^{\prime}\}=\{x\in\mathbb{X}:V(x)\leq R^{\prime}-c\},$$

$$\rho_{\beta}(\mu_{1},\mu_{2}):=\int_{\mathbb{X}}(1+\beta V(x))\,|\mu_{1}-\mu_{2}|(\mathrm{d}x),$$

$$\rho_{\beta}(\eta):=\int_{\mathbb{X}}(1+\beta V(x))\,|\eta|(\mathrm{d}x),$$

$$\|\varphi\|_{\beta}:=\sup_{x}\frac{|\varphi(x)|}{1+\beta V(x)}.$$

$$\rho_{\beta}(\mu_{1},\mu_{2})=\sup_{\varphi:\|\varphi\|_{\beta}\leq 1}\int_{\mathbb{X}}\varphi(x)\,(\mu_{1}-\mu_{2})(\mathrm{d}x).$$

$$\rho_{\beta}(\eta):=\sup_{\varphi:\|\varphi\|_{\beta}\leq 1}\int_{\mathbb{X}}\varphi(x)\,\eta(\mathrm{d}x).$$

$$d_{\beta}(x,y)=\begin{cases}2+\beta V(x)+\beta V(y)&\text{if }x\neq y,\\ 0&\text{if }x=y.\end{cases}$$

$$|\!|\!|\varphi|\!|\!|_{\beta}:=\sup_{x\neq y}\frac{|\varphi(x)-\varphi(y)|}{d_{\beta}(x,y)}.$$

$$|\!|\!|\varphi|\!|\!|_{\beta}\leq\sup_{x\neq y}\max\left\{\frac{|\varphi(x)|}{1+\beta V(x)},\frac{|\varphi(y)|}{1+\beta V(y)}\right\}=\|\varphi\|_{\beta}.$$

$$\int_{\mathbb{X}}(1+\beta V(x))\,|\eta|(\mathrm{d}x)<\infty\quad\text{and}\quad\eta(\mathbb{X})=0,$$

$$\varphi\mapsto\int_{\mathbb{X}}\varphi(x)\,\eta(\mathrm{d}x)$$

$$\sigma_{\beta}(\eta):=\sup_{\varphi:|\!|\!|\varphi|\!|\!|_{\beta}\leq 1}\int_{\mathbb{X}}\varphi(x)\,\eta(\mathrm{d}x).$$

$$\sigma_{\beta}(\mu_{1},\mu_{2}):=\sup_{\varphi:|\!|\!|\varphi|\!|\!|_{\beta}\leq 1}\int_{\mathbb{X}}\varphi(x)\,(\mu_{1}-\mu_{2})(\mathrm{d}x).$$

$$\sigma_{\beta}(\mu_{1},\mu_{2})=\rho_{\beta}(\mu_{1},\mu_{2}).$$

Taxonomy

Topics: Advanced Queuing Theory Analysis

Full text

Poisson Equations, Lipschitz Continuity and Controlled Queues

Algo Carè1

Balázs Csanád Csáji2,4

Balázs Gerencsér3,4

László Gerencsér2

Miklós Rásonyi3,4

∗ A. Carè and B. Cs. Csáji were (partially) supported by the European Commission through the H2020 project Centre of Excellence in Production Informatics and Control (EPIC, 739592). B. Cs. Csáji and L. Gerencsér were supported by the European Union within the framework of the National Laboratory for Autonomous Systems (RRF-2.3.1-21-2022-00002). M. Rásonyi and B. Gerencsér were supported by NRDI (National Research, Development and Innovation Office) grant KKP 137490. B. Gerencsér was also supported by the János Bolyai Research Scholarship of the Hungarian Academy of Sciences.

1 A. Carè is with Dipartimento di Ingegneria dell’Informazione, Università di Brescia, 25123, Brescia, Italy, [email protected]

2 L. Gerencsér and B. Cs. Csáji are with the Institute for Computer Science and Control (SZTAKI), Eötvös Loránd Research Network (ELKH), Kende utca 13-17, H-1111, Budapest, Hungary, gerencser. [email protected], [email protected]

3 B. Gerencsér and M. Rásonyi are with the Alfréd Rényi Institute of Mathematics, Eötvös Loránd Research Network (ELKH), Reáltanoda u. 13-15., H-1053, Budapest, Hungary, [email protected], [email protected]

4 B. Cs. Csáji, B. Gerencsér and M. Rásonyi are also with the Institute of Mathematics, Eötvös Loránd University (ELTE), Budapest, Hungary

Abstract

The objective of the paper is to revisit a key mathematical technology within the theory of stochastic approximation in a Markovian framework, elaborated in detail by Benveniste, Métivier, and Priouret (1990): the existence, uniqueness and Lipschitz continuity of the solutions of a parameter-dependent Poisson equation associated with a collection of Markov chains on general state spaces. The setup and the methodology of our investigation is based on an elegant stability theory for Markov chains, developed by Hairer and Mattingly (2011). The paper provides a transparent analysis of parameter-dependent Poisson equations with convenient conditions. The validity of the proposed conditions is verified for a class of controlled queues.

I Introduction

A beautiful area of systems and control theory is recursive identification, and stochastic adaptive control of stochastic systems. In an abstract mathematical framework [2] [12] the key problem is to solve a non-linear algebraic equation

$$\mathbb{E}\big[H(X_{n}(\theta),\theta)\big]=0,\tag{1}$$

where $\theta\in\mathbb{R}^{k}$ is an unknown, vector-valued parameter of a physical plant or controller, $(X_{n}(\theta))$, $-\infty<n<+\infty$, is a strictly stationary stochastic process representing a physical signal affected by $\theta$, and $H(X,\theta)$ is a computable function. The same mathematical framework is applied in other fields such as adaptive signal processing and machine learning.

Our objective is to find the root of (1), denoted by $\theta^{\ast},$ via a recursive algorithm based on computable approximations of $H(X_{n}(\theta),\theta).$ In the case when $H(X_{n}(\theta),\theta)=h(\theta)+e_{n},$ where $(e_{n})$ is an i.i.d. process, or a martingale difference sequence, we get a classical stochastic approximation process.
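The classical case $H(X_{n}(\theta),\theta)=h(\theta)+e_{n}$ can be illustrated with a minimal Robbins-Monro style sketch. The mean field $h(\theta)=-(\theta-2)$, the step sizes $1/n$, the Gaussian noise and the sign convention of the update are all assumptions made for this illustration; none of them is taken from the paper.

```python
import random


def robbins_monro(noisy_h, theta0, n_steps):
    """Iterate theta_{n+1} = theta_n + (1/n) * noisy_h(theta_n)."""
    theta = theta0
    for n in range(1, n_steps + 1):
        theta += noisy_h(theta) / n
    return theta


# Hypothetical mean field h(theta) = -(theta - 2), so the root is theta* = 2.
rng = random.Random(0)


def noisy_h(theta):
    # h(theta) + e_n with i.i.d. standard Gaussian noise e_n (an assumption).
    return -(theta - 2.0) + rng.gauss(0.0, 1.0)


theta_hat = robbins_monro(noisy_h, theta0=10.0, n_steps=20000)
print(theta_hat)
```

In the Markovian setting studied here the noise is no longer i.i.d., which is why the Poisson equation is needed to recover a martingale structure.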

An early version of the above problem is presented in the celebrated paper by Ljung [11], in which $(X_{n}(\theta))$ was assumed to be defined via a linear stochastic system driven by a weakly dependent process.

A renewed interest in recursive estimation in a Markovian framework was sparked by the excellent book of Benveniste, Métivier and Priouret [2] elaborating an extensive mathematical technology for the analysis of these processes. A central tool in their analysis is a complex set of results concerning the parameter-dependent Poisson equation. This is carried out by a specific stability theory for a class of Markov processes, which is off the track of usual methodologies, e.g., Athreya and Ney [1], Nummelin [15], Meyn and Tweedie [14].

The enormous practical value of the estimation problem in a Markovian framework motivates our interest to revisit the theory of [2], and see if their analysis can be simplified or even extended in the light of recent progress in the theory of Markov processes. The starting point of our investigation is a relatively new, elegant stability theory for Markov processes developed by Hairer and Mattingly [7].

The focus of the present paper is the study of the parameter-dependent Poisson equation formulated as

$$(I-P_{\theta}^{\ast})u_{\theta}(x)=f_{\theta}(x)-h_{\theta},\tag{2}$$

where $P_{\theta}$ is the probability transition kernel of the Markov process $(X_{n}(\theta)),$ with $P_{\theta}^{\ast}u_{\theta}(\cdot)$ denoting the action of $P_{\theta}$ on the unknown function $u_{\theta}(\cdot),$ and $f_{\theta}(\cdot)$ is an a priori given function defined on the state space of the process. Finally, $h_{\theta}$ denotes the mean value of $f_{\theta}(\cdot)$ under the assumed unique invariant measure, say $\mu_{\theta}^{\ast}$, corresponding to $P_{\theta}.$

The Poisson equation is a simple and effective tool to study additive functionals on Markov processes of the form

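$$\sum_{n=1}^{N}\Big(H(X_{n}(\theta),\theta)-\mathbb{E}_{\mu_{\theta}^{\ast}}\big[H(X_{n}(\theta),\theta)\big]\Big),$$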

via martingale techniques. Proving the Lipschitz continuity of $u_{\theta}(x)$ w.r.t. $\theta$ , and providing useful upper bounds for the Lipschitz constants are vital technical tools for an ODE analysis proposed in [2, Part II, Chapter 2]. In fact, the analysis of the Poisson equation takes up more than half of the efforts in proving the basic convergence results in [2], and the verification of their conditions, in particular a kind of Lipschitz continuity of the probability transition kernels formulated in terms of second order differences, see Theorem 6, condition (iv) on page 262 of [2], is far from being trivial.
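As a sketch of the martingale technique mentioned above (the standard decomposition, stated here under the assumption that a solution $u$ of the Poisson equation exists and is integrable along the chain), fix $\theta$, suppress it in the notation, and write $f=H(\cdot,\theta)$, $X_{n}=X_{n}(\theta)$ and $h=\mu^{\ast}(f)$. Substituting $f-h=u-P^{\ast}u$ and shifting the index in the first sum gives

$$\sum_{n=1}^{N}\big(f(X_{n})-h\big)=\sum_{n=1}^{N}\big(u(X_{n})-(P^{\ast}u)(X_{n})\big)=\sum_{n=1}^{N}\big(u(X_{n+1})-(P^{\ast}u)(X_{n})\big)+u(X_{1})-u(X_{N+1}).$$

Since $\mathbb{E}[u(X_{n+1})\mid X_{n}]=(P^{\ast}u)(X_{n})$, the summands on the right are martingale differences, and the remaining boundary terms are controlled by bounds on $u$.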

The objective of our project is to revisit the relevant mathematical technologies and outline a transparent and flexible analysis within the setup of [7]. The present paper is devoted to the first half of this project, the analysis of a parameter-dependent Poisson equation. The application of our results for stochastic approximation within a Markovian framework is the subject of a forthcoming paper, in which a combination of the ODE analysis developed in [2] and [6] is to be extended using the results of the current paper. In the end we get the expected rate of convergence for the moments of the estimation error under a convenient set of conditions.

The significance of the topic of the paper is reinforced by the current intense interest in the minimization of functions computed via MCMC [4]. To complement the above historical perspective we should note that the problem goes back to [16], providing results for finite state Markov chains. The extension to more general state spaces is far from trivial, posing the challenge to choose an appropriate distance of measures.

The structure of the paper is as follows: in Section II we provide an introduction to the stability theory for Markov chains developed in [7]. The main results of the paper are stated in Section III, culminating in Theorem 2, proving the Lipschitz continuity of the solutions of a parameter-dependent Poisson equation. These results are extended in Section IV, in particular, the uniform drift condition, stated as Assumption 1, is significantly relaxed. The applicability of our results for controlled queuing systems is presented in Section V. The paper is concluded with a brief discussion.

Given the highly technical nature of the paper we think that the present structure, introducing the relevant concepts, theorems and their proofs incrementally, enhances clarity.

II A Brief Summary of a New Stability Theory for Markov Chains

Let $(\mathbb{X},\mathcal{A})$ be a measurable space and $\Theta\subseteq\mathbb{R}^{k}$ be a domain (i.e., a connected open set). Consider a class of Markov transition kernels $P_{\theta}(x,A)$ , that is for each $\theta\in\Theta$ , $x\in{{\mathbb{X}}}$ , $P_{\theta}(x,\cdot)$ is a probability measure over $\mathbb{X}$ , and for each $A\in\mathcal{A}$ , $P_{\cdot}(\cdot,A)$ is $(x,\theta)$ -measurable. Let $(X_{n}(\theta))$ , $n\geq 0$ , be a Markov chain with transition kernel $P_{\theta}$ . For any probability measure $\mu$ and measurable $\varphi$ : ${\mathbb{X}}\rightarrow\mathbb{R}$ define

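$$(P_{\theta}\mu)(A):=\int_{\mathbb{X}}P_{\theta}(x,A)\,\mu(\mathrm{d}x),\qquad(P_{\theta}^{\ast}\varphi)(x):=\int_{\mathbb{X}}\varphi(y)\,P_{\theta}(x,\mathrm{d}y),$$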

assuming the integral exists. The next condition is motivated by [7], stated there for single Markov-chains.

Assumption 1 (Uniform Drift Condition for $P_{\theta}$ ).

There exists a measurable function $V:\mathbb{X}\rightarrow[0,\infty)$ and constants $\gamma\in(0,1)$ and $K\geq 0$ such that

$$(P_{\theta}^{\ast}V)(x)\leq\gamma V(x)+K,\tag{4}$$

for all $x\in{\mathbb{X}}$ and $\theta\in\Theta$ . $V(x)$ is often called a Lyapunov function in the literature. Note that $V(x)$ is not $\theta$ -dependent.

Remark 1.

Inequality (4) without requiring $\gamma<1$ will be called the uniform one-step growth condition. This condition implies that for any measure $\mu$ such that $\mu(V):=\int_{\mathbb{X}}V(x)\mu(\mathrm{d}x)<\infty$ and $\mu(\mathbb{X})<\infty,$ we have for all $\theta\in\Theta:$

$$P_{\theta}\mu(V)\leq\gamma\mu(V)+K\mu(\mathbb{X}).\tag{5}$$

Indeed, integrating (4) with respect to $\mu$ we get (5). Moreover, for any signed measure $\eta$ with $|\eta|(1+\beta V)<\infty$, we have

$$|P_{\theta}\eta|(V)\leq\gamma|\eta|(V)+K|\eta|(\mathbb{X}),\tag{6}$$

for all $\theta\in\Theta$ due to the inequality $|P_{\theta}\eta|\leq P_{\theta}|\eta|$ .

Example. The prime targeted area for application is controlled queuing systems, where both the arrival and service processes may be subject to control, such as Call Admission Control, see e.g., Chapter 11 of [10]. A possible objective of control actions, such as call rejection, may be to optimize a criterion such as the total waiting time. In order to demonstrate the applicability of the results of the present paper, we restrict attention to a single server, for which both the arrival and service process may be subject to control.

Let us first describe the dynamics of a queue without control. Let the arrival process be identified by a simple point process ${\tau_{n}},\,n\geq 0$ with $\tau_{0}=0,$ and let $T_{n+1}$ be the (finite) time elapsed between the arrivals of customers $n$ and $n+1,$ i.e. $T_{n+1}=\tau_{n+1}-\tau_{n}.$ Let $S_{n}$ be the service time of customer $n.$ It is assumed that the sequences $(T_{n})$ and $(S_{n})$ , $n\in\mathbb{N}$ , are i.i.d. sequences of $\mathbb{R}_{+}$ -valued random variables, respectively, independent of each other. We define $x^{+}:=\max\{x,0\}$ , $x\in\mathbb{R}$ . The waiting time of the $n$ -th customer will be denoted by $W_{n}.$ It is readily seen that it satisfies the recursion, with $W_{0}=0$ as initial value, $n\in\mathbb{N}$ and with $U_{n+1}=S_{n}-T_{n+1}$ :

$$W_{n+1}=(W_{n}+U_{n+1})^{+}.\tag{7}$$

In the case of controlled queues both the service time and the arrival time may depend on a control parameter $\theta$ . Let $\Theta\subset\mathbb{R}^{k}$ be a connected, open set as above, and let $D\subset\mathbb{R}^{k}$ be a compact set such that $D\subset\Theta.$ The choice of the control parameter $\theta$ may determine the law of the service times and that of the arrival times via actions such as call rejection. Then the dynamics of the queue can be described, with $W_{\theta,0}=0,$ $n\in\mathbb{N}$ and $U_{\theta,n+1}:=S_{\theta,n}-T_{\theta,n+1},$ by

$$W_{\theta,n+1}=(W_{\theta,n}+U_{\theta,n+1})^{+}.\tag{8}$$

If the initial condition is $W_{\theta,0}=x$ , then the waiting time at $n$ will be denoted by $W_{\theta,n}(x).$ To guarantee stability of the queue we have to assume

$$\mathbb{E}[U_{\theta,1}]<0,\tag{9}$$

for all $\theta\in D.$ This is a standard condition implying stability of the queue for any fixed $\theta$ in a variety of interpretations, see e.g., [13, 3] and [5].
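The waiting-time recursion and the stability condition $\mathbb{E}[U_{\theta,1}]<0$ can be explored with a small simulation. This is only a sketch: the exponential service and inter-arrival laws and the rates used below are hypothetical choices, standing in for one fixed value of $\theta$.

```python
import random


def lindley_path(increments, w0=0.0):
    """Return [W_0, W_1, ...] with W_{n+1} = max(W_n + U_{n+1}, 0)."""
    path = [w0]
    for u in increments:
        path.append(max(path[-1] + u, 0.0))
    return path


rng = random.Random(1)
# Hypothetical rates: E[S] = 1 and E[T] = 2, hence E[U] = E[S] - E[T] = -1 < 0.
service_rate, arrival_rate = 1.0, 0.5
us = [rng.expovariate(service_rate) - rng.expovariate(arrival_rate)
      for _ in range(10000)]
waits = lindley_path(us)
print(sum(waits) / len(waits))  # empirical mean waiting time
```

With a negative mean increment the path keeps returning to zero; reversing the two rates would make the waiting times drift upwards instead.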

The validity of the drift condition, given as Assumption 1, with no parameter-dependence, has been established under appropriate technical conditions, using the Lyapunov function $V(x):=e^{\chi x}$ , $x\in\mathbb{R}_{+},$ with $\chi>0$ small enough, in Section 16.4 of [14]. A uniform version of this result will be established in Section V.
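For a concrete feel of the drift condition with $V(x)=e^{\chi x}$, the sketch below checks it on a grid for a toy waiting-time chain. The two-point law of $U$ and the value $\chi=0.5$ are assumptions chosen so that $\gamma=\mathbb{E}[e^{\chi U}]$ and $K$ can be written in closed form; the paper's own conditions are more general.

```python
import math

CHI = 0.5                       # hypothetical exponent in V(x) = exp(CHI * x)
STEPS = {-1.0: 0.7, 1.0: 0.3}   # hypothetical law: P(U=-1)=0.7, P(U=+1)=0.3


def V(x):
    return math.exp(CHI * x)


def PV(x):
    """(P* V)(x) = E[V((x + U)^+)] for the waiting-time chain."""
    return sum(p * V(max(x + u, 0.0)) for u, p in STEPS.items())


# gamma = E[exp(CHI * U)]; it is below 1 here because E[U] < 0 and CHI is small.
GAMMA = sum(p * math.exp(CHI * u) for u, p in STEPS.items())
# K absorbs the effect of the reflection at 0, which only matters for x < 1.
K = STEPS[-1.0] * (1.0 - math.exp(-CHI))

grid = [0.05 * i for i in range(400)]
drift_ok = all(PV(x) <= GAMMA * V(x) + K + 1e-12 for x in grid)
print(GAMMA, K, drift_ok)
```

For $x\geq 1$ the reflection is inactive and $(P^{\ast}V)(x)=\gamma V(x)$ exactly, so the constant $K$ is needed only near the boundary.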

The next condition is a natural extension of the corresponding assumption of [7] for a parametric family of Markov chains, which itself is a modification of a standard condition in the stability theory of Markov chains [14].

Assumption 2 (Local Minorization).

Let $R>2K/(1\!-\!\gamma)$ , where $\gamma$ and $K$ are the constants from Assumption 1, and set ${\cal C}=\{x\in{\mathbb{X}}:V(x)\leq R\}$ . There exist a probability measure $\bar{\mu}$ on $\mathbb{X}$ and a constant $\bar{\alpha}\in(0,1)$ such that, for all $\theta\in\Theta$ , all $x\in{\cal C}$ , and all measurable $A$ ,

$$P_{\theta}(x,A)\geq\bar{\alpha}\,\bar{\mu}(A).\tag{10}$$
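On a finite state space a minorizing pair $(\bar{\alpha},\bar{\mu})$ can be read off from the transition matrix. The sketch below uses the common choice $\bar{\mu}(y)\propto\min_{x\in{\cal C}}P(x,y)$ for a single kernel; the matrix `P` and the set `C` are hypothetical, and this construction is an illustration rather than the paper's method.

```python
def minorization(P, C):
    """Return (alpha_bar, mu_bar) with P[x][y] >= alpha_bar * mu_bar[y] for x in C."""
    n = len(P[0])
    # Column-wise minimum of the rows indexed by C: the mass shared by all rows.
    floor = [min(P[x][y] for x in C) for y in range(n)]
    alpha_bar = sum(floor)
    if alpha_bar == 0.0:
        raise ValueError("rows indexed by C share no common mass")
    mu_bar = [m / alpha_bar for m in floor]
    return alpha_bar, mu_bar


# Hypothetical 3-state kernel; C plays the role of the sublevel set {V <= R}.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
alpha_bar, mu_bar = minorization(P, C=[0, 1])
print(alpha_bar, mu_bar)
```

For a parametric family one would additionally take the minimum over $\theta$, which is where uniformity in $\theta$ enters.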

Remark 2 (Interpretation of $R$ ).

If there exists an invariant measure $\mu^{\ast}_{\theta}$ such that $\int_{\mathbb{X}}V(x)\mu^{\ast}_{\theta}(\mathrm{d}x)<\infty,$ then integrating both sides of inequality (4), we get

$$\int_{\mathbb{X}}V(x)\,\mu_{\theta}^{\ast}(\mathrm{d}x)\leq\frac{K}{1-\gamma}.\tag{11}$$

Thus, $R$ in Assumption 2 exceeds twice the mean of $V$ w.r.t. any of the invariant measures.
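The integration step behind this bound can be spelled out as follows; it is a short sketch using only the invariance $P_{\theta}\mu^{\ast}_{\theta}=\mu^{\ast}_{\theta}$, the drift inequality (4), and the finiteness of $\mu^{\ast}_{\theta}(V)$:

$$\mu^{\ast}_{\theta}(V)=(P_{\theta}\mu^{\ast}_{\theta})(V)=\int_{\mathbb{X}}(P_{\theta}^{\ast}V)(x)\,\mu^{\ast}_{\theta}(\mathrm{d}x)\leq\gamma\,\mu^{\ast}_{\theta}(V)+K\quad\Longrightarrow\quad(1-\gamma)\,\mu^{\ast}_{\theta}(V)\leq K.$$

Dividing by $1-\gamma>0$ gives the stated bound, and the requirement $R>2K/(1-\gamma)$ then reads $R>2\sup_{\theta}\mu^{\ast}_{\theta}(V)$ at the level of upper bounds.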

Remark 3 (Constant Shifts of $V$ ).

We can and will assume that $\inf_{x}V(x)=0$ without loss of generality in view of the following reasoning. If a function $V$ such that $\inf_{x}V(x)\neq 0$ satisfies Assumption 1, then for $V^{\prime}:=V+c,$ where $c$ is a constant with $c\geq-\inf_{x}V(x)$ , it holds:

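$$(P_{\theta}^{\ast}V^{\prime})(x)=(P_{\theta}^{\ast}V)(x)+c\leq\gamma V(x)+K+c=\gamma(V(x)+c)+K+(1-\gamma)c,\tag{12}$$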

hence, $V^{\prime}$ also satisfies Assumption 1 with the same $\gamma$ and with $K^{\prime}:=(K+(1-\gamma)c)\vee 0$ replacing constant $K$ .

To assess the effect of replacing $V$ by $V^{\prime}$ on Assumption 2 note that the condition $R>2K/(1-\gamma)=:R_{0}$ becomes

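$$R^{\prime}>2K^{\prime}/(1-\gamma)\geq 2K/(1-\gamma)+2c=R_{0}+2c=:R_{0}^{\prime}.\tag{13}$$

On the other hand, the set $\mathcal{C}$ on which the local minorization is required can be written as

$$\{x\in\mathbb{X}:V^{\prime}(x)\leq R^{\prime}\}=\{x\in\mathbb{X}:V(x)\leq R^{\prime}-c\},\tag{14}$$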

which is exactly $\cal C$ if $R^{\prime}-c=R$ , that is $R^{\prime}=R+c$ . It remains to show that this choice of $R^{\prime}$ satisfies $R^{\prime}>R^{\prime}_{0}.$ In other words $R+c>R_{0}+2c$ , which is trivially satisfied if $c<0$ , in particular with $c:=-\inf_{x}V(x)$ .

We now introduce a weighted total variation distance between two probability measures $\mu_{1},\mu_{2}$ , where the weighting is in the form $1+\beta V(\cdot)$ , where $\beta>0,$ for which a fine-tuned choice will be needed for the results of [7] to hold.

Definition 1.

Let $\mu_{1}$ and $\mu_{2}$ be two probability measures on $\mathbb{X}$ . Then, define the weighted total variation distance as

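$$\rho_{\beta}(\mu_{1},\mu_{2}):=\int_{\mathbb{X}}(1+\beta V(x))\,|\mu_{1}-\mu_{2}|(\mathrm{d}x),\tag{15}$$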

where $|\mu_{1}-\mu_{2}|$ is the total variation measure of $(\mu_{1}-\mu_{2})$ .
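On a finite state space the weighted total variation distance reduces to a weighted $\ell^{1}$ distance between probability vectors. The sketch below assumes a hypothetical three-point space with made-up values of $V$ and $\beta$; for two Dirac measures at distinct points $x\neq y$ it returns $2+\beta V(x)+\beta V(y)$.

```python
def rho_beta(mu1, mu2, V, beta):
    """Weighted total variation: sum_x (1 + beta*V[x]) * |mu1[x] - mu2[x]|."""
    return sum((1.0 + beta * v) * abs(a - b) for a, b, v in zip(mu1, mu2, V))


V = [0.0, 1.0, 4.0]   # hypothetical Lyapunov values on a 3-point space
beta = 0.5            # hypothetical weight
delta0 = [1.0, 0.0, 0.0]
delta2 = [0.0, 0.0, 1.0]
d = rho_beta(delta0, delta2, V, beta)
print(d)
```

With $\beta=0$ the function returns the unweighted total variation norm of $\mu_{1}-\mu_{2}$, which is at most $2$ for probability measures.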

The definition naturally extends to any pair of possibly signed measures $\mu_{1},\mu_{2}$ such that $\int_{\mathbb{X}}(1+\beta V(x))\mu_{i}(\mathrm{d}x)<\infty$ for $i=1,2,$ assuming that $\mu_{1}({\mathbb{X}})=\mu_{2}({\mathbb{X}}).$ Writing $\eta=\mu_{1}-\mu_{2}$ we can define the weighted total variation norm (as a single variable function)

$$\rho_{\beta}(\eta):=\int_{\mathbb{X}}(1+\beta V(x))\,|\eta|(\mathrm{d}x),\tag{16}$$

An equivalent definition of $\rho_{\beta}$ can be given by introducing the following norm in the space of $\mathbb{R}$ -valued functions on $\mathbb{X}$ :

Definition 2.

For any measurable function $\varphi$ : ${\mathbb{X}}\rightarrow\mathbb{R}$ , set

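$$\|\varphi\|_{\beta}:=\sup_{x}\frac{|\varphi(x)|}{1+\beta V(x)}.\tag{17}$$

The linear space of functions such that $\|\varphi\|_{\beta}<\infty$ will be denoted by ${\cal L}_{V}$. Note that ${\cal L}_{V}$ is affected neither by constant shifts of $V$ nor by the choice of $\beta$; moreover, ${\cal L}_{V}$ with the norm $\|\cdot\|_{\beta}$ becomes a Banach space for any $\beta>0.$ An equivalent definition of $\rho_{\beta}$ is:

$$\rho_{\beta}(\mu_{1},\mu_{2})=\sup_{\varphi:\|\varphi\|_{\beta}\leq 1}\int_{\mathbb{X}}\varphi(x)\,(\mu_{1}-\mu_{2})(\mathrm{d}x).\tag{18}$$

Equivalently, we can write with $\eta=\mu_{1}-\mu_{2},\eta(\mathbb{X})=0$

$$\rho_{\beta}(\eta):=\sup_{\varphi:\|\varphi\|_{\beta}\leq 1}\int_{\mathbb{X}}\varphi(x)\,\eta(\mathrm{d}x).\tag{19}$$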

We will also need a concept mimicking weighted Lipschitz continuity of a function $\varphi\in{\cal L}_{V}.$ Since in our setup $\mathbb{X}$ is not a metric space, first of all we introduce a metric as follows. Denoting by $\delta_{x}$ the Dirac measure at $x$ , note that, for $x\neq y$ , it holds that $\rho_{\beta}(\delta_{x},\delta_{y})=2+\beta V(x)+\beta V(y)$ . This leads to the definition of the following metric on $\mathbb{X}$ :

$$d_{\beta}(x,y)=\begin{cases}2+\beta V(x)+\beta V(y)&\text{if }x\neq y,\\ 0&\text{if }x=y.\end{cases}\tag{20}$$

This may seem to be an unusual metric, assigning a distance at least $2$ between any pair of distinct points, but it turns out to be quite useful. Having a metric on $\mathbb{X}$, we will introduce a measure of oscillation for functions $\varphi:{\mathbb{X}}\rightarrow\mathbb{R}$ as follows:

Definition 3.

For any measurable function $\varphi$ : ${\mathbb{X}}\rightarrow\mathbb{R}$ , set

$$|\!|\!|\varphi|\!|\!|_{\beta}:=\sup_{x\neq y}\frac{|\varphi(x)-\varphi(y)|}{d_{\beta}(x,y)}.\tag{21}$$

It is easily seen that $|\!|\!|\varphi|\!|\!|_{\beta}$ is finite on ${\cal L}_{V},$ and in fact we have the following inequality: $|\!|\!|\varphi|\!|\!|_{\beta}\leq\|\varphi\|_{\beta}.$ Indeed,

$$|\!|\!|\varphi|\!|\!|_{\beta}\leq\sup_{x\neq y}\frac{|\varphi(x)|+|\varphi(y)|}{(1+\beta V(x))+(1+\beta V(y))}\leq\sup_{x\neq y}\max\left\{\frac{|\varphi(x)|}{1+\beta V(x)},\frac{|\varphi(y)|}{1+\beta V(y)}\right\}=\|\varphi\|_{\beta}.\tag{22}$$

Since $|\!|\!|\varphi|\!|\!|_{\beta}$ is invariant w.r.t. translation by any constant $c\in\mathbb{R}$, we also get $|\!|\!|\varphi|\!|\!|_{\beta}\leq\|\varphi+c\|_{\beta}.$ Surprisingly, the minimum of these upper bounds reproduces $|\!|\!|\varphi|\!|\!|_{\beta}$, as stated in the following lemma, given in [7] in a slightly weaker form with “$\inf$” replacing “$\min$”. However, the proof in [7] explicitly confirms the stronger statement below:

Lemma 1.

$|\!|\!|\varphi|\!|\!|_{\beta}=\min_{c\in\mathbb{R}}\|\varphi+c\|_{\beta}$.
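Lemma 1 can be checked numerically on a finite state space. The sketch below is an illustration under assumptions: the vectors `phi` and `V` and the value of `beta` are made up, and the minimum over $c$ is located by ternary search, which is justified because $c\mapsto\|\varphi+c\|_{\beta}$ is convex and its minimizer lies in $[-\max\varphi,-\min\varphi]$.

```python
def sup_norm(phi, V, beta, c=0.0):
    """||phi + c||_beta = max_x |phi[x] + c| / (1 + beta*V[x])."""
    return max(abs(p + c) / (1.0 + beta * v) for p, v in zip(phi, V))


def oscillation(phi, V, beta):
    """|||phi|||_beta: max over x != y of |phi[x]-phi[y]| / (2 + beta*(V[x]+V[y]))."""
    n = len(phi)
    return max((abs(phi[i] - phi[j]) / (2.0 + beta * (V[i] + V[j]))
                for i in range(n) for j in range(n) if i != j), default=0.0)


def min_shifted_norm(phi, V, beta, iters=200):
    """Minimise the convex map c -> ||phi + c||_beta by ternary search."""
    lo, hi = -max(phi), -min(phi)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if sup_norm(phi, V, beta, m1) <= sup_norm(phi, V, beta, m2):
            hi = m2
        else:
            lo = m1
    return sup_norm(phi, V, beta, 0.5 * (lo + hi))


phi = [3.0, -1.0, 2.5]   # hypothetical test function
V = [0.0, 1.0, 4.0]      # hypothetical Lyapunov values
beta = 0.5
print(oscillation(phi, V, beta), min_shifted_norm(phi, V, beta))
```

The two printed values agree up to the accuracy of the search, while `sup_norm(phi, V, beta)` without a shift is in general strictly larger.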

It is readily seen that $|\!|\!|\cdot|\!|\!|_{\beta}$ is a semi-norm on ${\cal L}_{V}$ and $|\!|\!|\varphi|\!|\!|_{\beta}=0$ if and only if $\varphi$ is a constant function. Letting ${\mathbb{R}}_{X}$ denote the linear space of constant functions on $\mathbb{X}$, it follows that $|\!|\!|\cdot|\!|\!|_{\beta}$ is a norm on the linear factor space ${\cal L}_{V,0}:={\cal L}_{V}/{\mathbb{R}}_{X}.$ It is also easily seen that ${\cal L}_{V,0}$ becomes a Banach space with the norm $|\!|\!|\cdot|\!|\!|_{\beta}.$ In what follows, ${\cal L}_{V,0}$ will denote the latter Banach space.

A useful linear subspace of the dual space ${\cal L}_{V,0}^{*}$ is obtained by considering the linear space of signed measures $\eta$ such that

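$$\int_{\mathbb{X}}(1+\beta V(x))\,|\eta|(\mathrm{d}x)<\infty\quad\text{and}\quad\eta(\mathbb{X})=0,\tag{23}$$

which will be denoted by ${\cal M}_{V}^{0}.$ It is easily seen that

$$\varphi\mapsto\int_{\mathbb{X}}\varphi(x)\,\eta(\mathrm{d}x)\tag{24}$$

is a continuous linear functional, the dual norm of which is

$$\sigma_{\beta}(\eta):=\sup_{\varphi:|\!|\!|\varphi|\!|\!|_{\beta}\leq 1}\int_{\mathbb{X}}\varphi(x)\,\eta(\mathrm{d}x).\tag{25}$$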

This observation motivates the following definition:

Definition 4.

Let $\mu_{1},\mu_{2}$ be two possibly signed measures on $\mathbb{X}$ such that $\int_{\mathbb{X}}(1+\beta V(x))|\mu_{i}|(\mathrm{d}x)<\infty$ for $i=1,2,$ moreover we have $\mu_{1}(\mathbb{X})=\mu_{2}(\mathbb{X}).$ Then, we define the distance

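$$\sigma_{\beta}(\mu_{1},\mu_{2}):=\sup_{\varphi:|\!|\!|\varphi|\!|\!|_{\beta}\leq 1}\int_{\mathbb{X}}\varphi(x)\,(\mu_{1}-\mu_{2})(\mathrm{d}x).\tag{26}$$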

A simple corollary of Lemma 1 is the following:

Corollary 1.

Let $\mu_{1},\mu_{2}$ be two possibly signed measures on $\mathbb{X}$ as in Definition 4. Then

$$\sigma_{\beta}(\mu_{1},\mu_{2})=\rho_{\beta}(\mu_{1},\mu_{2}).\tag{27}$$

Thus we have the following equivalent expressions for $\sigma_{\beta}$ :

[TABLE]

Proof.

Indeed, since $\{\varphi:\|\varphi\|_{\beta}\leq 1\}\subseteq\{\varphi:|\!|\!|\varphi|\!|\!|_{\beta}\leq 1\}$, we immediately get $\rho_{\beta}(\mu_{1},\mu_{2})\leq\sigma_{\beta}(\mu_{1},\mu_{2}).$ On the other hand, take $\varepsilon>0$ and let $\varphi$ be such that $|\!|\!|\varphi|\!|\!|_{\beta}\leq 1$ and

$$\int_{\mathbb{X}}\varphi(x)\,(\mu_{1}-\mu_{2})(\mathrm{d}x)\geq\sigma_{\beta}(\mu_{1},\mu_{2})-\varepsilon.$$

By Lemma 1 there exists a constant $c$ such that $|\!|\!|\varphi|\!|\!|_{\beta}=\|\varphi+c\|_{\beta}$. Thus, $\|\varphi+c\|_{\beta}\leq 1$, and therefore, using $\mu_{1}(\mathbb{X})=\mu_{2}(\mathbb{X})$,

$$\rho_{\beta}(\mu_{1},\mu_{2})\geq\int_{\mathbb{X}}(\varphi(x)+c)\,(\mu_{1}-\mu_{2})(\mathrm{d}x)=\int_{\mathbb{X}}\varphi(x)\,(\mu_{1}-\mu_{2})(\mathrm{d}x)\geq\sigma_{\beta}(\mu_{1},\mu_{2})-\varepsilon.$$

Since $\varepsilon$ is arbitrary, we get that $\rho_{\beta}(\mu_{1},\mu_{2})\geq\sigma_{\beta}(\mu_{1},\mu_{2}).$ Combining with the opposite inequality, we get the claim. ∎

Remark 4.

A useful reformulation of the above result is that for any signed measure $\eta\in\mathcal{M}_{V}^{0}$ , we have

[TABLE]

A fundamental result of [7, Theorem 3.1] is as follows:

Proposition 1.

Under Assumptions 1 and 2, there exists $\beta>0$ and $\alpha\in(0,1)$ such that for all $\theta$ and measurable $\varphi$ ,

$$|\!|\!|P_{\theta}^{\ast}\varphi|\!|\!|_{\beta}\leq\alpha\,|\!|\!|\varphi|\!|\!|_{\beta}.$$

The pairs $(\beta,\alpha)$ can be chosen as follows: take $\alpha_{0}\in(0,\bar{\alpha})$ and $\gamma_{0}\in(\gamma+2K/R,1),$ and then set

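$$\beta=\frac{\alpha_{0}}{K},\qquad\alpha=\big(1-(\bar{\alpha}-\alpha_{0})\big)\vee\frac{2+R\beta\gamma_{0}}{2+R\beta}.$$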

Remark 5.

Although there is a freedom in choosing $\alpha_{0}$ and $\gamma_{0}$ the resulting contraction coefficient $\alpha$ is bounded from below: it holds that $\alpha>\gamma.$ In other words, the contraction coefficient $\alpha$ ensured by Proposition 1 is strictly larger than the contraction coefficient $\gamma$ in the drift condition, cf. Assumption 1. In fact, we have, using $1>\gamma_{0}$ ,

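$$\alpha\geq\frac{2+R\beta\gamma_{0}}{2+R\beta}>\frac{2\gamma_{0}+R\beta\gamma_{0}}{2+R\beta}=\gamma_{0}.$$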

Since $\gamma_{0}>\gamma$ by construction, the statement follows.
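The choice of $(\beta,\alpha)$ can be turned into a small calculator. The formulas used below, $\beta=\alpha_{0}/K$ and $\alpha=(1-(\bar{\alpha}-\alpha_{0}))\vee(2+R\beta\gamma_{0})/(2+R\beta)$, are taken here as an assumption, following the statement of [7, Theorem 3.1]; the function name, the requirement $K>0$ (needed to divide by $K$) and the numerical inputs are hypothetical.

```python
def contraction_constants(gamma, K, R, alpha_bar, alpha0, gamma0):
    """Return (beta, alpha) for given drift and minorization constants."""
    if not (0.0 < gamma < 1.0 and K > 0.0 and R > 2.0 * K / (1.0 - gamma)):
        raise ValueError("need 0 < gamma < 1, K > 0 and R > 2K/(1-gamma)")
    if not (0.0 < alpha0 < alpha_bar < 1.0):
        raise ValueError("need 0 < alpha0 < alpha_bar < 1")
    if not (gamma + 2.0 * K / R < gamma0 < 1.0):
        raise ValueError("need gamma + 2K/R < gamma0 < 1")
    beta = alpha0 / K
    alpha = max(1.0 - (alpha_bar - alpha0),
                (2.0 + R * beta * gamma0) / (2.0 + R * beta))
    return beta, alpha


# Hypothetical constants satisfying the constraints above.
gamma = 0.5
beta, alpha = contraction_constants(gamma=gamma, K=1.0, R=5.0,
                                    alpha_bar=0.3, alpha0=0.15, gamma0=0.95)
print(beta, alpha)
```

In this example the second term of the maximum is the active one, and the resulting $\alpha$ is close to $1$, well above $\gamma$, in line with Remark 5.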

Proposition 1 can be restated as saying that $P_{\theta}^{\ast}$ is a contraction on the Banach space ${\cal L}_{V,0}.$ But then its adjoint operator $P_{\theta}$ , having the same norm, is also a contraction. Thus we immediately get the following result, stated essentially in [7, Theorem 1.3]:

Proposition 2.

Under the assumptions of Proposition 1 there exist $\beta>0$ and $\alpha\in(0,1)$ , such that for all $\theta$ , and any signed measure $\eta\in{\cal M}_{V}^{0}$ we have

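$$\rho_{\beta}(P_{\theta}\eta)\leq\alpha\,\rho_{\beta}(\eta).$$

Alternatively, let $\mu_{1},\mu_{2}$ be two possibly signed measures on $\mathbb{X}$ as in Definition 4. Then, we have

$$\rho_{\beta}(P_{\theta}\mu_{1},P_{\theta}\mu_{2})\leq\alpha\,\rho_{\beta}(\mu_{1},\mu_{2}).$$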

In what follows, $\beta$ and $\alpha$ are chosen as indicated in Proposition 1. Using standard arguments one can easily show the following proposition, also stated in [7] as Theorem 3.2:

Proposition 3.

Under Assumptions 1 and 2 for all $\theta$ there is a unique probability measure $\mu^{\ast}_{\theta}$ on $\mathbb{X}$ such that $\int_{\mathbb{X}}V\mathrm{d}\mu^{\ast}_{\theta}:=\int_{\mathbb{X}}V(x)\,\mu^{\ast}_{\theta}(\mathrm{d}x)<\infty$ and $P_{\theta}\mu^{\ast}_{\theta}=\mu^{\ast}_{\theta}.$

Similar results to those of Propositions 2 and 3 are stated in [14, Theorem 14.0.1] under slightly different conditions. In particular, the special choice of the parameter $\beta$ in the weighting function $1+\beta V$ is not part of the conditions in [14] at the price that the contraction of the one-step kernel $P_{\theta}$ is not stated. In addition, in [14] it is a priori assumed that the Markov-chain is $\psi$ -irreducible and aperiodic, while in [7] these conditions are circumvented by assuming that the minorization condition holds on a fairly large set, defined in terms of a sublevel set of $V,$ see Assumption 2. The analysis of the causal connection between these two sets of conditions is certainly of interest for future research.

III Lipschitz Continuity w.r.t. $\theta$ of the Solution of a $\theta$-Dependent Poisson Equation

In this section we shall consider the Poisson equations

$$(I-P_{\theta}^{\ast})u_{\theta}(x)=f_{\theta}(x)-h_{\theta}$$

for $\theta\in\Theta$, where $f_{\theta}:\mathbb{X}\to\mathbb{R}$ and $h_{\theta}=\mu^{\ast}_{\theta}(f_{\theta})$ are the input data, and $u_{\theta}:\mathbb{X}\to\mathbb{R}$ is a solution. First, we prove the existence and the uniqueness (up to an additive constant) of the solution for a fixed $\theta$, adapting standard arguments; then we formulate smoothness conditions on the kernel $P_{\theta}^{\ast}$ and the right hand side, $f_{\theta}$. Using these conditions we prove Lipschitz continuity w.r.t. $\theta$ in the norm $\|\cdot\|_{\beta}$ of the particular solution $u_{\theta}$ for which $\mu^{\ast}_{\theta}(u_{\theta})=0$. For a start let $\theta\in\Theta$ be fixed.

Theorem 1.

Let Assumptions 1 and 2 hold. Let $f$ be a measurable function ${\mathbb{X}}\rightarrow\mathbb{R}$ such that $|\!|\!|f|\!|\!|_{\beta}<\infty$ and let $P=P_{\theta}$ for some fixed $\theta$, with invariant measure $\mu^{\ast}=\mu^{\ast}_{\theta}$. Let $h=\mu^{\ast}(f)$. Then, the Poisson equation

$$(I-P^{\ast})u(x)=f(x)-h\tag{35}$$

has a unique solution $u(\cdot)$ up to an additive constant. Henceforth, we shall consider the particular solution

$$u(x):=\sum_{n=0}^{\infty}\big((P^{\ast n}f)(x)-h\big).\tag{36}$$

This is well-defined, in fact the right hand side is absolutely convergent, and in addition $\mu^{*}(u)=0$ . Furthermore,

[TABLE]

implying the inequality

[TABLE]

where $K_{u}$ is given by

[TABLE]

Proof.

It is immediate to check that (35) is formally satisfied by $u$. We show that $u$ is well-defined. First, consider any function $\varphi$ such that $|\!|\!|\varphi|\!|\!|_{\beta}\leq 1$. By the definition of the metric $\sigma_{\beta}$, see (25), the inequality

$$\left|\int_{\mathbb{X}}\varphi(x)\,\mu_{1}(\mathrm{d}x)-\int_{\mathbb{X}}\varphi(x)\,\mu_{2}(\mathrm{d}x)\right|\leq\sigma_{\beta}(\mu_{1},\mu_{2})\tag{40}$$

holds true for any pair of probability measures $\mu_{1},\mu_{2},$ or even for any pair of signed measures $\mu_{1},\mu_{2}$ as in Definition 4. On the other hand, any generic function $\varphi$ with $0<|\!|\!|\varphi|\!|\!|_{\beta}<\infty$ can be rescaled by $1/|\!|\!|\varphi|\!|\!|_{\beta}$, so that we also have

$$\left|\int_{\mathbb{X}}\varphi(x)\,\mu_{1}(\mathrm{d}x)-\int_{\mathbb{X}}\varphi(x)\,\mu_{2}(\mathrm{d}x)\right|\leq|\!|\!|\varphi|\!|\!|_{\beta}\,\sigma_{\beta}(\mu_{1},\mu_{2}).\tag{41}$$

To estimate the $n$ th term of the right hand side of (36), consider the equalities

[TABLE]

Using (41), we can bound the right hand side by $\sigma_{\beta}(P^{n}\delta_{x},P^{n}\mu^{\ast})$ . Now applying Proposition 2 and taking into account Corollary 1, we can further bound it by

[TABLE]

Taking into account the trivial estimate

[TABLE]

and noting that $\|\varphi\|_{\beta}\leq 1$ implies for all $x$ that $|\varphi(x)|\leq 1+\beta V(x)$ , we conclude that

[TABLE]

It follows that the series $\sum_{n=0}^{\infty}(P^{\ast n}f(x)-h)$ is absolutely convergent, so $u(x)$ is well-defined and satisfies the desired upper bound. Indeed, $({P^{\ast}}u)(x)$ can be written as

[TABLE]

where the integration and the summation can be interchanged due to the Lebesgue dominated convergence theorem, the conditions of which are ensured by (45). Thus, we get

[TABLE]

which implies the claim. Using similar arguments we get that

[TABLE]

To prove uniqueness, assume that there are two solutions $u_{1}$ and $u_{2}$ , and define $\Delta u=u_{2}-u_{1}$ . Then, $(I-{P^{\ast}})\Delta u=0$ , implying ${P^{\ast}}\Delta u=\Delta u$ , from which ${\left|\kern-1.07639pt\left|\kern-1.07639pt\left|{P^{\ast}}\Delta u\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{\beta}={\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\Delta u\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{\beta}$ . But, by Proposition 1, it holds that ${\left|\kern-1.07639pt\left|\kern-1.07639pt\left|{P^{\ast}}\Delta u\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{\beta}\leq\alpha{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\Delta u\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{\beta}$ with $\alpha<1,$ and hence ${\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\Delta u\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{\beta}=0$ . Therefore, $\Delta u$ is a constant.

Summing the inequalities (45) over $n$ and using (11) we get the upper bound

[TABLE]

from which the claim of the theorem follows. ∎
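The series solution of Theorem 1 can be checked numerically on a finite state space, where the weighted norms reduce to ordinary maximum norms. The following is a minimal sketch under assumptions that are not part of the paper: the 3-state kernel `P`, the function `f`, and the truncation lengths are hypothetical choices made only for illustration.

```python
# Hypothetical 3-state kernel and test function (illustration only).
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.4, 0.5]]
f = [1.0, -2.0, 4.0]


def apply_kernel(P, g):
    """Return P* g, i.e. the function x -> sum_y P(x, y) g(y)."""
    return [sum(p * gy for p, gy in zip(row, g)) for row in P]


def invariant_measure(P, iterations=500):
    """Approximate mu* by iterating mu -> P mu from the uniform distribution."""
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(iterations):
        mu = [sum(mu[x] * P[x][y] for x in range(n)) for y in range(n)]
    return mu


def poisson_solution(P, f, terms=500):
    """Truncate u(x) = sum_{n>=0} (P*^n f(x) - h) after `terms` terms."""
    mu = invariant_measure(P)
    h = sum(m * fx for m, fx in zip(mu, f))
    u = [0.0] * len(f)
    g = list(f)  # g holds P*^n f at step n
    for _ in range(terms):
        u = [ux + gx - h for ux, gx in zip(u, g)]
        g = apply_kernel(P, g)
    return u, h, mu


u, h, mu = poisson_solution(P, f)
Pu = apply_kernel(P, u)

# Residual of the Poisson equation (I - P*) u = f - h.
residual = max(abs((ux - pux) - (fx - h)) for ux, pux, fx in zip(u, Pu, f))
# Centering of the particular solution: mu*(u) should vanish.
centering = abs(sum(m * ux for m, ux in zip(mu, u)))
print("residual:", residual, "centering:", centering)
```

Both printed quantities should be at the level of floating-point rounding, reflecting that the truncated series solves the Poisson equation and is centered with respect to the invariant measure.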

Now we consider a parametric family of kernels $(P_{\theta})$ and that of functions $(f_{\theta})$ for $\theta\in\Theta.$ A critical point in the discussion to follow is to define appropriate smoothness conditions for them in the context of [7].

Assumption 3 (Lipschitz Continuity of $P_{\theta}$ ).

There exists a constant $L_{P}$ such that for every $\theta,\theta^{\prime}\in\Theta$ and all $x\in\mathbb{X}$ it holds that

[TABLE]

It is easily seen that the validity of Assumption 3 implies an inequality similar to (50) for general (non-negative) measures, even under a relaxed drift condition:

Lemma 2.

Let $\mu$ be a measure such that $\mu(1+\beta V)<\infty,$ and assume that the one-step growth condition, defined by Assumption 1 without requiring $\gamma<1,$ holds. Then under Assumption 3 we have for every $\theta,\theta^{\prime}\in\Theta,$

[TABLE]

Proof.

Assumption 3 implies that for all $\varphi$ such that $\|\varphi\|_{\beta}\leq 1$ , and hence also ${\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\varphi\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{\beta}\leq 1$ ,

[TABLE]

Integrating the right hand side of (52) with respect to $\mu(\mathrm{d}x)$ we get the right hand side of (51). For the integral of the left hand side we apply Fubini’s theorem to get

[TABLE]

where the measure $\eta=P_{\theta}\mu$ is defined as usual by $\eta(A)=\int_{\mathbb{X}}P_{\theta}(x,A)\mu(\mathrm{d}x)$ . The measure $\eta$ is well-defined, since $\mu(\mathbb{X})<\infty$ . The applicability of Fubini’s theorem is justified by the inequality

[TABLE]

and noting that the right hand side has a finite integral with respect to $\mu$ . Using the same argument for $\theta^{\prime}$ , altogether for the integral of (52) we obtain

[TABLE]

Since $\varphi$ is arbitrary subject to $\|\varphi\|_{\beta}\leq 1$ , we conclude that $\sigma_{\beta}(P_{\theta}\mu,P_{\theta^{\prime}}\mu)$ is bounded by the right hand side of (52), and we get the statement of the Lemma. ∎

The above lemma is easily extended from measures to signed measures:

Lemma 3.

Let $\eta$ be a signed measure such that $|\eta|(1+\beta V)<\infty,$ and assume that the one-step growth condition, defined by Assumption 1 without requiring $\gamma<1,$ holds. Then under Assumption 3 we have for every $\theta,\theta^{\prime}\in\Theta,$

[TABLE]

Proof.

We consider the Hahn-Jordan decomposition $\eta=\eta^{+}-\eta^{-}$ , where $\eta^{+}$ and $\eta^{-}$ are non-negative measures. Then

[TABLE]

Using Lemma 2 for both terms we get the desired upper bound:

[TABLE]

which is the right hand side of (54). ∎

The class of measurable functions $\{f_{\theta}:{\mathbb{X}}\rightarrow\mathbb{R}\mid\theta\in\Theta\}$ is determined by the following assumption:

Assumption 4.

We have $K_{f}:=\sup_{\theta\in\Theta}{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|f_{\theta}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{\beta}<\infty$ , and there exists a constant $L_{f}$ such that, for all $\theta,\theta^{\prime}$ , it holds that

[TABLE]

The main result of the present paper is as follows, with the remainder of this section being devoted to its proof:

Theorem 2.

Let Assumptions 1, 2, 3 and 4 hold, and consider the parameter-dependent Poisson equation

[TABLE]

where $h_{\theta}=\mu^{\ast}_{\theta}(f_{\theta})$ . Then, $h_{\theta}$ is Lipschitz continuous in $\theta$ :

$$|h_{\theta}-h_{\theta^{\prime}}|\leq L_{h}\,|\theta-\theta^{\prime}|,\qquad(58)$$

and the family of solutions $u_{\theta}(x)=\sum_{n=0}^{\infty}(P_{\theta}^{\ast n}f_{\theta}(x)-h_{\theta})$ , ensured by Theorem 1, is Lipschitz continuous in $\theta$ :

[TABLE]

where $L_{u}$ is independent of $x$ . It follows that

[TABLE]

Briefly speaking, $(I-P_{\theta})^{-1},$ viewed as an operator mapping the space of functions $f_{\theta}$ that satisfy Assumption 4 and $\int f_{\theta}(x)\mu^{\ast}_{\theta}(\mathrm{d}x)=0$ into ${\cal L}_{V},$ is Lipschitz continuous in $\theta$ .

Proof.

Consider the extended parametric family of Poisson equations, where $P^{\ast}$ and $f$ are independently parametrized, with the notation $h_{\theta,\psi}=\mu_{\theta}^{\ast}(f_{\psi}),$

[TABLE]

First, we prove that $h_{\theta,\psi}$ is Lipschitz continuous in $\theta$ and $\psi.$ Since $h_{\theta}=\mu_{\theta}^{\ast}(f_{\theta})=h_{\theta,\theta}$ , the Lipschitz continuity of $h_{\theta},$ stated in (58) then follows. We can write

[TABLE]

Note that the limits of the right hand side are finite by Assumption 4 and the drift condition Assumption 1.

We can bound the right hand side of (62) as follows:

[TABLE]

Using the Lipschitz continuity of $f$ , as given by Assumption 4, the right hand side can be bounded from above by

[TABLE]

Letting $n\to\infty$ , for the limit we get, using Remark 2,

[TABLE]

To continue the proof of the theorem we will have to establish the Lipschitz continuity of the powers of the kernel $P^{n}_{\theta}$ together with an upper bound for the Lipschitz constants.

Lemma 4.

Assume that Assumptions 1, 2, and 3 hold. Then for all $\theta,\theta^{\prime}\in\Theta$ and signed measure $\eta$ with $|\eta|(1+\beta V)<\infty$ ,

[TABLE]

where $C_{P}$ is independent of $\theta$ , $\theta^{\prime}$ and $\eta$ , given by

[TABLE]

Proof.

We can estimate $\sigma_{\beta}(P^{n}_{\theta}\eta,P^{n}_{\theta^{\prime}}\eta)$ from above, using a telescoping sequence of triangle inequalities, leading to the upper bound

[TABLE]

Note that the measures $P_{\theta}P^{k}_{\theta^{\prime}}\eta$ and $P_{\theta^{\prime}}P^{k}_{\theta^{\prime}}\eta$ are as in Definition 4, in particular $P_{\theta}P^{k}_{\theta^{\prime}}\eta(\mathbb{X})=P_{\theta^{\prime}}P^{k}_{\theta^{\prime}}\eta(\mathbb{X})$ . Then, using the contraction property of the kernels $P^{n-k-1}_{\theta}$ , see Proposition 2, we obtain the upper bound

[TABLE]

For the $k$ -th term apply Lemma 3 with $P^{k}_{\theta^{\prime}}\eta$ to get the following upper bound for (70):

[TABLE]

Note now that by the consequence of the drift condition given in Remark 1 we can bound $|P_{\theta}^{k}\eta|(V)$ for a general $\theta$ by

[TABLE]

Noting that $|P_{\theta}^{k-1}\eta|(\mathbb{X})\leq|\eta|(\mathbb{X})$ and iterating the above inequality, we get

[TABLE]

By plugging (73) into the sum in (71), we get the upper bound

[TABLE]

We can write the latter expression as

[TABLE]

Summarizing the inequalities (III) to (74), taking into account $\alpha>\gamma$ (see Remark 5), and bounding the geometric sums in (74) with their limit values we get the upper bound

[TABLE]

from which the claim follows by setting $n=0$ . ∎
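The telescoping argument in the proof of Lemma 4 rests on the algebraic identity $P_{\theta}^{n}\eta-P_{\theta^{\prime}}^{n}\eta=\sum_{k=0}^{n-1}P_{\theta}^{n-k-1}(P_{\theta}-P_{\theta^{\prime}})P_{\theta^{\prime}}^{k}\eta$, whose terms pair the measures $P_{\theta}P^{k}_{\theta^{\prime}}\eta$ and $P_{\theta^{\prime}}P^{k}_{\theta^{\prime}}\eta$ under the kernel $P^{n-k-1}_{\theta}$. The sketch below checks this identity for two hypothetical $2\times 2$ stochastic matrices; the matrices and the value of `n` are assumptions made for illustration. Measures are treated as row vectors, so the action of a kernel on a measure becomes right multiplication and the order of the matrix factors is reversed.

```python
def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matpow(A, n):
    """n-th power of a square matrix (identity for n = 0)."""
    size = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(size)]
              for i in range(size)]
    for _ in range(n):
        result = matmul(result, A)
    return result


def telescoping_sum(P, Q, n):
    """Return sum_{k=0}^{n-1} Q^k (P - Q) P^{n-k-1} (row-vector convention)."""
    size = len(P)
    diff = [[P[i][j] - Q[i][j] for j in range(size)] for i in range(size)]
    total = [[0.0] * size for _ in range(size)]
    for k in range(n):
        term = matmul(matmul(matpow(Q, k), diff), matpow(P, n - k - 1))
        total = [[total[i][j] + term[i][j] for j in range(size)]
                 for i in range(size)]
    return total


# Hypothetical kernels standing in for P_theta and P_theta'.
P_theta = [[0.7, 0.3], [0.4, 0.6]]
P_theta_prime = [[0.6, 0.4], [0.5, 0.5]]
n = 6

Pn = matpow(P_theta, n)
Qn = matpow(P_theta_prime, n)
direct = [[Pn[i][j] - Qn[i][j] for j in range(2)] for i in range(2)]
summed = telescoping_sum(P_theta, P_theta_prime, n)
gap = max(abs(direct[i][j] - summed[i][j]) for i in range(2) for j in range(2))
print("max entrywise gap:", gap)
```

Each summand contains exactly one factor $P_{\theta}-P_{\theta^{\prime}}$, which is where Assumption 3 enters, while the remaining powers are controlled by the contraction and drift properties.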

Applying Lemma 4 for $\eta=\delta_{x}$ , we get

[TABLE]

from which we get the Lipschitz continuity of the invariant measure as a function of $\theta$ :

Corollary 2.

Under the assumptions of Lemma 4, we have

[TABLE]

where

[TABLE]

Proof.

Note that for any initial probability measure $\mu$ , we have by the triangle inequality

[TABLE]

The first and the last terms converge to zero by Proposition 2. Taking $\mu=\delta_{x}$ , the middle term is upper bounded by

[TABLE]

based on (67). Taking into account the definition of $C_{P}$ based on (75), this time taking $n\to\infty$ , we get $C^{\prime}_{P}$ . Finally, the claim follows by recalling that $\inf_{x}V(x)=0$ by Remark 3. ∎

Returning to the proof of Theorem 2, the left hand side of (63) can be written as

[TABLE]

Here ${\left|\kern-1.07639pt\left|\kern-1.07639pt\left|f_{\psi}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{\beta}\leq\sup_{\psi\in\Theta}{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|f_{\psi}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{\beta}=K_{f}<\infty$ by Assumption 4 and $\sigma_{\beta}(\mu_{\theta}^{*},\mu_{\theta^{\prime}}^{*})$ can be upper bounded by Corollary 2. Setting $\theta=\psi$ and combining the bounds given by (66) and (80) we get the desired inequality (58), $|h_{\theta}-h_{\theta^{\prime}}|\leq L_{h}|\theta-\theta^{\prime}|,$ with

[TABLE]

Next, we consider the Lipschitz continuity of the doubly-parametrized particular solution

[TABLE]

We will need an alternative of Lemma 4 for signed measures $\eta\in\mathcal{M}_{V}^{0}$ (i.e., $\eta(\mathbb{X})=0$ in addition to $|\eta|(1+\beta V)<\infty$ ):

Lemma 5.

Assume that Assumptions 1, 2, and 3 hold. Then for every $\theta,\theta^{\prime}\in\Theta$ and signed measure $\eta\in\mathcal{M}_{V}^{0}$ , we have

[TABLE]

Proof.

The starting point of the proof is the inequality, obtained by combining the inequalities (III) – (70), applicable also for signed measures such that $|\eta|(1+\beta V)<\infty$ :

[TABLE]

A key point is the observation that since $\eta(\mathbb{X})=0$ , $P^{k}_{\theta^{\prime}}\eta$ converges exponentially fast to the zero measure, see Proposition 2. To estimate the $k$ th term of (84), we apply Lemma 3 and Remark 4, (30),

[TABLE]

Now applying Proposition 2 and Remark 4, (30), again, we get the upper bound:

[TABLE]

Inserting this into (84), we get the desired upper bound. ∎

Step 1. First we show that $u_{\theta,\psi}(x)$ is Lipschitz continuous in $\psi.$ Indeed, we have

[TABLE]

Here the $n$ -th term can be written, using (41), as

[TABLE]

Taking into account Proposition 2 and Assumption 4 the right hand side can be bounded from above by

[TABLE]

Inserting this into (III) gives

[TABLE]

Step 2. The critical point is to show that $u_{\theta,\psi}(x)$ is Lipschitz continuous in $\theta.$ Let us write

[TABLE]

The $n$ -th term can be written as

[TABLE]

Write the measure in the bracket as

[TABLE]

Then for $\Delta_{n,1}=\left(P_{\theta}^{n}-P_{\theta^{\prime}}^{n}\right)\left(\delta_{x}-\mu_{\theta}^{\ast}\right)$ we get by Lemma 5 with $\eta=\delta_{x}-\mu_{\theta}^{\ast}$ the upper bound

[TABLE]

On the other hand, for $\Delta_{n,2}=P_{\theta^{\prime}}^{n}\left(\mu_{\theta^{\prime}}^{\ast}-\mu_{\theta}^{\ast}\right)$ we have by Proposition 2 the upper bound $\sigma_{\beta}(\Delta_{n,2})\leq{\alpha^{n}}\sigma_{\beta}(\mu_{\theta^{\prime}}^{\ast}-\mu_{\theta}^{\ast}),$ and this can be bounded from above by Corollary 2, yielding

[TABLE]

Thus the $n$ -th term of (III), rewritten in (91), can be rewritten and bounded from above, using inequality (41), as

[TABLE]

Summation over $n,$ in view of (III), yields the upper bound

[TABLE]

The right hand side can be simplified to

[TABLE]

Combining this with (89), and setting $\theta=\psi$ we get the upper bound for $|u_{\theta,\theta}(x)-u_{\theta^{\prime},\theta^{\prime}}(x)|$ :

[TABLE]

The latter can be simplified to

[TABLE]

where

[TABLE]

To get a compact upper bound for (97), we replace $L_{u,1}$ by $L_{u,2}$ noting that $L_{u,1}\leq L_{u,2}$ , and multiply the second term by $(1+\beta V(x))\geq 1$ . Then, we get the upper bound

[TABLE]

proving the second claim of the theorem:

[TABLE]

where the constant $L_{u}$ can be chosen as

[TABLE]

∎
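The two claims of Theorem 2 can be observed on a toy example. The sketch below uses a hypothetical two-state family $P_{\theta}$ with $\theta\in[0.2,0.8]$ and a fixed function $f$ (so that $L_{f}=0$); none of these choices come from the paper. It evaluates $h_{\theta}$ and the series solution $u_{\theta}$ on a grid and reports the largest difference quotients, which serve as empirical stand-ins for $L_{h}$ and $L_{u}$ on this example only.

```python
F = [0.0, 1.0]  # fixed f, hence no dependence of f on theta


def kernel(theta):
    """Hypothetical two-state kernel, depending on theta only in its first row."""
    return [[1.0 - theta, theta], [0.5, 0.5]]


def invariant_measure(theta):
    """Closed-form invariant probability vector of kernel(theta)."""
    total = 0.5 + theta
    return [0.5 / total, theta / total]


def h_and_u(theta, terms=200):
    """Return h_theta and the truncated series u_theta = sum_n (P*^n f - h)."""
    P = kernel(theta)
    mu = invariant_measure(theta)
    h = mu[0] * F[0] + mu[1] * F[1]
    u = [0.0, 0.0]
    g = list(F)  # g holds P*^n f
    for _ in range(terms):
        u = [u[x] + g[x] - h for x in range(2)]
        g = [P[x][0] * g[0] + P[x][1] * g[1] for x in range(2)]
    return h, u


thetas = [0.2 + 0.01 * i for i in range(61)]
values = [h_and_u(t) for t in thetas]

max_h_quotient = 0.0
max_u_quotient = 0.0
for i in range(len(thetas) - 1):
    step = thetas[i + 1] - thetas[i]
    h0, u0 = values[i]
    h1, u1 = values[i + 1]
    max_h_quotient = max(max_h_quotient, abs(h1 - h0) / step)
    max_u_quotient = max(max_u_quotient,
                         max(abs(u1[x] - u0[x]) for x in range(2)) / step)

print("empirical Lipschitz constant of h:", max_h_quotient)
print("empirical Lipschitz constant of u:", max_u_quotient)
```

On a finite state space one may take $V\equiv 0$, so the factor $1+\beta V(x)$ in the bound for $u_{\theta}$ plays no role here; the example only illustrates that both difference quotients remain bounded as $\theta$ varies.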

Remark 6.

Assume that the assumptions of Lemma 4 are satisfied. Then, for any measurable function $\varphi$ such that ${\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\varphi\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{\beta}<\infty$ , $|P_{\theta}^{\ast n}\varphi(x)-P_{\theta^{\prime}}^{\ast n}\varphi(x)|$ is upper-bounded by

[TABLE]

for all $x$ , where $C_{P}$ is given in Lemma 4, see (68).

IV Relaxations of the Conditions

A delicate condition of Propositions 1-3 is Assumption 1, requiring the existence of a common Lyapunov function. This requirement may be too restrictive even in the case of linear stochastic systems, as the example below shows.

Example. Consider a family of linear stochastic systems with state vectors $X_{\theta,n}$ defined by

[TABLE]

where $\theta\in\Theta$ , the matrix $A_{\theta}$ is stable for all $\theta\in\Theta$ , $(U_{n})$ is an i.i.d. sequence of random vectors such that $\mathbb{E}\,[U_{n}]=0$ and $\mathbb{E}\,[U_{n}U_{n}^{\top}]=S$ exists and is finite, and $B_{\theta}$ is a matrix with appropriate dimensions. Setting $V(x):=x^{\top}Qx$ , where $Q$ is a symmetric positive definite matrix, we have

[TABLE]

thus the drift condition is equivalent to $A_{\theta}^{\top}QA_{\theta}\leq\gamma\,Q$ with $\gamma<1$ for all $\theta$ , in the sense of semi-definite ordering. Hence, the matrix $Q$ induces a metric with respect to which $A_{\theta}$ is a contraction, simultaneously for all $\theta$ , with the same contraction factor. It follows that the family $A_{\theta},\theta\in\Theta$ is jointly stable.

Let us now assume only that $(A_{\theta}),\,\theta\in D$ , with $D\subset\Theta,$ is a compact set of stable matrices. Then we can find a positive integer $r$ such that $\|A_{\theta}^{r}\|\leq\gamma_{r}<1$ for all $\theta\in D,$ hence the family of matrices $A_{\theta}^{r},\theta\in D$ is jointly stable. This example motivates the following relaxation of the drift condition given as Assumption 1 in analogy with Assumption A’.5, (i) and (i’) on page 290 of [2]:

Assumption 5 (Uniform Drift Condition for $P_{\theta}^{r}$ ).

There exists a positive integer $r$ , a measurable function $V:\mathbb{X}\rightarrow[0,\infty)$ and constants $\gamma_{r}\in(0,1)$ and $K_{r}\geq 0$ such that for all $\theta\in\Theta$ and $x\in{\mathbb{X}},$ we have

[TABLE]
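The choice of $r$ in the linear example preceding Assumption 5 can be made concrete. The sketch below searches for the smallest power $r$ at which all members of a family of stable matrices become contractive. Two assumptions are made that the text does not fix: the norm is taken to be the induced $\infty$-norm (maximum absolute row sum), and the compact set $D$ is replaced by a hypothetical finite family of upper triangular matrices.

```python
def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inf_norm(A):
    """Induced infinity norm: the maximum absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)


def common_contractive_power(matrices, max_power=200):
    """Smallest r with max_theta ||A_theta^r|| < 1, together with that maximum."""
    powers = list(matrices)
    for r in range(1, max_power + 1):
        gamma_r = max(inf_norm(M) for M in powers)
        if gamma_r < 1.0:
            return r, gamma_r
        powers = [matmul(M, A) for M, A in zip(powers, matrices)]
    raise ValueError("no contractive power found up to max_power")


# Hypothetical family: spectral radius 0.5 for every member, but the
# off-diagonal entry makes the first few powers non-contractive.
family = [[[0.5, t], [0.0, 0.5]] for t in (0.5, 1.0, 2.0)]
r, gamma_r = common_contractive_power(family)
print("r =", r, "gamma_r =", gamma_r)  # r = 5, gamma_r = 0.65625
```

Every matrix in this family is stable, yet none has norm below one, so a one-step contraction in this norm fails while the five-step contraction holds uniformly.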

Assumption 6 (Uniform One Step Growth Condition for $P_{\theta}$ ).

With the same measurable function $V:\mathbb{X}\rightarrow[0,\infty)$ as above we have for all $\theta\in\Theta$ and all $x\in{\mathbb{X}}:$

[TABLE]

where we can and will assume that $\gamma_{1}>1$ and $K_{1}\geq 0.$

Lemma 6.

The uniform one-step growth condition given above implies that for any $\beta>0,$ for all functions $\varphi\in{\cal L}_{V}$ and all $\theta\in\Theta,$ with $\alpha^{\prime}=\gamma_{1}\vee(1+\beta K_{1}),$ we have

[TABLE]

Proof.

To simplify the notation we write $P_{\theta}=P.$ We have $|\varphi(x)|\leq\|\varphi\|_{\beta}(1+\beta V(x))$ from which we get

[TABLE]

by Assumption 6. The last term on the right hand side is majorized by $\alpha^{\prime}(1+\beta V(x))$ with $\alpha^{\prime}=\gamma_{1}\vee(1+\beta K_{1}),$ proving the first half of (108). To prove the second half of (108) recall that for any $\psi\in{\cal L}_{V}$ we have ${\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\psi\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{\beta}=\min_{c}\|\psi+c\|_{\beta}.$ Hence for any constant $c$ we have

[TABLE]

Apply the first inequality of (108) with $\varphi+c$ replacing $\varphi:$

[TABLE]

Choosing $c$ so that $\|{\varphi+c}\|_{\beta}={\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\varphi\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{\beta}$ yields the claim. ∎

Lemma 6 is a relaxed version of Proposition 1. Now, repeating the arguments leading to Proposition 2, we get:

Lemma 7.

Under Assumption 6, for any $\beta>0,$ for all $\theta\in\Theta$ , the kernel $P_{\theta}$ is a bounded linear operator on ${\cal M}_{V}^{0}$ , more exactly, for any $\eta\in{\cal M}_{V}^{0}$ , we have, with $\alpha^{\prime}$ as in Lemma 6,

[TABLE]

Alternatively, let $\eta_{1},\eta_{2}$ be two possibly signed measures on $\mathbb{X}$ as in Definition 4. Then we have

[TABLE]

Assumption 7 (Uniform Local Minorization for $P_{\theta}^{r}$ ).

Let $R_{r}>2K_{r}/(1-\gamma_{r}),$ where $\gamma_{r}$ and $K_{r}$ are the constants from Assumption 5, and let ${\cal C}_{r}=\{x\in{\mathbb{X}}:V(x)\leq R_{r}\}$ . There exist a probability measure $\bar{\mu}_{r}$ and a constant $\bar{\alpha}_{r}\in(0,1)$ such that for all $\theta\in\Theta$ , $x\in{\cal C}_{r}$ and measurable $A$ it holds:

[TABLE]

The main results cited in Section II can be extended, with minor modifications, assuming the above relaxed conditions. Proposition 1 can be restated as follows:

Theorem 3.

Under Assumptions 5, 6 and 7 there exist constants $\beta=\beta_{r}>0$ , $\alpha\in(0,1)$ and $C>0$ such that for any $\varphi\in{\cal L}_{V},$ all $\theta\in\Theta$ and all $n>0$ we have:

[TABLE]

Here we can choose $\beta=\beta_{r}$ , given by Proposition 1 applied to $P_{\theta}^{r}$ , $\alpha=\alpha_{r}^{1/r},$ with $0<\alpha_{r}<1$ provided by Proposition 1 applied to $P_{\theta}^{r},$ and $C=\alpha_{r}^{-1}(\alpha^{\prime})^{r-1},$ with $\alpha^{\prime}$ as in Lemma 6.

Proof.

Let us fix a $\theta\in\Theta$ and write $P=P_{\theta}$ . By Proposition 1 there exist $\beta=\beta_{r}>0,$ and $\alpha_{r}\in(0,1)$ such that ${\left|\kern-1.07639pt\left|\kern-1.07639pt\left|P^{\ast r}\varphi\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{\beta}\leq\alpha_{r}{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|\varphi\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{\beta},$ implying for any positive integer $m$

[TABLE]

For a general positive integer $n$ write $n=rm+k$ with $0\leq k\leq r-1$ . Then, we get

[TABLE]

To complete the proof apply the second inequality of (108) $k\leq r-1$ times to obtain

[TABLE]

Now $m=(n-k)/r>n/r-1,$ hence $\alpha_{r}^{m}<\alpha_{r}^{n/r}\alpha_{r}^{-1},$ thus the claim of the theorem follows. ∎
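The bookkeeping behind the constants of Theorem 3 can be verified numerically. Writing $n=rm+k$, the proof combines $m$ applications of the $r$-step contraction with $k$ applications of the one-step growth bound, which suggests a factor of the form $\alpha_{r}^{m}(\alpha^{\prime})^{k}$; this form is inferred from the surrounding argument. The sketch below checks, for hypothetical values of $\alpha_{r}$, $\alpha^{\prime}$ and $r$, that this factor is dominated by $C\alpha^{n}$ with $\alpha=\alpha_{r}^{1/r}$ and $C=\alpha_{r}^{-1}(\alpha^{\prime})^{r-1}$.

```python
def theorem3_constants(alpha_r, alpha_prime, r):
    """Return (alpha, C) with alpha = alpha_r^(1/r), C = alpha_prime^(r-1) / alpha_r."""
    alpha = alpha_r ** (1.0 / r)
    C = (alpha_prime ** (r - 1)) / alpha_r
    return alpha, C


def direct_factor(alpha_r, alpha_prime, r, n):
    """Factor alpha_r^m * alpha_prime^k obtained from n = r*m + k, 0 <= k < r."""
    m, k = divmod(n, r)
    return (alpha_r ** m) * (alpha_prime ** k)


# Hypothetical constants: alpha_r in (0, 1), alpha_prime > 1, block length r.
ALPHA_R, ALPHA_PRIME, R = 0.5, 1.5, 3
alpha, C = theorem3_constants(ALPHA_R, ALPHA_PRIME, R)

# Largest ratio direct_factor / (C * alpha^n); it should not exceed 1.
worst_ratio = max(
    direct_factor(ALPHA_R, ALPHA_PRIME, R, n) / (C * alpha ** n)
    for n in range(1, 200)
)
print("alpha =", alpha, "C =", C, "worst ratio =", worst_ratio)
```

The constant $C$ is deliberately crude: it pays for up to $r-1$ uncontrolled steps and for rounding $m$ down, which is all that the geometric rate $\alpha^{n}$ requires.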

Proposition 2 takes now the following modified form:

Theorem 4.

Under Assumptions 5, 6 and 7 there exist constants $\beta=\beta_{r}>0,$ $\alpha\in(0,1)$ and $C>0$ such that for any signed measure $\eta\in{\cal M}_{V}^{0}$ , all $\theta$ and $n\geq 0$ we have

[TABLE]

The constants $\beta=\beta_{r},$ $\alpha$ and $C$ are the same as in Theorem 3. Alternatively, let $\eta_{1},\eta_{2}$ be two possibly signed measures on $\mathbb{X}$ as in Definition 4. Then we have

[TABLE]

Finally, we have the following extension of Proposition 3:

Theorem 5.

Under Assumptions 5, 6 and 7 for all $\theta\in\Theta$ there exists a unique probability measure $\mu_{\theta}^{\ast}$ on $\mathbb{X}$ such that $\mu_{\theta}^{\ast}(V)<\infty$ and $P_{\theta}\mu_{\theta}^{\ast}=\mu_{\theta}^{\ast}.$ Denoting the unique invariant probability measure for $P_{\theta}^{r}$ by $\mu_{\theta,r}^{\ast}$ we have $\mu_{\theta}^{\ast}=\mu_{\theta,r}^{\ast}.$

Proof.

Let us fix any $\theta\in\Theta$ and write $P=P_{\theta}$ , $\mu^{\ast}=\mu_{\theta}^{\ast}$ and $\mu_{r}^{\ast}=\mu_{\theta,r}^{\ast}.$ Thus $\mu^{\ast}_{r}$ is the unique invariant probability measure for $P^{r}$ the existence of which is ensured by Proposition 3. Now, we show that for any $k>0$ we have $\int_{\mathbb{X}}V\,\mathrm{d}P^{k}\mu^{\ast}_{r}:=\int_{\mathbb{X}}V(x)\,P^{k}\mu^{\ast}_{r}(\mathrm{d}x)<\infty$ . Indeed, write:

[TABLE]

Here the r.h.s. can be bounded from above, using the definition of $\|\cdot\|_{\beta}$ and the first half of (108), by

[TABLE]

which is finite since $\int_{\mathbb{X}}V\,\mathrm{d}\mu^{\ast}_{r}<\infty.$ It follows that the probability measure $\mu^{*}$ defined by

[TABLE]

also satisfies $\int_{\mathbb{X}}V\,\mathrm{d}\mu^{*}<\infty,$ and it is readily seen to be invariant for $P.$ Since any probability measure invariant for $P$ is also invariant for $P^{r},$ there cannot be measures that are invariant for $P$ besides $\mu^{\ast}_{r}$ , and thus we have $\mu^{\ast}=\mu^{\ast}_{r}$ . ∎

The main results of Section III can now be extended, with minor modifications, assuming the above relaxed conditions.

Theorem 6.

Assume that the kernels $(P_{\theta})$ satisfy Assumptions 5, 6 and 7. Let $\beta=\beta_{r}>0$ be as given in Theorem 3, let us fix any $\theta\in\Theta$ and write $P=P_{\theta}.$ Let $f:{\mathbb{X}}\rightarrow\mathbb{R}$ be a measurable function such that ${\left|\kern-1.07639pt\left|\kern-1.07639pt\left|f\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{\beta}<\infty.$ Let $\mu^{\ast}$ denote the unique invariant probability measure of $P,$ and let $h=\mu^{\ast}(f).$ Then, the Poisson equation

[TABLE]

has a unique solution $u$ up to additive constants. The particular solution for which $\mu^{*}(u)=0$ can be written as

[TABLE]

where the right hand side is absolutely convergent, and

[TABLE]

for some constant $K_{r,u}>0$ depending only on the constants appearing in Assumptions 5, 6 and 7. It also follows:

[TABLE]

Proof.

Consider the Poisson equation

[TABLE]

where $h=\mu^{\ast}(f),$ recalling that $\mu^{\ast}_{r}=\mu^{\ast}.$ In view of Theorem 1 it has a unique solution, up to an additive constant. The particular solution with $\mu^{*}(v)=0$ can be written as

[TABLE]

which is well-defined, the r.h.s. is absolutely convergent, and

[TABLE]

implying the inequality

[TABLE]

where $K_{r,v}$ is given by

[TABLE]

The solution of (127) is related to that of (123) by noting that

[TABLE]

It follows that

[TABLE]

is a solution of (123). Incidentally, this is also seen from the series expansion (124). To get an upper bound for $|u(x)|$ , write

[TABLE]

Taking into account the upper bound for $|v(x)|$ given in (129), it is seen that it is sufficient to derive upper bounds for $(P^{k}\delta_{x})(V)=P^{\ast k}V(x)$ for $k=1,\ldots,r-1.$

Now recall that in view of the one-step growth condition and Remark 1, we have $P\mu(V)\leq\gamma_{1}\mu(V)+K_{1}$ , for any probability measure $\mu$ with $\mu(V)<\infty$ . By repeated application of this inequality, as in the derivation of (73), we obtain

[TABLE]

Choosing $\mu=\delta_{x}$ and summing over $k$ from [math] to $r-1$ we get

[TABLE]

The right hand side is bounded from above by

[TABLE]

Combining these inequalities with (129) and (133) we get

[TABLE]

implying the upper bound of the form given in (125).

As for uniqueness, assume that there are two solutions $u_{1},u_{2}\in\mathcal{L}_{V}$ , and let $\Delta u=u_{2}-u_{1}$ . Then, $(I-{P^{\ast}})\Delta u(x)=0$ for all $x$ , implying ${P^{\ast}}\Delta u=\Delta u$ . Iterating this $r-1$ times we get $P^{*r}\Delta u=\Delta u,$ and by Theorem 1 we conclude that $\Delta u$ is a constant function, thus completing the proof. ∎
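The passage from the $r$-step Poisson equation to the one-step equation in the proof of Theorem 6 can be illustrated on a finite chain. The sketch below assumes the relation $u=\sum_{k=0}^{r-1}P^{\ast k}v$, where $v$ is the centered solution of the equation for $P^{\ast r}$; this is the natural reading of the argument, since $(I-P^{\ast})\sum_{k=0}^{r-1}P^{\ast k}=I-P^{\ast r}$. The kernel `P`, the function `f` and the value `R = 3` are hypothetical.

```python
# Hypothetical 3-state kernel, test function and block length.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.4, 0.5]]
f = [1.0, -2.0, 4.0]
R = 3


def apply_kernel(P, g):
    """Return P* g, i.e. x -> sum_y P(x, y) g(y)."""
    return [sum(p * gy for p, gy in zip(row, g)) for row in P]


def invariant_measure(P, iterations=500):
    """Approximate mu* by iterating mu -> P mu from the uniform distribution."""
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(iterations):
        mu = [sum(mu[x] * P[x][y] for x in range(n)) for y in range(n)]
    return mu


def series_solution(P, f, h, step, terms=300):
    """Truncate sum_{n>=0} (P*^(n*step) f - h) after `terms` terms."""
    total = [0.0] * len(f)
    g = list(f)
    for _ in range(terms):
        total = [t + gx - h for t, gx in zip(total, g)]
        for _ in range(step):
            g = apply_kernel(P, g)
    return total


mu = invariant_measure(P)
h = sum(m * fx for m, fx in zip(mu, f))

v = series_solution(P, f, h, step=R)         # solves (I - P*^R) v = f - h
u_from_v = [0.0] * len(f)
g = list(v)
for _ in range(R):                           # u = sum_{k=0}^{R-1} P*^k v
    u_from_v = [a + b for a, b in zip(u_from_v, g)]
    g = apply_kernel(P, g)

u_direct = series_solution(P, f, h, step=1)  # one-step series solution
gap = max(abs(a - b) for a, b in zip(u_from_v, u_direct))
Pu = apply_kernel(P, u_from_v)
residual = max(abs((ux - pux) - (fx - h))
               for ux, pux, fx in zip(u_from_v, Pu, f))
print("gap:", gap, "residual:", residual)
```

Both quantities should be at rounding level, consistent with the remark that the same solution is obtained from the series expansion by grouping its terms in blocks of length $r$.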

The extension of Theorem 2, on the Lipschitz continuity of $u_{\theta}(\cdot)$ , seems to be straightforward. However, we should point out that we have to assume the Lipschitz continuity of the one-step kernels $(P_{\theta}),$ as given in Assumption 3.

Theorem 7.

Assume that the kernels $(P_{\theta})$ satisfy Assumptions 5, 6 and 7. Let us fix $\beta=\beta_{r}>0$ as given in Theorem 3. In addition assume that the family of one-step kernels $(P_{\theta})$ is Lipschitz continuous in the sense of Assumption 3. Let $(f_{\theta})$ be a family of measurable functions ${\mathbb{X}}\rightarrow\mathbb{R}$ such that Assumption 4 holds. Let $\mu_{\theta}^{\ast}$ denote the unique invariant probability measure of $P_{\theta}$ and let $h_{\theta}=\mu_{\theta}^{\ast}(f_{\theta}).$ Consider the Poisson equations

[TABLE]

Then, $h_{\theta}$ is Lipschitz continuous in $\theta$ :

[TABLE]

and the particular solution

[TABLE]

is well-defined for all $\theta$ , and is Lipschitz continuous in $\theta$ :

[TABLE]

Alternatively, we can write

[TABLE]

Here the constants $L_{r,h}$ and $L_{r,u}$ depend only on the constants appearing in Assumptions 3, 4, 5, 6, and 7.

For the proof we need a simple extension of Lemma 4:

Lemma 8.

Assume that $(P_{\theta}),\theta\in\Theta,$ satisfies the uniform one-step growth condition, Assumption 6, and the Lipschitz continuity condition Assumption 3 with some $\beta>0.$ Then for any pair $\theta,\theta^{\prime}\in\Theta,$ for any signed measure $\eta$ satisfying $|\eta|(1+\beta V)<\infty$ , and for any $\alpha^{\prime\prime}>\alpha^{\prime}:=\max(1+\beta K_{1},\gamma_{1})$

[TABLE]

for all $n>0$ , where $L_{P}^{\prime\prime}$ depends only on the constants appearing in the conditions of the lemma and on $\alpha^{\prime\prime}$ .

Proof.

The proof is obtained by a simple modification of the proof of Lemma 4. We can estimate $\sigma_{\beta}(P^{n}_{\theta}\eta,P^{n}_{\theta^{\prime}}\eta)$ using a sequence of triangle inequalities to get

[TABLE]

Consider the $k$ th term and apply Lemma 7 repeatedly $n-k-1$ times setting $\eta_{1}=P_{\theta}P^{k}_{\theta^{\prime}}\eta$ and $\eta_{2}=P^{k+1}_{\theta^{\prime}}\eta$ :

[TABLE]

Note that the conditions of Lemma 7 are satisfied for $\eta_{1},\eta_{2}$ : obviously $\eta_{1}(\mathbb{X})=\eta_{2}(\mathbb{X})=\eta(\mathbb{X})$ and $|\eta_{i}|(1+\beta V)<\infty$ , for $i=1,2$ due to the repeated application of the one-step growth condition.

Combining the last two inequalities we get:

[TABLE]

Consider the $k$ -th term, and apply the Lipschitz continuity of $(P_{\theta}),$ Assumption 3, implying Lemma 3. Applying the latter for the signed measure $P^{k}_{\theta^{\prime}}\eta$ we get the upper bound

[TABLE]

To estimate $|P_{\theta^{\prime}}^{k}\eta|(V)$ , we invoke (73) with $\gamma_{1}$ replacing $\gamma$ :

[TABLE]

By plugging this into (146), we get the upper bound

[TABLE]

The first term in the above sum is bounded from above by $(\alpha^{\prime})^{n}/(\alpha^{\prime}-1).$ The second term can be written as

[TABLE]

Recall that each of the $n$ terms of the convolution of the sequences $((\alpha^{\prime})^{k})$ and $(\gamma_{1}^{k})$ can be estimated from above by $C(\alpha^{\prime\prime})^{n-1}$ for any $\alpha^{\prime\prime}>\max(\alpha^{\prime},\gamma_{1})=\alpha^{\prime},$ where $C$ depends only on $\alpha^{\prime}$ , $\gamma_{1}$ and $\alpha^{\prime\prime}.$ Summarizing the inequalities (145) to (IV), and the arguments that follow, we get the claim. ∎

Proof of Theorem 7.

First, note that $\mu_{\theta}^{\ast}=\mu_{\theta,r}^{\ast}$ implies that

[TABLE]

Applying Theorem 2, for the Poisson equation

[TABLE]

we conclude that $h_{\theta}$ is Lipschitz continuous in $\theta$ :

[TABLE]

where $L_{r,h}$ is given, according to (81), by

[TABLE]

where $L_{P^{r}}$ is the Lipschitz-constant for the kernels $(P^{r}_{\theta}),$ as defined in Assumption 3, and $\beta$ and $\alpha_{r}$ are chosen as in Theorem 3.

In order to prove the second part of Theorem 7, note that, in view of Theorem 2, the particular solution given by $v_{\theta}(x)=\sum_{n=0}^{\infty}P_{\theta}^{\ast nr}(f_{\theta}(x)-h_{\theta})$ is Lipschitz continuous w.r.t. $\theta$ , and

[TABLE]

where $L_{u}:=L_{r,v}$ is defined in (102), which now becomes

[TABLE]

It follows that the specific solution of Poisson equation (138):

[TABLE]

is also Lipschitz continuous in $\theta.$ Indeed, for $1\leq m\leq r-1$

[TABLE]

and the first term on the r.h.s. is bounded from above as

[TABLE]

for all $x$ . Applying (IV) with $\eta=\delta_{x}$ we get the upper bound

[TABLE]

For the second term on the right hand side of (155) first we note that ${\left|\kern-1.07639pt\left|\kern-1.07639pt\left|v_{\theta^{\prime}}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{\beta}<\infty$ by (130) in the proof of Theorem 6:

[TABLE]

for some constant $K_{r,v}>0$ depending only on the constants appearing in Assumptions 5, 6 and 7. Now we can write

[TABLE]

Applying Lemma 8 we get that the right hand side of (158) is bounded by

[TABLE]

where $K_{r,v}$ is defined by (131). Recall that $\sup_{\theta^{\prime}}{\left|\kern-1.07639pt\left|\kern-1.07639pt\left|f_{\theta^{\prime}}\right|\kern-1.07639pt\right|\kern-1.07639pt\right|}_{\beta}=K_{f}<\infty$ by Assumption 4. Taking into account the representation of $u_{\theta}(x)$ given in (154), and the decomposition given in (155), and adding the upper bounds (156) and (159) for $m=0,\dots,r-1$ , we get the second claim. ∎

Remark 7.

A nice corollary of Lemma 8 is the Lipschitz continuity of the probability transition kernel of the sampled process, with the first sample after time [math] taken at time $k$ with probability $(1-\varepsilon)\varepsilon^{k},$ given by the resolvent

[TABLE]

It is easily seen that, using the notations and assumptions of Lemma 8, we have for any $0<\varepsilon<1/\alpha^{\prime\prime}$

[TABLE]

We conclude this section by giving a simple criterion for the verification of the uniform drift condition for the $r$ -frame process, as given in Assumption 5. Motivated by the example of linear stochastic systems, given in Section IV, requiring only individual stability of the matrices $A_{\theta}$ , we propose the following condition in terms of the one-step kernels $(P_{\theta})$ :

Assumption 8 (Individual Drift Conditions).

There exists a family of measurable functions $V_{\theta}:\mathbb{X}\rightarrow[0,\infty)$ and universal constants $\gamma\in(0,1)$ and $K\geq 0$ such that

[TABLE]

for all $x$ and $\theta$ . Moreover, there exists a measurable $V:\mathbb{X}\rightarrow[0,\infty)$ and constants $a,b,c,d$ with $a,c>0$ such that

[TABLE]

for all $x$ and $\theta$ .

Lemma 9.

Under Assumption 8, for any sufficiently large $r$ , the uniform drift condition for $P_{\theta}^{\ast r}$ and the one-step growth condition, Assumptions 5 and 6, are satisfied with $V.$

Proof.

Since $V(x)\leq(V_{\theta}(x)-b)/a$ , for any $r\geq 1$ :

[TABLE]

Choosing $r$ so that $\gamma^{r}\frac{c}{a}<1,$ the uniform drift condition holds for $(P_{\theta}^{\ast r})$ w.r.t. $V$ . In order to prove the uniform one-step growth condition, take $r=1$ in (IV) and set $\gamma_{1}=\gamma c/a.$ ∎
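The choice of $r$ in the proof of Lemma 9 is a one-line computation once $\gamma$, $a$ and $c$ are known. The sketch below returns the smallest $r$ with $\gamma^{r}c/a<1$; the numerical values are hypothetical, and only the contraction factor $\gamma^{r}c/a$ quoted in the proof is used.

```python
def smallest_block_length(gamma, a, c, max_r=10_000):
    """Smallest positive integer r such that gamma**r * c / a < 1."""
    if not (0.0 < gamma < 1.0 and a > 0.0 and c > 0.0):
        raise ValueError("need 0 < gamma < 1 and a, c > 0")
    for r in range(1, max_r + 1):
        if (gamma ** r) * c / a < 1.0:
            return r
    raise ValueError("no admissible r found up to max_r")


# Hypothetical constants: a slow individual drift and a poor comparison ratio c/a.
r_slow = smallest_block_length(gamma=0.9, a=1.0, c=4.0)
# When c <= a the one-step drift already holds with respect to V.
r_fast = smallest_block_length(gamma=0.5, a=2.0, c=1.0)
print(r_slow, r_fast)  # 14 1
```

The ratio $c/a$ measures how far the individual Lyapunov functions $V_{\theta}$ are from the common $V$; the closer it is to one, the shorter the block length needed.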

V Controlled Queues

In this section we return to our prime example given in Section II. We will show that under reasonable additional conditions on a controlled queue Assumptions 1, 3, 7 are satisfied with $V(x):=e^{\chi x}$ , $x\in\mathbb{R}_{+},$ with $\chi>0$ small enough, when $\theta$ is restricted to a compact set $D\subset\Theta$ . Thus the results of the paper, in particular Theorems 5, 6, 7, imply the existence, uniqueness and Lipschitz continuity of the solution $u_{\theta}(x)$ of the parameter-dependent Poisson equation

[TABLE]

where the normalizing constant $h_{\theta}$ is the expectation of $f_{\theta}(x)$ under the (unique) invariant measure, when $\theta$ is restricted to an open set $\Theta^{\prime}\subset D$ in place of $\Theta$ .

The conditions below will be given in terms of the r.v. $U_{\theta,1}$ thus ensuring the generality of our results. Specific conditions in terms of $S_{\theta,0}$ and $T_{\theta,1}$ will be given at the end of the section. To guarantee stability of the system (8) we stipulate:

Assumption 9.

We have $\mathbb{E}[\,U_{\theta,1}\,]<0$ , for all $\theta\in D.$

This is a standard condition implying stability of the queue for any fixed $\theta$ in a variety of interpretations, see e.g., [13, 3] and [5]. A further standard condition in queuing theory, and also in the area of risk processes [17], is the existence of a finite positive exponential moment of $S_{\theta,n}-T_{\theta,n+1},$ or equivalently that of $U_{\theta,n}.$ A uniform version of this condition in terms of $U_{\theta,1}$ is given below:

Assumption 10.

We have $\sup_{\,\theta\in D}\mathbb{E}[\,\exp(\eta\,U_{\theta,1})\,]<\infty$ , for some $\eta>0$ .

Observe that Assumption 10 is automatically satisfied if $\sup_{\,\theta\in D}\mathbb{E}[\,\exp(\eta\,S_{\theta,0})\,]<\infty.$ Finally, we will need the following continuity condition for $U_{\theta,1}$ :

Assumption 11.

The probability distribution function of $U_{\theta,1}$ is weakly continuous in $\theta$ for $\theta\in D,$ i.e. $\mathbb{E}[\,f(U_{\theta,1})\,]$ is continuous in $\theta$ for all bounded continuous functions $f$ .

We note in passing that these three assumptions imply that the stability condition, Assumption 9, is satisfied uniformly in $\theta$ for $\theta\in D:$

$$\sup_{\theta\in D}\mathbb{E}[\,U_{\theta,1}\,]<0.$$

Uniform Drift Condition. The validity of the uniform drift condition, given as Assumption 1, with no parameter-dependence, has been established using the Lyapunov function $V(x):=e^{\chi x}$ , $x\in\mathbb{R}_{+},$ with $\chi$ small enough, see e.g., Section 16.4 of [14]. (We should note that the use of exponential moments is also a standard tool in the theory of risk processes, see [17]). For the sake of completeness and further reference we restate this result, and provide its proof.

Lemma 10.

Let us assume that $U$ is an $\mathbb{R}$-valued random variable such that $\mathbb{E}\,U<0,$ and for some $\eta>0$ we have ${\mathbb{E}}\,[\,e^{\eta\,U}\,]<\infty.$ Then there exists $0<\chi_{0}<\eta$ such that for all $0<\chi\leq\chi_{0}$ there exists $0<\gamma<1$ such that

[TABLE]

Proof.

Since the function $g(\chi):={\mathbb{E}}\,[\,e^{\chi U}\,]$ is convex in $\chi,$ the finite difference quotients are monotone non-increasing for $\chi\downarrow 0$ with negative limit:

$$\lim_{\chi\downarrow 0}\frac{g(\chi)-g(0)}{\chi}=\mathbb{E}\,U<0.$$

Hence there exists a $0<\chi_{0}<\eta$ and some $\varepsilon>0$ such that for all $0<\chi\leq\chi_{0}$

$$\frac{{\mathbb{E}}\,[\,e^{\chi U}\,]-1}{\chi}\leq-\varepsilon. \tag{169}$$

It follows that ${\mathbb{E}}\,[\,e^{\chi U}\,]\leq 1-{\chi}\varepsilon,$ and thus we get

[TABLE]

∎
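As a concrete illustration of Lemma 10, the sketch below evaluates $g(\chi)=\mathbb{E}[e^{\chi U}]$ in closed form for a hypothetical example that is not taken from the paper: $U=S-T$ with independent $S\sim\mathrm{Exp}(\mu)$ and $T\sim\mathrm{Exp}(\lambda)$, $\lambda<\mu$. Under this assumption $g(\chi)=\mu\lambda/((\mu-\chi)(\lambda+\chi))$ for $0\leq\chi<\mu$, so that $g(\chi)<1$ exactly when $0<\chi<\mu-\lambda$. The function names and the rates are illustrative choices only.

```python
def mgf_u(chi: float, mu: float, lam: float) -> float:
    """E[exp(chi * U)] for U = S - T, with S ~ Exp(mu), T ~ Exp(lam) independent.

    Hypothetical example (not from the paper); valid for 0 <= chi < mu.
    """
    if not 0.0 <= chi < mu:
        raise ValueError("need 0 <= chi < mu")
    return (mu / (mu - chi)) * (lam / (lam + chi))


def diff_quotient(chi: float, mu: float, lam: float) -> float:
    """Finite difference quotient (g(chi) - g(0)) / chi, with g(0) = 1."""
    if chi <= 0.0:
        raise ValueError("need chi > 0")
    return (mgf_u(chi, mu, lam) - 1.0) / chi


# Illustrative rates: service rate MU, arrival rate LAM, with LAM < MU.
MU, LAM = 2.0, 1.0
MEAN_U = 1.0 / MU - 1.0 / LAM  # E[U] = E[S] - E[T], negative here
```

In this example the quotients decrease towards $\mathbb{E}\,U$ as $\chi\downarrow 0$, and any $\chi$ with $g(\chi)<1$ can serve as the exponent of $V$: since $e^{\chi(x+u)^{+}}\leq e^{\chi(x+u)}+1$, one gets $\mathbb{E}[V((x+U)^{+})]\leq g(\chi)V(x)+1$.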

A minor, but essential technical extension of the above arguments, in order to derive a uniform version of the drift condition, is stated in the lemma below:

Lemma 11.

Let $U(\theta):=U_{\theta,1},\,\theta\in D,$ be a family of $\mathbb{R}$-valued random variables satisfying Assumptions 9, 10 and 11. Then there exists $0<\chi_{0}<\eta$ such that for all $0<\chi\leq\chi_{0}$ there exists $0<\gamma<1$ such that for all $\theta\in D$

[TABLE]

Remark 8.

Taking into account the proof of Lemma 10 it is clear that all we need to show to prove Lemma 11 is that a uniform version of (169) is valid, i.e. that there exists a $0<\chi_{0}<\eta$ and some $\varepsilon>0$ such that for all $0<\chi\leq\chi_{0}$ and for all $\theta\in D$

$$\frac{{\mathbb{E}}\,[\,e^{\chi U(\theta)}\,]-1}{\chi}\leq-\varepsilon. \tag{172}$$

Let us define the family of functions $g(\chi,\,\theta):={\mathbb{E}}\,[\,e^{\chi U(\theta)}\,].$ By Assumption 10 it is readily seen that the random variables $e^{\chi U(\theta)}$ are uniformly integrable for $\chi<\eta$ , and, therefore, by Assumption 11 it follows that $g(\chi,\,\theta)$ is continuous in $\theta.$ The desired claim (172) now follows from the following lemma formulated in the context of convex analysis:

Lemma 12.

Let $g(\chi,\,\theta)$ be a family of convex functions in the variable $\chi$ with $0\leq\chi<\eta$ and $\theta\in D\subset\mathbb{R}^{k}$ with $D$ being a compact set, such that

• $g(0,\,\theta)=1$ for all $\theta\in D,$

• for all fixed $\theta\in D$ we have

[TABLE]

• for all fixed $0\leq\chi<\eta$ the function $g(\chi,\,.\,)$ is continuous in $\theta.$

Then there exists $\chi_{0}>0$ such that for $0<\chi\leq\chi_{0}$ we have

[TABLE]

It follows that $\sup_{\theta\in D}\,g(\chi,\,\theta)<1$ for $0<\chi\leq\chi_{0}.$

Remark 9.

It also readily follows that

[TABLE]

Proof of Lemma 12.

Let us define the function

[TABLE]

for $0\leq\chi<\eta.$ Obviously, $g(\cdot)$ is convex and $g(0)=1.$ The claim of the lemma can then be restated as saying that there exists $\chi_{0}>0$ such that for $0<\chi\leq\chi_{0}$ we have

[TABLE]

Assume that the claim is not true, and let $\chi_{n}\downarrow 0$ be a monotone sequence such that we have

[TABLE]

Let $\theta_{n}\in D$ be such that

[TABLE]

Due to the compactness of $D$ we can assume, passing to a subsequence if necessary, that $\theta_{n}$ has a limit in $D,$ say $\lim\theta_{n}=\theta^{*}\in D.$ Consider now the function $g(\cdot\,,\theta^{*})$ and choose a $\chi_{0}$ such that

[TABLE]

The continuity of $g(\chi_{0},.\,)$ in $\theta$ implies $g(\chi_{0},\theta_{n})\leq 1-c/2<1$ for sufficiently large $n$ . On the other hand the convexity of the function $g(.\,,\theta_{n}),$ and $g(0\,,\theta_{n})=1$ and $g(\chi_{n}\,,\theta_{n})\geq 1$ imply that for $\chi_{0}>\chi_{n}$ we have $g(\chi_{0},\theta_{n})\geq 1,$ a contradiction, proving the claim. ∎
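The uniform statement of Lemma 12 can be explored numerically. The sketch below assumes a hypothetical family, not taken from the paper, in which $\theta$ is the service rate: $U(\theta)=S_{\theta}-T$ with independent $S_{\theta}\sim\mathrm{Exp}(\theta)$ and $T\sim\mathrm{Exp}(\lambda)$, and $D=[\theta_{\min},\theta_{\max}]$ with $\lambda<\theta_{\min}$. The function `sup_mgf` takes a maximum over a finite grid, which in general only approximates the supremum over $D$; in this particular family $g(\chi,\theta)$ is decreasing in $\theta$, so the maximum is attained at the grid point $\theta_{\min}$.

```python
def mgf_u(chi: float, theta: float, lam: float) -> float:
    """g(chi, theta) = E[exp(chi * U(theta))] for U(theta) = S - T,
    S ~ Exp(theta), T ~ Exp(lam) independent (hypothetical family)."""
    if not 0.0 <= chi < theta:
        raise ValueError("need 0 <= chi < theta")
    return (theta / (theta - chi)) * (lam / (lam + chi))


def sup_mgf(chi: float, theta_min: float, theta_max: float,
            lam: float, n_grid: int = 201) -> float:
    """Grid approximation of sup over theta in [theta_min, theta_max] of g(chi, theta)."""
    step = (theta_max - theta_min) / (n_grid - 1)
    thetas = [theta_min + i * step for i in range(n_grid)]
    return max(mgf_u(chi, th, lam) for th in thetas)
```

With $\lambda=1$ and $D=[1.5,3]$, the grid supremum stays below $1$ for $\chi<\theta_{\min}-\lambda=0.5$ and exceeds $1$ beyond that, which shows why $\chi_{0}$ has to be chosen according to the least favourable parameter in $D$.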

Local Minorization for $P_{\theta}^{r}(x,\,.)$ . As for the local minorization condition it will be verified in a form slightly stronger than our Assumption 7, by showing that for any fixed $R>0$ there exists some integer $r\geq 1,$ which may depend on $R,$ such that

[TABLE]

Indeed, the above inequality implies Assumption 7. To see this, take an arbitrary $R_{r}$ , as in Assumption 7, satisfying $R_{r}>2K_{r}/(1-\gamma_{r}).$ Then the set

[TABLE]

is of the form $\{0\leq x\leq R\}$ with some $R$. Letting $\bar{\mu}_{r}$ denote the probability measure assigning unit mass to $\{0\},$ inequality (179) implies $P_{\theta}^{r}(x,A)\geq\bar{\alpha}_{r}\bar{\mu}_{r}(A)$ with some $\bar{\alpha}_{r}\in(0,1)$ for all $\theta\in\Theta$ and $x\in{\cal C}_{r},$ as postulated by (113).

To prove (179) we will use arguments familiar in queuing theory, but extra care is needed to establish uniform bounds. Consider first a fixed $\theta\in D,$ and let $R>0$ be any fixed real number, defining the bounded interval $[0,R].$ Let $0\leq x\leq R$ and consider the $r$-step transition probability $P_{\theta}^{r}(x,\{0\})=P(W_{\theta,r}(x)=0)$ for some integer $r\geq 1$.

Lemma 13.

There is $\epsilon>0$ such that

[TABLE]

Proof.

By Lemma 12, it follows that for sufficiently small $\chi$ we have $A:=\sup_{\theta\in D}\mathbb{E}[e^{\chi U_{\theta,1}}]<1$ hence

[TABLE]

which is strictly smaller than $1$ for $\epsilon$ small enough. ∎

Corollary 3.

For each $R>0$ there is $r\in\mathbb{N}$ such that

[TABLE]

Proof.

Indeed, let $\epsilon,\upsilon$ be as in Lemma 13 and choose $r$ so large that $r\epsilon>R$ . Then, for all $\theta\in D$ and $0\leq x\leq R$ , $P(W_{\theta,r}(x)=0)$ is bounded from below by

[TABLE]

∎
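The mechanism behind Corollary 3 can be sketched in code. The example assumes that the queue dynamics take the standard Lindley form $W_{n+1}=(W_{n}+U_{n+1})^{+}$ and uses a hypothetical exponential model, $U=S-T$ with independent $S\sim\mathrm{Exp}(\mu)$, $T\sim\mathrm{Exp}(\lambda)$, for which $\upsilon:=P(U\leq-\epsilon)=e^{-\lambda\epsilon}\mu/(\mu+\lambda)$. If $r\epsilon>R$ and all of the first $r$ increments are at most $-\epsilon$, the queue started from any $x\in[0,R]$ is empty at time $r$, which gives the lower bound $\upsilon^{r}$. All names and parameter values are illustrative assumptions.

```python
import math
import random


def lindley_step(w: float, u: float) -> float:
    """One step of the assumed Lindley recursion W_{n+1} = (W_n + U_{n+1})^+."""
    return max(w + u, 0.0)


def atom_lower_bound(R: float, eps: float, mu: float, lam: float):
    """Return (r, upsilon**r) with r * eps > R and upsilon = P(U <= -eps)."""
    upsilon = math.exp(-lam * eps) * mu / (mu + lam)
    r = math.floor(R / eps) + 1  # smallest integer r with r * eps > R
    return r, upsilon ** r


def estimate_atom_prob(x: float, r: int, mu: float, lam: float,
                       n_paths: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(W_r(x) = 0) in the hypothetical model."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_paths):
        w = x
        for _ in range(r):
            u = rng.expovariate(mu) - rng.expovariate(lam)  # U = S - T
            w = lindley_step(w, u)
        hits += (w == 0.0)
    return hits / n_paths
```

The bound $\upsilon^{r}$ is typically very conservative compared with the simulated probability; what matters for the minorization argument is only that it is strictly positive and does not depend on $x\in[0,R]$ or, in the uniform setting, on $\theta\in D$.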

A nice additional result is the following: let $W_{\theta,s}^{*}$ denote the stationary solution of the queue dynamics given by

[TABLE]

Then

[TABLE]

where $C_{r}$ tends to $1$ exponentially fast when $r$ tends to $\infty$ . Moreover,

[TABLE]

Lipschitz Continuity of $P_{\theta}$ . In order to verify Assumption 3 we will need to strengthen our assumptions on $S_{\theta,0}$ and $T_{\theta,1}$. We consider two scenarios, briefly discussed below, both of them implying a common condition on $U_{\theta,1}$ as follows:

Assumption 12.

The probability distribution of $U_{\theta,1}$ has a density function for all $\theta\in D,$ denoted by $\zeta_{\theta}(\cdot).$ There exist $\eta^{\prime\prime}>0$ and $C^{\prime\prime}>0$ such that for all $\theta,\,\theta^{\prime}\in\Theta$ and all $x\in\mathbb{R}$ it holds that

[TABLE]

Assumption 12 can be conveniently verified by imposing the following assumption on $S_{\theta,0}$ and $T_{\theta,1}:=T_{1}$, when $T_{\theta,1}$ is assumed to be independent of $\theta:$ the probability distribution of $S_{\theta,0}$ has a density function for all $\theta\in D,$ denoted by $\xi_{\theta}(\cdot),$ and there exist $C^{\prime},\eta^{\prime}>0$ such that for all $\theta,\theta^{\prime}\in\Theta$ and all $x\in\mathbb{R}_{+}$ it holds that $|\xi_{\theta}(x)-\xi_{\theta^{\prime}}(x)|\leq C^{\prime}e^{-\eta^{\prime}x}|{\theta}-{\theta^{\prime}}|.$ Moreover, the probability distribution of $T_{1}$ has a density function, denoted by $\kappa(\cdot),$ such that for all $x\in\mathbb{R}_{+}$ it holds that $\kappa(x)\leq C^{\prime}e^{-\eta^{\prime}x}.$

The first part of the above auxiliary assumption can be conveniently checked by requiring the existence of a density function $\xi_{\theta}(\cdot)$ such that the mapping $(\theta,x)\to\xi_{\theta}(x)\in\mathbb{R}_{+}$ is measurable, and for each fixed $x$ continuously differentiable in $\theta\in\Theta$ , moreover there exist $C^{\prime},\eta^{\prime}>0$ such that for all $\theta\in\Theta$ and all $x\in\mathbb{R}_{+}$ it holds that $\|{\frac{\partial}{\partial{\theta}}}\xi_{\theta}(x)\|\leq C^{\prime}e^{-\eta^{\prime}x}.$ The proof is readily obtained by the mean-value theorem. Incidentally, it also follows that the law of $S_{\theta,0}$ is continuous in total variation, and, a fortiori, also weakly, implying Assumption 11.
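To make the mean-value argument concrete, the following sketch checks both the derivative bound and the resulting Lipschitz bound on a grid for a hypothetical family of service-time densities, $\xi_{\theta}(x)=\theta e^{-\theta x}$ with $\theta\in[1,3]$. The family, the constants $C^{\prime}=2$ and $\eta^{\prime}=0.5$, and all function names are assumptions chosen for illustration; the parameter set is an interval, so the segment between any two parameters stays inside it, as the mean-value theorem requires.

```python
import math

THETA_MIN, THETA_MAX = 1.0, 3.0   # hypothetical parameter interval
C_PRIME, ETA_PRIME = 2.0, 0.5     # hypothetical envelope constants


def density(theta: float, x: float) -> float:
    """Exponential density xi_theta(x) = theta * exp(-theta * x), x >= 0."""
    return theta * math.exp(-theta * x)


def d_density_dtheta(theta: float, x: float) -> float:
    """Partial derivative of xi_theta(x) with respect to theta."""
    return (1.0 - theta * x) * math.exp(-theta * x)


def envelope(x: float) -> float:
    """Exponentially decaying envelope C' * exp(-eta' * x)."""
    return C_PRIME * math.exp(-ETA_PRIME * x)


def check_bounds(n_theta: int = 21, n_x: int = 201, x_max: float = 40.0):
    """Grid check of the derivative bound and of the implied Lipschitz bound."""
    thetas = [THETA_MIN + i * (THETA_MAX - THETA_MIN) / (n_theta - 1)
              for i in range(n_theta)]
    xs = [i * x_max / (n_x - 1) for i in range(n_x)]
    deriv_ok = all(abs(d_density_dtheta(t, x)) <= envelope(x)
                   for t in thetas for x in xs)
    lip_ok = all(abs(density(t, x) - density(s, x))
                 <= envelope(x) * abs(t - s) + 1e-15
                 for t in thetas for s in thetas for x in xs)
    return deriv_ok, lip_ok
```

A grid check of this kind is only a sanity test, not a proof; for this family the bound can also be confirmed by hand, since $|1-\theta x|e^{-\theta x}\leq(1+\theta x)e^{-(\theta-\eta^{\prime})x}e^{-\eta^{\prime}x}$ and the first two factors are bounded on $[1,3]\times\mathbb{R}_{+}$.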

A condition reciprocal to the above is obtained by interchanging the role of $S_{0}$ and $T_{1},$ i.e. assuming that $S_{0}$ does not depend on $\theta,$ while $T_{1}=T_{\theta,1}$ does. We note that in this case the requirement that the density function of $S_{0},$ denoted by $\xi(\cdot),$ satisfies $\xi(x)\leq C^{\prime}e^{-\eta^{\prime}x}$ for all $x\in\mathbb{R}_{+}$ implies Assumption 10 with any $\eta<\eta^{\prime}$ .
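The last implication can be checked in one line. The derivation below assumes only that the inter-arrival time satisfies $T_{\theta,1}\geq 0$, so that $U_{\theta,1}=S_{0}-T_{\theta,1}\leq S_{0}$, and that $0<\eta<\eta^{\prime}$:

```latex
\sup_{\theta\in D}\mathbb{E}\bigl[e^{\eta U_{\theta,1}}\bigr]
  \;\leq\; \mathbb{E}\bigl[e^{\eta S_{0}}\bigr]
  \;=\; \int_{0}^{\infty} e^{\eta x}\,\xi(x)\,\mathrm{d}x
  \;\leq\; C^{\prime}\int_{0}^{\infty} e^{-(\eta^{\prime}-\eta)x}\,\mathrm{d}x
  \;=\; \frac{C^{\prime}}{\eta^{\prime}-\eta}
  \;<\;\infty .
```

The bound does not depend on $\theta$, which is exactly the uniformity required in Assumption 10.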

In order to verify Assumption 3, let us consider the transition probabilities $P_{\theta}(x,A)$ . For any Borel set $A\subset\mathbb{R}_{+},\,0\notin A$ we can write

[TABLE]

State $0,$ being an atom for $P_{\theta}(x,\,.),$ is reached with probability

[TABLE]
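The split of $P_{\theta}(x,\,.)$ into a regular part and an atom at $0$ can be illustrated numerically. The sketch assumes that the one-step dynamics are of Lindley form, $W_{1}=(x+U)^{+}$, so that the regular part has density $y\mapsto\zeta(y-x)$ on $y>0$ and the atom has mass $P(U\leq-x)$; it further assumes a hypothetical model $U=S-T$ with independent $S\sim\mathrm{Exp}(\mu)$, $T\sim\mathrm{Exp}(\lambda)$, whose density is $\zeta(u)=\frac{\mu\lambda}{\mu+\lambda}e^{-\mu u}$ for $u\geq 0$ and $\frac{\mu\lambda}{\mu+\lambda}e^{\lambda u}$ for $u<0$. Names and parameter values are illustrative.

```python
import math


def zeta(u: float, mu: float, lam: float) -> float:
    """Density of U = S - T, S ~ Exp(mu), T ~ Exp(lam) independent (hypothetical)."""
    c = mu * lam / (mu + lam)
    return c * math.exp(-mu * u) if u >= 0.0 else c * math.exp(lam * u)


def atom_mass(x: float, mu: float, lam: float) -> float:
    """P(x + U <= 0) = P(U <= -x): mass placed on the state 0 from x >= 0."""
    return mu / (mu + lam) * math.exp(-lam * x)


def regular_mass(x: float, mu: float, lam: float,
                 y_max: float = 60.0, n: int = 60000) -> float:
    """Trapezoidal approximation of the integral of zeta(y - x) over 0 < y < y_max."""
    h = y_max / n
    total = 0.5 * (zeta(-x, mu, lam) + zeta(y_max - x, mu, lam))
    for i in range(1, n):
        total += zeta(i * h - x, mu, lam)
    return total * h
```

For any starting point $x\geq 0$ the two masses should add up to one, up to quadrature and truncation error, which is a quick consistency check on the decomposition.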

Lemma 14.

Let Assumption 12 hold, and let $V(x):=e^{\chi x}$ , $x\in\mathbb{R}_{+},$ with $\chi<\eta^{\prime\prime}.$ Then for all $\phi\in{\cal L}_{V}$ with $||\phi||_{\beta}\leq 1$ and all $\theta,\theta^{\prime}\in\Theta$ we have with some $L>0$

[TABLE]

Proof.

We can write

[TABLE]

First, for the regular part (first term on the r.h.s.) we have

[TABLE]

with some constant $C^{\prime\prime\prime}.$ On the other hand, for the atomic component (second term on the r.h.s.) we get

[TABLE]

Inequality (191) and the above bound for the atomic component imply the claim of the lemma. ∎

VI Discussion

Recursive identification of stochastic systems is a central problem of mathematical system theory, instrumental for a variety of fields such as adaptive control, adaptive signal processing and adaptive input design. A milestone in the development of a mathematically rigorous theory was the book [2], considering the abstract problem of solving an algebraic equation defined via a parameter-dependent Markov process, with an explicit link to stochastic approximation (SA). This paper is a significant addition to the theory developed in [2], inasmuch as an alternative methodology is presented for the analysis of Poisson equations associated with the Markov process under consideration. This apparently off-beat technical problem is in fact the fundamental tool in the ODE analysis of [2], taking up more than half of the effort in proving its basic convergence results.

Our approach utilizes a novel and elegant stability theory for Markov chains, developed by Hairer and Mattingly [7], which allowed us to establish simple, transparent conditions on the probability transition kernel that ensure the existence, uniqueness and Lipschitz continuity of the solution of the Poisson equation for a parametrized family of Markov chains. Possible relaxations of our core assumptions were also discussed. The power of the suggested framework was demonstrated via its applicability to a controlled queuing system. To complete the loop, the technology presented here will be applied to the ODE analysis of recursive estimators along the lines of [2] in a forthcoming paper.

We now briefly discuss a limitation of our results, inherent in the approach of [7], together with a potential remedy, further context with respect to previous works, and applications.

Recalling that the conditions given in Section IV, formulated in terms of $P^{r}_{\theta}(x,A),$ were motivated by properties of linear stochastic systems, defined under (104), it is somewhat surprising that the results of the paper are not directly applicable to this class of models. The reason is that Assumption 3, requiring the Lipschitz continuity of the one-step transition probability kernel $P_{\theta}$ w.r.t. an extended total variation distance, will not be satisfied in general, since the probability measures of $A_{\theta}X_{\theta,n}+B_{\theta}U_{n}$ will be concentrated on different proper subspaces, and hence will be singular. An appropriate tool for circumventing this difficulty could be to resort to the Wasserstein distance, and to revisit the current paper relying on the methodology developed in [8].

An alternative set of conditions under which the problems of the paper may be worth studying is provided by the theory developed already in the first edition of [14], and extended in later works such as [9]. In this latter, highly technical paper, large deviation principles are derived for geometrically ergodic Markov chains. However, it is not clear how these techniques and results could be extended to a parameter-dependent context. In particular, it is not clear what the analogue of the pair of function spaces given in Assumption 4 and ${\cal L}_{V}$ would be, so that the Lipschitz continuity of $(I-P_{\theta})^{-1},$ acting on the subspace defined by $\int f_{\theta}(x)\mu^{\star}_{\theta}(\mathrm{d}x)=0,$ is implied.

Regarding applications, the analysis of large telecommunication systems requires the study of queueing networks with several servers and customers receiving consecutive services, allowing various control actions such as call admission control [10]. The extension of the results of Section V to queueing networks is an attractive but challenging problem.

Finally, the recent intense interest in online machine learning, particularly in reinforcement learning, where SA technologies are often adapted and for which the Markovian setup is the standard choice, provides additional motivation for the paper.

Bibliography


[1] K. B. Athreya and P. Ney. A new approach to the limit theory of recurrent Markov chains. Transactions of the American Mathematical Society, 245:493–501, 1978.
[2] A. Benveniste, M. Métivier, and P. Priouret. Adaptive Algorithms and Stochastic Approximations. Springer, 2nd edition, 1990.
[3] A. A. Borovkov. Ergodicity and Stability of Stochastic Processes. Wiley & Sons, New York, 1998.
[4] N. Brosse, A. Durmus, S. Meyn, É. Moulines, and A. Radhakrishnan. Diffusion approximations and control variates for MCMC. arXiv:1808.01665, 2018.
[5] P. Diaconis and D. Freedman. Iterated random functions. SIAM Review, 41(1):45–76, 1999.
[6] L. Gerencsér. Rate of convergence of recursive estimators. SIAM Journal on Control and Optimization, 30(5):1200–1227, 1992.
[7] M. Hairer and J. C. Mattingly. Yet another look at Harris’ ergodic theorem for Markov chains. In Seminar on Stochastic Analysis, Random Fields and Applications VI, pages 109–117. Springer, 2011.
[8] M. Hairer, J. C. Mattingly, and M. Scheutzow. Asymptotic coupling and a general form of Harris’ theorem with applications to stochastic delay equations. Probability Theory and Related Fields, 149:223–259, 2011.
