Variable Demand and Multi-commodity Flow in Markovian Network   Equilibrium

Yue Yu; Dan Calderone; Sarah H. Q. Li; Lillian J. Ratliff and; Beh\c{c}et A\c{c}{\i}kme\c{s}e

arXiv:1901.08731·math.OC·October 19, 2021·Autom.

Variable Demand and Multi-commodity Flow in Markovian Network Equilibrium

Yue Yu, Dan Calderone, Sarah H. Q. Li, Lillian J. Ratliff and, Beh\c{c}et A\c{c}{\i}kme\c{s}e

PDF

Open Access

TL;DR

This paper extends Markovian network equilibrium to include variable demand with quitting options and multi-commodity flows with heterogeneous ending times, providing new algorithms and computational analysis.

Contribution

It introduces two novel extensions to Markovian network equilibrium and develops dynamic programming algorithms with complexity analysis.

Findings

01

Algorithms outperform Mosek in computational efficiency.

02

Extensions handle variable demand and multi-commodity flows.

03

Numerical experiments validate the proposed models.

Abstract

Markovian network equilibrium generalizes the classical Wardrop equilibrium in network games. At a Markovian network equilibrium, each player of the game solves a Markov decision process instead of a shortest path problem. We propose two novel extensions of Markovian network equilibrium by considering 1) variable demand, which offers the players a quitting option, and 2) multi-commodity flow, which allows players to have heterogeneous ending time. We further develop dynamic-programming-based iterative algorithms for the proposed equilibrium problems, together with their arithmetic complexity analysis. Finally, we illustrate our network equilibrium model via a multi-commodity ride-sharing example, and compare the computational efficiency of our algorithms against state-of-the-art optimization software Mosek over extensive numerical experiments.

Figures3

Click any figure to enlarge with its caption.

Tables1

Table 1. Table 1: Values of γ 𝛾 \gamma where 𝒟 = { 9 , 10 , 11 } 𝒟 9 10 11 \mathcal{D}=\{9,10,11\} .

$γ_{t s s^{'}}$	$s \notin 𝒟$ $s^{'} \in 𝒟$	$s \in 𝒟$ $s^{'} \in 𝒟$	$s \in 𝒟$ $s^{'} \notin 𝒟$	$s \notin 𝒟$ $s^{'} \notin 𝒟$
$1 \leq t \leq 6$	600	200	60	60
$7 \leq t \leq 30$	200	400	200	60
$31 \leq t \leq 36$	60	100	600	60

Equations109

\begin{array}[]{ll}\underset{y}{\mbox{min}}&\sum\limits_{t,s,a}c_{tsa}y_{tsa}\\ \mbox{s.t.}&\sum\limits_{a}y_{1sa}=p_{1s},\\ &\sum\limits_{a}y_{t+1,sa}=p_{t+1,s}+\sum\limits_{s^{\prime},a}P_{s^{\prime}as}y_{ts^{\prime}a},\enskip t\in[T-1],\\ &0\leq y_{tsa},\enskip\forall t\in[T],s\in[S],a\in[A].\end{array}

\begin{array}[]{ll}\underset{y}{\mbox{min}}&\sum\limits_{t,s,a}c_{tsa}y_{tsa}\\ \mbox{s.t.}&\sum\limits_{a}y_{1sa}=p_{1s},\\ &\sum\limits_{a}y_{t+1,sa}=p_{t+1,s}+\sum\limits_{s^{\prime},a}P_{s^{\prime}as}y_{ts^{\prime}a},\enskip t\in[T-1],\\ &0\leq y_{tsa},\enskip\forall t\in[T],s\in[S],a\in[A].\end{array}

\begin{array}[]{ll}\underset{v}{\mbox{max}}&\sum\limits_{t,s}p_{ts}v_{ts}\\ \mbox{s.t.}&v_{Ts}\leq c_{Tsa},\\ &v_{ts}\leq c_{tsa}+\sum\limits_{s^{\prime}}P_{sas^{\prime}}v_{t+1,s^{\prime}},\enskip t\in[T-1],\\ &\forall s\in[S],a\in[A].\end{array}

\begin{array}[]{ll}\underset{v}{\mbox{max}}&\sum\limits_{t,s}p_{ts}v_{ts}\\ \mbox{s.t.}&v_{Ts}\leq c_{Tsa},\\ &v_{ts}\leq c_{tsa}+\sum\limits_{s^{\prime}}P_{sas^{\prime}}v_{t+1,s^{\prime}},\enskip t\in[T-1],\\ &\forall s\in[S],a\in[A].\end{array}

(v_{T s}, a) = a^{'} \in [A] min c_{T s a^{'}},

(v_{T s}, a) = a^{'} \in [A] min c_{T s a^{'}},

(v_{t s}, a) = a^{'} \in [A] min c_{t s a^{'}} + s^{'} \sum P_{s a^{'} s^{'}} v_{t + 1, s^{'}},

σ = max {N_{1}, N_{2}} / S \in [1/ S, 1],

σ = max {N_{1}, N_{2}} / S \in [1/ S, 1],

\begin{array}[]{ll}\underset{y,z}{\mbox{min}}&\sum\limits_{t,s,a}{\displaystyle\int_{0}^{y_{tsa}}\phi_{tsa}(\alpha)d\alpha}+\sum\limits_{t,s}\displaystyle\int_{0}^{z_{ts}}\psi_{ts}(\alpha)d\alpha\\ \mbox{s.t.}&\sum\limits_{a}y_{1sa}=p_{1s}-z_{1s},\\ &\sum\limits_{a}y_{t+1,sa}=p_{t+1,s}-z_{t+1,s}+\sum\limits_{s^{\prime},a}P_{s^{\prime}as}y_{ts^{\prime}a},\\ &t\in[T-1],\\ &0\leq y_{tsa},0\leq z_{ts}\leq p_{ts},\,\forall t\in[T],s\in[S],a\in[A].\end{array}

\begin{array}[]{ll}\underset{y,z}{\mbox{min}}&\sum\limits_{t,s,a}{\displaystyle\int_{0}^{y_{tsa}}\phi_{tsa}(\alpha)d\alpha}+\sum\limits_{t,s}\displaystyle\int_{0}^{z_{ts}}\psi_{ts}(\alpha)d\alpha\\ \mbox{s.t.}&\sum\limits_{a}y_{1sa}=p_{1s}-z_{1s},\\ &\sum\limits_{a}y_{t+1,sa}=p_{t+1,s}-z_{t+1,s}+\sum\limits_{s^{\prime},a}P_{s^{\prime}as}y_{ts^{\prime}a},\\ &t\in[T-1],\\ &0\leq y_{tsa},0\leq z_{ts}\leq p_{ts},\,\forall t\in[T],s\in[S],a\in[A].\end{array}

\begin{array}[]{ll}\underset{u,v,w,\lambda}{\mbox{max}}&\sum\limits_{t,s}p_{ts}(v_{ts}-\lambda_{ts})-\sum\limits_{t,s,a}\displaystyle\int_{\phi_{tsa}(0)}^{u_{tsa}}\phi_{tsa}^{-1}(\alpha)d\alpha\\ &-\sum\limits_{t,s}\displaystyle\int_{\psi_{ts}(0)}^{w_{ts}}\psi^{-1}_{ts}(\alpha)d\alpha\\ \mbox{s.t.}&v_{Ts}\leq u_{Tsa},\\ &v_{ts}\leq u_{tsa}+\sum\limits_{s^{\prime}}P_{sas^{\prime}}v_{t+1,s^{\prime}},\enskip t\in[T-1],\\ &v_{ts}\leq w_{ts}+\lambda_{ts},\enskip 0\leq\lambda_{ts},\\ &\forall t\in[T],s\in[S],a\in[A].\end{array}

\begin{array}[]{ll}\underset{u,v,w,\lambda}{\mbox{max}}&\sum\limits_{t,s}p_{ts}(v_{ts}-\lambda_{ts})-\sum\limits_{t,s,a}\displaystyle\int_{\phi_{tsa}(0)}^{u_{tsa}}\phi_{tsa}^{-1}(\alpha)d\alpha\\ &-\sum\limits_{t,s}\displaystyle\int_{\psi_{ts}(0)}^{w_{ts}}\psi^{-1}_{ts}(\alpha)d\alpha\\ \mbox{s.t.}&v_{Ts}\leq u_{Tsa},\\ &v_{ts}\leq u_{tsa}+\sum\limits_{s^{\prime}}P_{sas^{\prime}}v_{t+1,s^{\prime}},\enskip t\in[T-1],\\ &v_{ts}\leq w_{ts}+\lambda_{ts},\enskip 0\leq\lambda_{ts},\\ &\forall t\in[T],s\in[S],a\in[A].\end{array}

if z_{t s} = 0, then v_{t s} \leq ψ_{t s} (p_{t s}),

if z_{t s} = 0, then v_{t s} \leq ψ_{t s} (p_{t s}),

if 0 < z_{t s} < p_{t s}, then v_{t s} = ψ_{t s} (z_{t s}),

if z_{t s} = p_{t s}, then v_{t s} \geq ψ_{t s} (0) .

(v_{T s}, a) = a^{'} \in [A] min ϕ_{T s a^{'}} (y_{T s a^{'}}),

(v_{T s}, a) = a^{'} \in [A] min ϕ_{T s a^{'}} (y_{T s a^{'}}),

(v_{t s}, a) = a^{'} \in [A] min ϕ_{t s a^{'}} (y_{t s a^{'}}) + s^{'} \sum P_{s a^{'} s^{'}} v_{t + 1, s^{'}},

\begin{array}[]{ll}\underset{\{y^{\tau}\}_{\tau\in\mathbb{T}}}{\mbox{min}}&\sum\limits_{t,s,a}\displaystyle\int_{0}^{\sum\limits_{\tau,\tau\geq t}y^{\tau}_{tsa}}\phi_{tsa}(\alpha)d\alpha\\ \mbox{s.t.}&\sum\limits_{a}y^{\tau}_{1sa}=p^{\tau}_{1s}\\ &\sum\limits_{a}y^{\tau}_{t+1,sa}=p^{\tau}_{t+1,s}+\sum\limits_{s^{\prime},a}P_{s^{\prime}as}y^{\tau}_{ts^{\prime}a},\,t\in[\tau-1],\\ &0\leq y^{\tau}_{tsa},\enskip\forall\tau\in\mathbb{T},t\in[\tau],s\in[S],a\in[A]\end{array}

\begin{array}[]{ll}\underset{\{y^{\tau}\}_{\tau\in\mathbb{T}}}{\mbox{min}}&\sum\limits_{t,s,a}\displaystyle\int_{0}^{\sum\limits_{\tau,\tau\geq t}y^{\tau}_{tsa}}\phi_{tsa}(\alpha)d\alpha\\ \mbox{s.t.}&\sum\limits_{a}y^{\tau}_{1sa}=p^{\tau}_{1s}\\ &\sum\limits_{a}y^{\tau}_{t+1,sa}=p^{\tau}_{t+1,s}+\sum\limits_{s^{\prime},a}P_{s^{\prime}as}y^{\tau}_{ts^{\prime}a},\,t\in[\tau-1],\\ &0\leq y^{\tau}_{tsa},\enskip\forall\tau\in\mathbb{T},t\in[\tau],s\in[S],a\in[A]\end{array}

\begin{array}[]{ll}\underset{u,\{v^{\tau}\}_{\tau\in\mathbb{T}}}{\mbox{max}}&\sum\limits_{t,s}\,\,\sum\limits_{\tau,\tau\geq t}p^{\tau}_{ts}v^{\tau}_{ts}-\sum\limits_{t,s,a}\displaystyle\int_{\phi_{tsa}(0)}^{u_{tsa}}\phi_{tsa}^{-1}(\alpha)d\alpha\\ \mbox{s.t.}&v^{\tau}_{\tau s}\leq u_{\tau sa},\\ &v^{\tau}_{ts}\leq u_{tsa}+\sum\limits_{s^{\prime}}P_{sas^{\prime}}v^{\tau}_{t+1,s^{\prime}},\enskip t\in[\tau-1],\\ &\forall\tau\in\mathbb{T},s\in[S],a\in[A]\end{array}

\begin{array}[]{ll}\underset{u,\{v^{\tau}\}_{\tau\in\mathbb{T}}}{\mbox{max}}&\sum\limits_{t,s}\,\,\sum\limits_{\tau,\tau\geq t}p^{\tau}_{ts}v^{\tau}_{ts}-\sum\limits_{t,s,a}\displaystyle\int_{\phi_{tsa}(0)}^{u_{tsa}}\phi_{tsa}^{-1}(\alpha)d\alpha\\ \mbox{s.t.}&v^{\tau}_{\tau s}\leq u_{\tau sa},\\ &v^{\tau}_{ts}\leq u_{tsa}+\sum\limits_{s^{\prime}}P_{sas^{\prime}}v^{\tau}_{t+1,s^{\prime}},\enskip t\in[\tau-1],\\ &\forall\tau\in\mathbb{T},s\in[S],a\in[A]\end{array}

\displaystyle\textstyle(v^{\tau}_{\tau s},a)=\underset{a^{\prime}\in[A]}{\min}\enskip\phi_{\tau sa^{\prime}}\big{(}\sum\limits_{\tau,\tau\geq t}y^{\tau}_{\tau sa^{\prime}}\big{)},

\displaystyle\textstyle(v^{\tau}_{\tau s},a)=\underset{a^{\prime}\in[A]}{\min}\enskip\phi_{\tau sa^{\prime}}\big{(}\sum\limits_{\tau,\tau\geq t}y^{\tau}_{\tau sa^{\prime}}\big{)},

\displaystyle\textstyle(v^{\tau}_{ts},a)=\underset{a^{\prime}\in[A]}{\min}\enskip\phi_{tsa^{\prime}}\big{(}\sum\limits_{\tau,\tau\geq t}y^{\tau}_{tsa^{\prime}}\big{)}+\sum\limits_{s^{\prime}}P_{sa^{\prime}s^{\prime}}v^{\tau}_{t+1,s^{\prime}},

[ϕ (y)]_{t s a} = ϕ_{t s a} (y_{t s a}), [ϕ^{- 1} (u)]_{t s a} = ϕ_{t s a}^{- 1} (u_{t s a}),

[ϕ (y)]_{t s a} = ϕ_{t s a} (y_{t s a}), [ϕ^{- 1} (u)]_{t s a} = ϕ_{t s a}^{- 1} (u_{t s a}),

[ψ (z)]_{t s} = ϕ_{t s} (z_{t s}), [ψ^{- 1} (w)]_{t s} = ψ_{t s}^{- 1} (w_{t s}),

\underline{u}_{t s a} = ϕ_{t s a} (0), \overline{u}_{t s a} = ϕ_{t s a} (ρ),

\underline{u}_{t s a} = ϕ_{t s a} (0), \overline{u}_{t s a} = ϕ_{t s a} (ρ),

\underline{w}_{t s} = ψ_{t s} (0), \overline{w}_{t s} = ψ_{t s} (ρ),

\begin{array}[]{lll}-g(u,w)=&\underset{y,z}{\mbox{min}}&\sum\limits_{t,s,a}u_{tsa}y_{tsa}+\sum\limits_{t,s}w_{ts}z_{ts}\\ &\mbox{s.t.}&\text{constraints in problem \eqref{opt: optimal flow}. }\end{array}

\begin{array}[]{lll}-g(u,w)=&\underset{y,z}{\mbox{min}}&\sum\limits_{t,s,a}u_{tsa}y_{tsa}+\sum\limits_{t,s}w_{ts}z_{ts}\\ &\mbox{s.t.}&\text{constraints in problem \eqref{opt: optimal flow}. }\end{array}

\overset{z}{^}_{t s} = {p_{t s}, 0, \overset{v}{^}_{t s} > w_{t s} \overset{v}{^}_{t s} \leq w_{t s} \forall t \in [T], s \in [S] .

\overset{z}{^}_{t s} = {p_{t s}, 0, \overset{v}{^}_{t s} > w_{t s} \overset{v}{^}_{t s} \leq w_{t s} \forall t \in [T], s \in [S] .

- g (u, w) = t, s, a \sum u_{t s a} \overset{y}{^}_{t s a} + t, s \sum w_{t s} \overset{z}{^}_{t s} .

- g (u, w) = t, s, a \sum u_{t s a} \overset{y}{^}_{t s a} + t, s \sum w_{t s} \overset{z}{^}_{t s} .

g (u^{'}, w^{'}) - g (u, w)

g (u^{'}, w^{'}) - g (u, w)

\geq

\begin{array}[]{lll}-h(u)=&\underset{y^{\tau},\tau\in\mathbb{T}}{\mbox{min}}&\sum\limits_{t,s,a}\,\,\sum\limits_{\tau,\tau\geq t}u_{tsa}y_{tsa}^{\tau}\\ &\mbox{s.t.}&\text{constraints in problem \eqref{opt: multi optimal flow}. }\end{array}

\begin{array}[]{lll}-h(u)=&\underset{y^{\tau},\tau\in\mathbb{T}}{\mbox{min}}&\sum\limits_{t,s,a}\,\,\sum\limits_{\tau,\tau\geq t}u_{tsa}y_{tsa}^{\tau}\\ &\mbox{s.t.}&\text{constraints in problem \eqref{opt: multi optimal flow}. }\end{array}

- h (u) = t, s, a \sum τ, τ \geq t \sum u_{t s a} \overset{y}{^}_{t s a}^{τ} .

- h (u) = t, s, a \sum τ, τ \geq t \sum u_{t s a} \overset{y}{^}_{t s a}^{τ} .

h (u^{'}) - h (u) \geq t, s, a \sum τ, τ \geq t \sum (u_{t s a}^{'} - u_{t s a}) (- \overset{y}{^}_{t s a}^{τ}) .

h (u^{'}) - h (u) \geq t, s, a \sum τ, τ \geq t \sum (u_{t s a}^{'} - u_{t s a}) (- \overset{y}{^}_{t s a}^{τ}) .

∣ ϕ_{t s a} (α_{1}) - ϕ_{t s a} (α_{2}) ∣ \leq ∣ ϕ_{t s a}^{'} (α_{3}) ∣ \cdot ∣ α_{1} - α_{2} ∣.

∣ ϕ_{t s a} (α_{1}) - ϕ_{t s a} (α_{2}) ∣ \leq ∣ ϕ_{t s a}^{'} (α_{3}) ∣ \cdot ∣ α_{1} - α_{2} ∣.

L \geq α \in [0, ρ] max ∣ ϕ_{t s a}^{'} (α) ∣, \forall t \in [T], s \in [S], a \in [A] .

L \geq α \in [0, ρ] max ∣ ϕ_{t s a}^{'} (α) ∣, \forall t \in [T], s \in [S], a \in [A] .

\begin{array}[]{ll}\underset{u,w}{\mbox{max}}&-g(u,w)-\sum\limits_{t,s,a}\displaystyle\int_{\phi_{tsa}(0)}^{u_{tsa}}\phi_{tsa}^{-1}(\alpha)d\alpha\\ &-\sum\limits_{t,s}\displaystyle\int_{\psi_{ts}(0)}^{w_{ts}}\psi^{-1}_{ts}(\alpha)d\alpha\\ \end{array}

\begin{array}[]{ll}\underset{u,w}{\mbox{max}}&-g(u,w)-\sum\limits_{t,s,a}\displaystyle\int_{\phi_{tsa}(0)}^{u_{tsa}}\phi_{tsa}^{-1}(\alpha)d\alpha\\ &-\sum\limits_{t,s}\displaystyle\int_{\psi_{ts}(0)}^{w_{ts}}\psi^{-1}_{ts}(\alpha)d\alpha\\ \end{array}

\begin{array}[]{lll}-g(u,w)=&\underset{v,\lambda}{\mbox{max}}&\sum\limits_{t,s}p_{ts}(v_{ts}-\lambda_{ts})\\ &\quad\mbox{s.t.}&\text{constraints in \eqref{opt: optimal potential}.}\end{array}

\begin{array}[]{lll}-g(u,w)=&\underset{v,\lambda}{\mbox{max}}&\sum\limits_{t,s}p_{ts}(v_{ts}-\lambda_{ts})\\ &\quad\mbox{s.t.}&\text{constraints in \eqref{opt: optimal potential}.}\end{array}

\begin{array}[]{ll}\underset{u,w}{\mbox{max}}&-g(u,w)-\sum\limits_{t,s,a}\displaystyle\int_{\phi_{tsa}(0)}^{u_{tsa}}\phi_{tsa}^{-1}(\alpha)d\alpha\\ &-\sum\limits_{t,s}\displaystyle\int_{\psi_{ts}(0)}^{w_{ts}}\psi^{-1}_{ts}(\alpha)d\alpha\\ \mbox{s.t.}&\text{$-g(u,w)$ is the optimal value of \eqref{opt: linear optimal flow}. }\end{array}

\begin{array}[]{ll}\underset{u,w}{\mbox{max}}&-g(u,w)-\sum\limits_{t,s,a}\displaystyle\int_{\phi_{tsa}(0)}^{u_{tsa}}\phi_{tsa}^{-1}(\alpha)d\alpha\\ &-\sum\limits_{t,s}\displaystyle\int_{\psi_{ts}(0)}^{w_{ts}}\psi^{-1}_{ts}(\alpha)d\alpha\\ \mbox{s.t.}&\text{$-g(u,w)$ is the optimal value of \eqref{opt: linear optimal flow}. }\end{array}

\begin{array}[]{ll}\underset{u}{\mbox{max}}&-h(u)-\sum\limits_{t,s,a}\displaystyle\int_{\phi_{tsa}(0)}^{u_{tsa}}\phi_{tsa}^{-1}(\alpha)d\alpha\\ \mbox{s.t.}&\text{ $-h(u)$ is the optimal value of \eqref{opt: linear multi optimal flow}. }\end{array}

\begin{array}[]{ll}\underset{u}{\mbox{max}}&-h(u)-\sum\limits_{t,s,a}\displaystyle\int_{\phi_{tsa}(0)}^{u_{tsa}}\phi_{tsa}^{-1}(\alpha)d\alpha\\ \mbox{s.t.}&\text{ $-h(u)$ is the optimal value of \eqref{opt: linear multi optimal flow}. }\end{array}

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsTransportation and Mobility Innovations · Transportation Planning and Optimization · Urban Transport and Accessibility

Full text

Variable Demand and Multi-commodity Flow in Markovian Network Equilibrium

Yue Yu

Dan Calderone

Sarah H. Q. Li

Lillian J. Ratliff

Behçet Açıkmeşe

Oden Institute for Computational Engineering and Sciences, The University of Texas at Austin, Austin, TX, 78712

Department of Aeronautics and Astronautics, University of Washington, Seattle, WA, 98195

Department of Electrical and Computer Engineering, University of Washington, Seattle, WA, 98195

Abstract

Markovian network equilibrium generalizes the classical Wardrop equilibrium in network games. At a Markovian network equilibrium, each player of the game solves a Markov decision process instead of a shortest path problem. We propose two novel extensions of Markovian network equilibrium by considering 1) variable demand, which offers the players a quitting option, and 2) multi-commodity flow, which allows players to have heterogeneous ending time. We further develop dynamic-programming-based iterative algorithms for the proposed equilibrium problems, together with their arithmetic complexity analysis. Finally, we illustrate our network equilibrium model via a multi-commodity ride-sharing example, and compare the computational efficiency of our algorithms against state-of-the-art optimization software Mosek over extensive numerical experiments.

keywords:

Wardrop equilibrium, Markov decision process, network optimization

, , , ,

††thanks: Y. Yu is with the Oden Institute for Computational Engineering and Sciences, The University of Texas at Austin, Austin, Texas, 78712 (e-mail: [email protected]). S. H. Q. Li, and B. Açıkmeşe are with the William E. Boeing Department of Aeronautics & Astronautics, University of Washington, Seattle, Washington, 98195 (e-mail: [email protected], [email protected], [email protected]). D. Calderone and L. J. Ratliff are with the Department of Electrical & Computer Engineering, University of Washington, Seattle, Washington, 98195 (e-mail:[email protected], [email protected]).

1 Introduction

Network equilibrium problems arise in a variety of applications, such as resource allocation and routing in communication or transportation networks [Rockafellar, 1984, Bertsekas, 1998, Xiao et al., 2004, Bürger et al., 2014]. Among the most well-studied examples is the Wardrop equilibrium model in routing games [Beckmann et al., 1956, Gartner, 1980a, Gartner, 1980b, Correa and Stier-Moses, 2010, Patriksson, 1994]. In this model, users in a transportation network are assumed to choose routes with cost that they perceive as the lowest, i.e., each user solves a shortest path problem, under the prevailing traffic conditions [Correa and Stier-Moses, 2010]. With this assumption, the resulting equilibra are characterized by the Wardrop equilibrium principle: the cost of all the routes actually used are equal, and less than those which would be experienced by a single user on any unused route [Wardrop and Whitehead, 1952].

To ensure their practical relevance, it is often necessary to incorporating stochasticity into the network equilibrium problems. For example, the stochastic user equilibrium (SUE) model [Fisk, 1980, Sheffi and Powell, 1982, Liu et al., 2009] considers independent stochastic error on the route cost perceived by the users, leading to user distribution based on the logit [Dial, 1971] or probit model [Daganzo and Sheffi, 1977]; see [Patriksson, 1994, Sec. 2.8.1] and [Cominetti et al., 2012] for a detailed discussion. Unfortunately, the SUE model presents several drawbacks: it requires computationally expensive route enumeration, and is not suited for problems with overlapping routes due to its assumption of independent route cost.

To address these drawbacks, different network models consider different type of stochasticity. In particular, [Baillon and Cominetti, 2008, Ahipaşaoğlu et al., 2019] introduced a Markovian network equilibrium model where users are assumed to choose, instead of routes, sequences of actions with accumulated cost that they perceive as the lowest. Each action is accompanied by a deterministic outcome and a stochastic cost. For example, each vehicle in a transportation network is assumed to choose a sequence of arcs, where each arc leads to deterministic transition to the next node in the network and a stochastic amount of travel time [Baillon and Cominetti, 2008].

On the other hand, [Calderone and Sastry, 2017b, Calderone and Sastry, 2017a] proposed a different stochastic network equilibrium model. Unlike the one in [Baillon and Cominetti, 2008], each action is accompanied by a stochastic outcome and a deterministic cost. For example, an aircraft flying in stormy weather is assumed to choose a sequence of waypoints to fly towards, where each choice costs a deterministic amount of fuel usage and is accompanied by a stochastic change in the weather condition [Nilim and El Ghaoui, 2005]. As a result, instead of a shortest path problem, each user solves a Markov decision process (MDP) [Puterman, 1994, Bertsekas, 1996], where the cost of different actions is determined by the prevailing choices of all users. This model has found a variety of applications in modern transportation including ridesharing and parking [Calderone, 2017].

Although the results in [Calderone and Sastry, 2017b, Calderone and Sastry, 2017a] serves as a first step toward a more general class of stochastic dynamic network equilibrium model, it has the following limitations: a) it does not incorporate many important features of Wardrop equilibrium, such as variable demand and multi-commodity flow and b) its solution method relies exclusively on off-the-shelf optimization software, which does not fully exploit the problem structure. We address these limitations by making the following contributions.

We develop novel extensions to the Markovian network equilibrium model by considering a) variable demand, which offers the users a quitting option, and b) multi-commodity flow, which allows users having heterogeneous ending time. 2. 2.

We design novel dynamic-programming-based algorithms for Markovian network equilibrium problems with detailed arithematical complexity analysis. Our algorithms outperform state-of-the-art optimization software Mosek in extensive numerical experiments.

The rest of the paper is organized as follows. We first revisit some background on MDP in Section 2, then present our variable demand and multi-commodity flow equilibrium models in Section 3. Section 4 focuses on developing efficient iterative algorithms for our equilibrium problems. Section 5 first illustrates the equilibrium models in Section 3 via a multi-commodity ride-sharing example, then compares the algorithms in Section 4 against commercial software Mosek. Finally, we conclude with discussions and comments on the future directions of research in Section 6.

Throughout the paper we will use the following notation: $\mathbb{R}$ denotes the set of real numbers, $\mathbb{R}_{+}$ denotes the set of nonnegative real numbers, and $\mathbb{N}$ denotes the set of positive integers; $[N]$ denotes the set $\{1,2,\ldots,N\}$ for integer $N$ ; $a_{ijk}$ denotes the $ijk$ –th component of the three-dimensional tensor $a\in\mathbb{R}^{n_{1}\times n_{2}\times n_{3}}$ , and analogously, $a_{ij}$ for the two-dimensional case. Given $b_{1},\ldots,b_{N}\in\mathbb{R}$ , we say $(b^{\star},i^{\star})=\underset{i\in[N]}{\min}\,\,b_{i},$ if $b^{\star}=\underset{i\in[N]}{\min}\,\,b_{i}$ and $i^{\star}\in\underset{i\in[N]}{\mathop{\rm argmin}}\,\,b_{i}$ .

2 Preliminaries and background

A $T$ -horizon MDP is defined by a set of states $[S]$ , a set of actions $[A]$ , a cost tensor $c\in\mathbb{R}^{T\times S\times A}$ , and a transition probablity tensor $P\in[0,1]^{S\times A\times S}$ , where $T,S,A\in\mathbb{N}$ denote the number of time steps, states, and actions, respectively. Further, $c_{tsa}\in\mathbb{R}$ denotes the cost of choosing action $a\in[A]$ in state $s\in[S]$ at time $t\in[T]$ , and $P_{sas^{\prime}}\in[0,1]$ denotes the probability of transition from state $s\in[S]$ to $s^{\prime}\in[S]$ when choosing action $a\in[A]$ . In order to find the optimal sequence of action that minimizes the expected accumulated cost, one can solve either one of the two following linear programs111Compared with the formulation in [Puterman, 1994], the linear program here also allows $p_{ts}>0$ when $t>1$ .:

[TABLE]

where $p\in\mathbb{R}_{+}^{T\times S}$ is such that $p_{1s}>0$ for some $s\in[S]$ . If $\sum_{s\in[S]}p_{1s}=1$ and $p_{ts}=0$ for all $1\leq t\leq T$ and $s\in[S]$ , then $p_{1s}$ represents the probability of starting the MDP in state $s$ . Variable $y_{tsa}$ in optimization (1) represents the probability of choosing action $a$ in state $s$ at time $t$ , and variable $v_{ts}$ in optimization (2) represents the expected accumulated cost between time $t$ and time $T$ starting from state $s$ . In general, optimization (1) and (2) can be interpreted as the linear optimal distribution [Rockafellar, 1984, Sec.7A] and differential problem [Rockafellar, 1984, Sec.7E] defined on a $T$ -layered Markovian network (see Fig. 1), where $p_{ts}$ represents the divergence on state node $s$ in the $t$ -th layer, $y_{tsa}$ represents the flow from state $s$ to action $a$ in the $t$ -th layer, and $v_{ts}$ represents the potential on state $s$ in the $t$ -th layer.

The following lemma shows that solutions of optimizations (1) and (2) satisfy the dynamic programming principle.

Lemma 1 ([Puterman, 1994]).

Suppose $y$ solves (1), and $v$ solves (2). If $y_{tsa}>0$ for any $t\in[T],s\in[S],a\in[A]$ , then

[TABLE]

for all $t\in[T-1]$ .

Perhaps the most efficient solution algorithm for problem (1) and (2) is dynamic programming, given by the following Algorithm 1 and Algorithm 2.

Let $(v,\pi)$ be the output of Algorithm 1 with input $(P,c,T)$ , and $y$ be the output of Algorithm 2 with input $(\pi,p,P,T)$ , then one can easily show that such solution pair $(y,v)$ directly satisfies the Karush-Kuhn-Tucker (KKT) conditions [Rockafellar, 1970, Thm. 28.3] of (1) and (2), hence it is an optimal primal-dual solution pair. If we define the sparsity level of an MDP as follows

[TABLE]

where $N_{1}=\max_{s,a}\big{|}\{s^{\prime}|P_{sas^{\prime}}>0\}\big{|}$ and $N_{2}=\max_{s^{\prime},a}\big{|}\{s|P_{sas^{\prime}}>0\}\big{|}$ , then $\sigma S$ measures the maximum number of states connected by the transition kernel $P$ . Further, it is straightforward to check that Algorithm 1 costs $O(\sigma TS^{2}A)$ arithmetic operations, and Algorithm 2 costs $O(\sigma TS^{2})$ arithmetic operations. In addition, Algorithm 1 and Algorithm 2 can be implemented as convolutional neural networks that allows efficient parallel computation [Tamar et al., 2016].

3 Markovian network equilibrium

By combining MDP together with classical routing games, Calderone and Sastry [Calderone and Sastry, 2017b] proposed MDP routing games where a fixed amount of players with the same planning horizon choose sequences of actions that they perceive as achieving the lowest expected accumulated cost under the prevailing choices of other players. Such games are similar to routing games where a fixed amount of players with the same destination choose routes that they perceive as the shortest under the prevailing choices of other players.

In this section, we introduce two generalizations to MDP routing games that allow the amount of players to vary and the planning horizon to differ. We will also show that, under mild assumptions, the equilibra of such games can be computed efficiently using convex optimization.

3.1 Variable demand

One limitation of the MDP routing games in [Calderone and Sastry, 2017b] is the assumption that total amount of players is fixed. However, an important feature in network games is to allow the total amount of players to vary, or equivalently, to provide the players with a quitting action [Patriksson, 1994, Sec. 2.1.2]. Aiming to address this limitation, we propose the following variable demand MDP routing games.

Game 1.

At each time $t\in[T]$ , $p_{ts}$ new players start the game from state $s\in[S]$ . Among these $p_{ts}$ players, each one can choose to

quit the game immediately at the cost of $\psi_{ts}(z_{ts})$ , 2. 2.

take action $a\in[A]$ at the cost of $\phi_{tsa}(y_{tsa})$ and reach state $s^{\prime}\in[S]$ with probability $P_{sas^{\prime}}$ at time $t+1$ , then repeat such process till $t=T$ , when the player ends the game after choosing the last action,

where $z_{ts}$ and $y_{tsa}$ denote the total amount of players choosing to quit the game in state $s$ at time $t$ , and, respectively, taking action $a$ in state $s$ at time $t$ .

Remark 1.

Game 1 is a special case of mean field games over graphs [Gomes et al., 2009, Gomes et al., 2010, Guéant, 2011, Guéant, 2015, Tanaka et al., 2020]. The interactions among different players is mediated by a mean field, described by function $\phi_{tsa}$ and function $\psi_{ts}$ for all $t\in[T],s\in[S],a\in[A]$ .

Intuitively, one can interpret Game 1 as a competitive market model. The supply side corresponds to the stochastic environment, providing the option of playing or quitting the game. The demand side corresponds to the amount of players that decided to play the game, which changes with the expected accumulated cost of the playing option according to curve $\psi_{ts}$ for all $t\in[T]$ and $s\in[S]$ .

Remark 2.

If the quitting option is not available, then Game 1 reduces to an MDP routing game with fixed demand, introduced in [Calderone and Sastry, 2017b]. On the other hand, if the transition is Game 1 is deterministic, i.e., for each $s\in[S]$ and $a\in[A]$ , there exists $s^{\prime}\in[S]$ such that $P_{sas^{\prime}}=1$ , then Game 1 reduces to a classical single-commodity routing game, with and a variable demand [Patriksson, 1994, Sec. 2.2.3]. Particularly, each player solves an MDP with deterministic transition, which is equivalent to a shortest path problem.

The Wardrop equilibrium principle is a key characterization of the equilibra of network games [Patriksson, 1994, Correa and Stier-Moses, 2010]. The principle states that, at equilibra, only the strategies with the lowest cost are actually used. Does this principle apply to Game 1? As we show in the following, the answer is affirmative.

First, we make the following assumptions on Game 1.

Assumption 1.

We assume that $p\in\mathbb{R}_{+}^{T\times S}$ , $P\in[0,1]^{S\times A\times S}$ and $\sum_{s^{\prime}}P_{sas^{\prime}}=1$ for all $s\in[S]$ , $a\in[A]$ . Further, the function $\phi_{tsa}:[0,\rho]\to\mathbb{R}$ and function $\psi_{ts}:[0,\rho]\to\mathbb{R}$ are continuous and strictly increasing over their respective domains, where $\rho=\sum_{t,s}p_{ts}$ .

With these assumptions, we now introduce the following pair of primal-dual optimization problems associated with Game 1.

[TABLE]

In particular, the constraint $0\leq z_{ts}\leq p_{ts}$ allows the number of players choosing to quit the game in state $s$ at time $t$ to vary winthin interval $[0,p_{ts}]$ . If variable $z_{ts}$ is zero and function $\phi_{tsa}$ is constant-valued for all $t\in[T],s\in[S],a\in[A]$ , i.e., the quitting option is removed and the cost of each action does not depend on $y$ in in Game 1, then one can verify that optimization (4) will reduce to (1) and optimization (5) will reduce to (2).

The following theorem shows that, under Assumption 1, the solution to the optimizations in (4) and (5) satisfy an equilibrium condition of Game 1. Similar to the Wardrop equilibrium principle, this equilibrium condition impliesthat no individual player can benefit from unilaterally switching its actions.

Theorem 1.

Suppose Assumption 1 holds, $(y,z)$ solves (4), and $(u,v,w,\lambda)$ solves (5), then for any $p_{ts}>0$ ,

[TABLE]

Further, if $y_{tsa}>0$ , then

[TABLE]

for all $t\in[T-1]$ .

{pf*}

Proof See Appendix A.1.

Theorem 1 shows that an equilibrium of Game 1 that satisfies the Wardrop equilibrium principle not only exists, but can be computed by solving optimization (4) and (5). In particular, if action $a$ is chosen in state $s$ at time $t$ by any player at equilibrium, i.e., $y_{tsa}>0$ , then action $a$ must be optimal in the sense of Algorithm 1. On the other hand, equation (6) says that if some players choose the quitting option in state $s$ at time $t$ at equilibrium, i.e., $z_{ts}>0$ , then the cost of playing is no more than quitting, i.e., $v_{ts}\geq\psi_{ts}(p_{ts})$ . Similarly, if some players choose to play, i.e., $z_{ts}<p_{ts}$ , then the cost of playing is no more than quitting, i.e., $v_{ts}\leq\psi_{ts}(p_{ts})$ . Therefore, Theorem 1 indeed describes a Wardrop equilibrium where no individual player can benefit from unilaterally switching to alternative actions.

3.2 Multicommodity flow

Another limitation of the MDP routing games in [Calderone and Sastry, 2017b] is that all players are assumed to end their game at the same time, which is analogous to the single commodity routing game where all vehicles have the same destination. Aiming to address this limitation, we propose the following multi-commodity MDP routing game, where players can have heterogeneous ending time, denote by $\mathbb{T}$ . We assume, without loss of generality, that $\mathbb{T}\subset[T]$ and $T\in\mathbb{T}$ .

Game 2.

At each time $t\in[T]$ , $p_{ts}^{\tau}$ new players who have a common ending time $\tau\in\mathbb{T}$ with $\tau\geq t$ , start the game from state $s$ . Each of these players can choose the action $a$ at the cost of $\phi_{tsa}(\sum_{\tau,\tau\geq t}y_{tsa}^{\tau})$ and reach state $s^{\prime}$ with probability $P_{sas^{\prime}}$ at time $t+1$ , then repeat this process till $t=\tau$ , when the player ends the game after choosing the last action. Here $y_{tsa}^{\tau}$ denotes the total amount of players who plan to end the game at time $\tau$ and choose action $a$ in state $s$ at time $t$ .

Remark 3.

If $\mathbb{T}=\{T\}$ , then Game 2 reduces to a MDP routing game introduced in [Calderone and Sastry, 2017b]. On the other hand, if the transition in Game 2 is deterministic, i.e., for each $s\in[S]$ and $a\in[A]$ , there exists $s^{\prime}\in[S]$ such that $P_{sas^{\prime}}=1$ , then Game 2 reduces to the traditional multi-commodity routing game with a fixed demand [Patriksson, 1994, Sec. 2.1.1]. Particularly, the state where a player start and end the game form a origin-destination pair, which is jointly determined by the starting state and the deterministic transition.

Similar to Game 1, the equilibrium of Game 2 can also be computed by solving convex optimization problems, as we show in the following.

First, we make the following assumptions on Game 2.

Assumption 2.

We assume $T\in\mathbb{T}\subseteq[T]$ , $p_{ts}^{\tau}\in\mathbb{R}_{+}$ for all $\tau\in\mathbb{T}$ , $t\geq\tau$ and $s\in[S]$ ; $P\in\mathbb{R}_{+}^{S\times A\times S}$ and $\sum_{s^{\prime}}P_{sas^{\prime}}=1$ for all $s\in[S]$ , $a\in[A]$ . Further, the function $\phi_{tsa}:[0,\rho]\to\mathbb{R}$ is continuous and strictly increasing, where $\rho=\sum_{\tau}\sum_{t\leq\tau,s}p_{ts}^{\tau}$ .

With these assumptions, we now introduce the following pair of primal-dual optimization problems associated with Game 2. Notice that if $\mathbb{T}=\{T\}$ , then they reduce to optimization (1) and (2), respectively.

[TABLE]

The following theorem shows that, under Assumption 2, the solutions to optimization problems (8) and (9) satisfy the equilibrium condition of Game 2. Similar to the Wardrop equilibrium principle, this equilibrium condition implies that no individual player can benefit from unilaterally switching actions.

Theorem 2.

Suppose Assumption 2 holds, $y$ solves (8), and $(u,v)$ solves (9). If $y^{\tau}_{tsa}>0$ for any $\tau\in\mathbb{T},s\in[S],a\in[A]$ , then

[TABLE]

for all $t\in[\tau-1]$ .

{pf*}

Proof See Appendix A.2.

Theorem 2 shows that a Wardrop equilibrium of Game 2 not only exists, but can be found by solving optimization problems (8) and (9). In particular, the equations in (10) characterize a multi-commodity flow Wardrop equilibrium in the sense that no individual player can benefit from using alternative actions before his/her ending time $\tau$ for all $\tau\in\mathbb{T}$ .

4 Efficient algorithms via linearization

In this section, we develop efficient iterative algorithms for the network equilibrium problems introduced in the previous section. In particular, we first prove that the linearized versions of problem (4) and problem (8) can both be solved in closed form via Algorithm 1 and Algorithm 2. This observation motivates efficient iterative algorithms that enjoy detailed arithematical complexity analysis.

We will use the following notation to simply our later discussions. Given $y,u\in\mathbb{R}^{T\times S\times A}$ and $z,w\in\mathbb{R}^{T\times S}$ , we let $\phi(y),\phi^{-1}(u)\in\mathbb{R}^{T\times S\times A}$ and $\psi(z),\psi^{-1}(w)\in\mathbb{R}^{T\times S}$ be such that

[TABLE]

for all $t\in[T],s\in[S],a\in[A]$ . We also let $\underline{u},\overline{u}\in\mathbb{R}^{T\times S\times A}$ and $\underline{w},\overline{w}\in\mathbb{R}^{T\times S}$ be such that

[TABLE]

for all $t\in[T],s\in[S],a\in[A]$ .

4.1 Linearization and dynamic programming

If we approximate the objective function in (4) using its linearization at $u\in\mathbb{R}^{T\times S\times A}$ and $w\in\mathbb{R}^{T\times S}$ , we obtain the following

[TABLE]

Observe that the above optimization is a modification to problem (1), by including an additional variable $z$ . This suggest that (13) may also be solved using dynamic programming, which is confirmed by the following lemma.

Lemma 2.

Suppose Assumption 1 holds. Let $(\hat{v},\hat{\pi})$ be the output of Algorithm 1 with input $(P,u,T)$ , and

[TABLE]

In addition, let $\hat{y}$ be the output of Algorithm 2 with input $(\hat{\pi},p-\hat{z},P,T)$ . Then

[TABLE]

Further, for any $u^{\prime}\in\mathbb{R}^{T\times S\times A}$ and $w^{\prime}\in\mathbb{R}^{T\times S}$ ,

[TABLE]

{pf*}

Proof See Appendix A.3.

Similarly, if we approximate the objective function in (8) using a linear function, we obtain the following

[TABLE]

where $u\in\mathbb{R}^{T\times S\times A}$ is the approximation parameter, $-h(u)$ is the optimal value of (14). The following lemma shows that optimization (14) can be solved using Algorithm 1 and 2 as well.

Lemma 3.

Suppose Assumption 2 holds. Let $(\hat{v}^{\tau},\hat{\pi})$ be the output of Algorithm 1 with input $(P,u,\tau)$ , $\hat{y}^{\tau}$ be the output of Algorithm 2 with input $(\hat{\pi}^{\tau},p^{\tau},P,\tau)$ . Then

[TABLE]

Further, for any $u^{\prime}\in\mathbb{R}^{T\times S\times A}$ ,

[TABLE]

{pf*}

Proof See Appendix A.4.

Remark 4.

Function $g:\mathbb{R}^{T\times S\times A}\times\mathbb{R}^{T\times S}\to\mathbb{R}$ in Lemma 2 is the support function of a polyhedron, which is closed and convex [Rockafellar, 1970, p.28]. Further, Lemma 2 shows that the slope of a linear underestimator, or subgradient, of function $g(u,w)$ can be computed using Algorithm 1 and Algorithm 2. Similar observation is made in Lemma 3 for function $h:\mathbb{R}^{T\times S\times A}\to\mathbb{R}$ .

4.2 Iterative algorithms using linearization

We now develop iterative algorithms for optimization problems in Section 3 using the results from the previous subsection. We will use the following notion of $\epsilon$ -optimal solution.

Definition 1.

Given a constrained optimization where an objective function is optimized subject to constraints, we say a solution is $\epsilon$ -optimal $\epsilon\in\mathbb{R}_{+}$ if it satisfies all the constraints and the objective function value evaluated at this solution is at most $\epsilon$ away from the optimal value.

We will also use the following additional assumptions on Game 1 and, respectively, Game 2.

Assumption 3.

Function $\phi_{tsa}:[0,\rho]\to\mathbb{R}$ and $\psi_{ts}:[0,\rho]\to\mathbb{R}$ are $L$ -Lipschitz continuous over their respective domains for all $t\in[T]$ , $s\in[S]$ and $a\in[A]$ .

Assumption 4.

Function $\phi_{tsa}:[0,\rho]\to\mathbb{R}$ is $L$ -Lipschitz continuous over its domain for all $t\in[T]$ , $s\in[S]$ and $a\in[A]$ .

Remark 5.

Assumption 3 and Assumption 4 are mild assumptions on the differentiability of the corresponding functions. For example, if $\phi_{tsa}$ is continuously differentiable, then the mean value theorem states that for any $\alpha_{1},\alpha_{2}\in[0,\rho]$ , there exists $\alpha_{3}\in[0,\rho]$ such that

[TABLE]

where $\phi^{\prime}_{tsa}$ is the derivative of function $\phi_{tsa}$ . Hence Assumption 4 is satisfied by choosing

[TABLE]

which takes a bounded value since $\phi^{\prime}_{tsa}$ is continuous. However, the continuity of $\phi^{\prime}_{tsa}$ is not necessary. For example, if $\phi_{tsa}$ is a piecewise linear function, i.e., a function that is affine over a collection of intervals, then it is still Lipschitz continuous even if its derivative is not continuous.

Based on Lemma 2 and Lemma 3, we propose to solve optimiztaion (4) and (8) using Frank-Wolfe method [Frank and Wolfe, 1956], which repeatedly solve the linearized versions of (4) and (8). We summarize the Frank-Wolf method for optimization (4) and (8) in Algorithm 3 and, respectively, Algorithm 4. The following theorem provides the overall arithmetic complexity analysis of Algorithm 3 and Algorithm 4.

The following theorem shows the convergence property of Algorithm 3 and Algorithm 4.

Theorem 3.

Let $\sigma$ be given by (3). If Assumption 1 and 3 hold, then Algorithm 3 with $\alpha^{k}=\frac{2}{k+1}$ gives an $\epsilon$ -optimal solution to (4) in $O(\frac{\sigma TS^{2}A}{\epsilon})$ arithmetic operations. Similarly, if Assumption 2 and 4 hold, then Algorithm 4 with $\alpha^{k}=\frac{2}{k+1}$ gives an $\epsilon$ -optimal solution to (8) in $O(\frac{\sigma T^{2}S^{2}A}{\epsilon})$ arithmetic operations.

{pf*}

Proof See Appendix A.5.

Theorem 1 provides arithmetical complexity of Algorithm 3 and 4, which not only depends on the problem size (i.e., $T,S,A$ ), but also the sparsity of the constraints (i.e., $\sigma$ ) in (4) and (8).

What about the dual problems? Observe that the optimization in (5) can be separated into two layers: an outer layer that optimizes over $(u,w)$ , and an inner layer that optimizes over $(v,\lambda)$ for a given value of $(u,w)$ ; namely, (5) is equivalent to the following

[TABLE]

where

[TABLE]

One can show that optimization (16) is exactly the dual problem of (13). Since the constraint sets in (13) and (16) are both nonempty, the optimal value of (13) and (16) are the same [Von Neumann and Morgenstern, 1953]. In other words, (5) can be equivalently written as follows

[TABLE]

Using similar reasoning, we can rewrite (9) as follows

[TABLE]

We already discussed in Remark 4 how the subgradients of function $g(u,w)$ and $h(u)$ can be computed efficiently using Algorithm 1 and Algorithm 2. In addition, all the other terms in the objective functions of problem (17) and (18) are continuously differentiable. This suggest that (17) and (18) are suited for the projected subgradient method, which optimizes a non-smooth function by repeatedly computing its subgradients and projections onto its domain. We summarize the projected subgradient method applied to (17) and (18) in Algorithm 5 and, respectively, Algorithm 6.

The following theorem shows the the convergence property of Algorithm 5 and Algorithm 6.

Theorem 4.

Let $\sigma$ be given by (3). If Assumption 1 and Assumption 3 hold, then Algorithm 6 with $\alpha^{k}=\frac{2L}{k+1}$ gives an $\epsilon$ -optimal solution to (17) using $O(\frac{\sigma TS^{2}A}{\epsilon})$ arithmetic operations. Similarly, if Assumption 2 and Assumption 4 hold, then Algorithm 6 with $\alpha^{k}\equiv\frac{2LT}{k+1}$ gives an $\epsilon$ -optimal solution to (18) in $O(\frac{\sigma T^{2}S^{2}A}{\epsilon})$ arithmetic operations.

{pf*}

Proof See Appendix A.6

5 Numerical examples

In this section, we first illustrate the equilibrium models in Section 3 using a ride-sharing example, then demonstrate the efficiency of algorithms in Section 4 by comparing them against commercial software Mosek (https://www.mosek.com) over extensive numerical experiments.

5.1 Multicommodity ride-sharing game

We consider the game played by ride-sharing drivers in Seattle, competing for customers. We first abstract the Seattle area as an undirected graph illustrated in Fig. 2, whose nodes denote various neighborhoods in Seattle, and edges denote available routes, labeled by its driving distance. We denote the set of neighboring nodes of node $s$ as $\mathcal{N}_{s}$ . We model the decision-making of an ride-sharing driver on a typical weekend night (7pm-1am) as an MDP defined as follows.

•

Time steps: $t=1,2,\ldots,36$ denotes the (end of) 10-minute-intervals between 7pm and 1am.

•

States: $[S]$ correspond to different nodes in graph $\mathcal{G}$ .

•

Actions: in state $s$ , $a_{s^{\prime}}$ denotes picking up a waiting rider with destination $s^{\prime}$ for all $s^{\prime}\in\mathcal{N}_{s}$ ; $a_{\text{wait}}$ denotes waiting for a future rider.

•

Transition kernel: we assume $P_{sas^{\prime}}$ is given by

[TABLE]

All other entries of $P_{sas^{\prime}}$ are zero. Here we use an uniform distribution over neighboring states to describe the uncertain destinations of future riders222Such distribution can be approximated more accurately using historical data in practical applications..

•

Cost: due to the competition among drivers, we assume the profit for picking up a rider decreases with the amount of drivers making the same offer, namely

[TABLE]

for all $t\in[T],s\in[S]$ , where $\alpha$ and $\beta$ is the baseline profit and, respectively, nominal profit per mile. We let $\text{dist}_{ss^{\prime}}$ denotes distance(miles) between $s$ and $s^{\prime}$ , $\gamma_{tss^{\prime}}$ denotes the rider demand from $s$ to $s^{\prime}$ at time $t$ , and finally $y_{tsa_{s^{\prime}}}$ denotes the amount of drivers choosing action $a_{s^{\prime}}$ in state $s$ at time $t$ . The cost of action $a$ in state $s$ is a function of $y_{tsa}$ defined as follows

[TABLE]

•

Planning time windows: We assume that 10 drivers start working from each state every 10 minutes between 7pm and 9pm. Once started, each driver is assumed to only work for 4 consecutive hours to avoid driver fatigue, i.e., $p^{(t+24)}_{ts}=10$ for all $s\in[S]$ and $t\in[12]$ .

Notice that the function $\phi_{tsa}$ defined above is linear with slope $\alpha$ , hence Assumption 2 is satisfied with $L=\alpha$ . In general, as long as $\phi_{tsa}$ is modeled or approximated as a continuously differentiable function, Assumption 4 is always satisfied, as we discussed in Remark 5. We also assume that each driver can travel between neighboring nodes within one time step in this simplified transportation network. In practice, such assumption can be ensured by adding more nodes to the network using a finer discretization of the interested area.

Notice that since drivers can start the game at different times during $1\leq t\leq 12$ and they will only plan for the next 24 time steps. In other words, drivers with heterogeneous planning time windows will coexist in the network for $t=2,3,\ldots,24$ . Therefore the equilibrium of this game is a multi-commodity Markovian network equilibrium discussed in Section 3.2. We consider the scenario where $\alpha=10,\beta=0.2$ , $\gamma_{tss^{\prime}}$ is given in Table 1 and $\text{dist}_{ss^{\prime}}$ is given in Fig. 2. We compute the driver number in the downtown area $\mathcal{D}=\{9,10,11\}$ by solving the optimization in (8) using commercial software Mosek (https://www.mosek.com). The results are demonstrate in Fig. 3, where we can see that the driver number increases during $1\leq t\leq 12$ , then decreases during $24\leq t\leq 36$ . There are also two sudden changes in the increasing/decreasing rate around $t=7$ and $t=31$ , which is due to the corresponding changes in values of $\gamma$ in Table 1. This example extends the single commodity case considered in [Calderone and Sastry, 2017b] and [Li et al., 2019] where all players enter and exit the game simultaneously.

A relevant application of the above simulation framework is transportation network design. For example, suppose that Seattle city council is considering two candidate light rail transit (LRT) routes, 7-9-10-11 and 6-8-9-11 (see Fig. 2), as a means to alleviate the congestion caused by the ride-sharing traffic in downtown area, assuming that the LRT will reduce the demand of ride-sharing services (namely, value of $\gamma_{tss^{\prime}}$ ) by $50\%$ along its route. The simulated equilibrium with different LRT routes are also given in Fig. 3, which shows that route 6-8-9-11 is more effective that route 7-9-10-11 in terms of reducing amount of drivers in $\mathcal{D}$ . These results clearly demonstrate the power of Markovian network equilibrium model in transportation system design.

5.2 Computation experiments

To demonstrate the efficiency of the algorithms developed in Section 4, we compare the computation time of our algorithms against commercial software Mosek, used in the previous section, over randomly generated examples. We use $\mathrm{rand}(a,b)$ to denote a random number sampled from uniform distribution over interval $[a,b]$ where $a,b\in\mathbb{R}$ and $a\leq b$ .

•

$P_{sas^{\prime}}=\mathrm{rand}(0,1)$ for all $s\in[S],a\in[A]$ , then normalized such that $\sum_{s^{\prime}}P_{sas^{\prime}}=1$

•

$\phi_{tsa}(\alpha)=\mathrm{rand}(1,2)\alpha+\mathrm{rand}(1,2)$ for all $t\in[T],s\in[S],a\in[A]$ .

•

$p_{ts}=\mathrm{rand}(0,1)$ for all $s\in[S]$ if $t=1$ and zero otherwise.

In the variable demand case, we let $\psi_{ts}(\alpha)=\mathrm{rand}(1,2)\alpha-t+21$ for all $t\in[T],s\in[S]$ . In the multi-commodity flow case, we let $\mathbb{T}=\{5,10\}$ .

We fix $T=A=10$ and let $S$ range between $20$ and $200$ , then test the computation time of Algorithm 3, Algorithm 5, Algorithm 4 and Algorithm 6, where all algorithms terminate when their objective function value agrees with the optimal one obtained by Mosek with less than $0.5\%$ relative error. The average computation time over 100 examples, along with corresponding 3-standard deviation interval are reported in Fig. 4. All codes are in MATLAB and run on a 1.6GHz laptop. From results in Fig. 4 we can see that, over the randomly generated 2000 examples, subgradient method and Frank-Wolfe method reduces the computation time consumed by Mosek by one and, respectively, two orders of magnitudes, at the price of a mere $0.5\%$ of relative accuracy.

6 Conclusion

We study the variable demand and multi-commodity extensions in Markovian network equilibrium. We also propose efficient algorithms that outperform state-of-the-art commercial optimization software. However, the current work still has several limitations. For example, the cost of actions perceived by the players is assumed to be exact, rather than corrupted by stochastic noise, as considered in stochastic user equilibrium model. Further, the ending time of each player is fixed at the beginning of the game. A more realistic assumption is to allow the players to change their ending time and recompute the equilibrium periodically. We aim to address these limitations in future work.

Appendix A Appendix

A.1 Proof of Theorem 1

The objective function of problem (4) is convex (since $\phi_{tsa}$ and $\psi_{ts}$ are strictly increasing), its constraints are affine, and the optimal value is obviously finite (since $\phi_{tsa}$ and $\psi_{ts}$ are finitely valued). These imply that a solution pair to (4) and (5) necessarily satisfy the KKT conditions [Rockafellar, 1970, Thm. 28.3.1]. Let $v_{ts}$ be the dual variable corresponding to the equality constraint containing $p_{ts}$ , let $\mu_{tsa},\theta_{ts},\lambda_{ts}\geq 0$ be the dual variables corresponding to constraint $y_{tsa}\geq 0$ , $z_{ts}\geq 0$ and, respectively, $z_{ts}\leq p_{ts}$ . Then the Lagrangian of (4) and (5) is given by

[TABLE]

The KKT conditions [Rockafellar, 1970, Thm.28.3] of this Lagrangian include the following vanishing gradient conditions (by setting $\partial L/\partial y_{tsa},\partial L/\partial x_{ts}$ equal to zero)

[TABLE]

for all $s\in[S],a\in[A]$ , and the complementarity conditions

[TABLE]

Combining (A.1) and (A.2) yields (6) and (7). Note that same results can be derived from the dual problem (5).

A.2 Proof of Theorem 2

The objective function of problem (8) is convex (since $\phi_{tsa}$ is increasing), its constraints are affine, and the optimal value is obviously finite (since $\phi_{tsa}$ is finitely valued).These imply that a solution pair to (8) and (9) necessarily satisfies the KKT conditions [Rockafellar, 1970, Cor. 28.3.1]. Let $v^{\tau}_{ts}$ be the dual variable corresponding to the equality constraints containing $p^{\tau}_{t}s$ , let $\mu^{\tau}_{tsa}\geq 0$ be the dual variables corresponding to constraint $y^{\tau}_{tsa}\geq 0$ . Then the Lagrangian of (8) and (9) is given by

[TABLE]

The KKT conditions [Rockafellar, 1970, Thm.28.3] of this Lagrangian include the following vanishing gradient conditions (by setting $\partial L/\partial y^{\tau}_{tsa}$ equal to zero)

[TABLE]

for all $t\in[\tau-1],\tau\in\mathbb{T},s\in[S],a\in[A]$ , and the complementarity conditions

[TABLE]

for all $t\in[\tau],\tau\in\mathbb{T},s\in[S],a\in[A]$ . Combining (A.3) and (A.4) yields (10). Again, same results can be derived from the dual problem (9).

A.3 Proof of Lemma 2

Using a similar argument as in the proof of Theorem 1, we can show that the KKT conditions of (13) are given by the following

[TABLE]

for all $t\in[T],s\in[S],a\in[A]$ . Let $(\hat{v},\hat{\pi})$ be the output of Algorithm 1 with input $(P,u,T)$ , $\hat{z}=p\odot(\hat{v}>w)$ , $\hat{y}$ be the output of Algorithm 2 with input $(\hat{\pi},p-\hat{z},P,T)$ , let

[TABLE]

for all $t\in[T],s\in[S],a\in[A]$ . Then it is straightforward to verify that $(\hat{y},\hat{z},\hat{v},\hat{\mu},\hat{\lambda},\hat{\theta})$ satisfies all the KKT conditions in (A.5), hence $(\hat{y},\hat{z})$ solves (13), which proves the equality. The inequality follows from the fact that, when $(u,w)$ in (13) is perturbed to another value $(u^{\prime},w^{\prime})$ , solution $(\hat{y},\hat{z})$ is still feasible, but can be suboptimal.

A.4 Proof of Lemma 3

Notice that in optimization (14), both objective function and constraints are completely separable across $y^{\tau}$ with different value of $\tau$ . In other words, solving (14) is equivalent to solve the following optimization problem for each value of $\tau\in\mathbb{T}$ separately

[TABLE]

Since problem (A.7) is nothing but an instance of (1) with $T=\tau$ , it can be solved by the output of Algorithm 2 with input $(\hat{\pi}^{\tau},p^{\tau},P,\tau)$ , where $\hat{\pi}^{\tau}$ is the output of Algorithm 1 with input $(P,u,\tau)$ . This proves the equality; the inequality follows from the fact that, when $u$ in (14) is perturbed to another value $u^{\prime}$ , solution $(\hat{y}^{\tau},\tau\in\mathbb{T})$ is still feasible, but can be suboptimal.

A.5 Proof of Theorem 3

We start with Algorithm 3. The per-iteration computation of Algorithm 3 is clearly dominated by the execution of Algorithm 1 and Algorithm 2, which together cost $O(\sigma TS^{2}A)$ arithmetical operations. Let $f(y,z)$ denote the objective function of problem (4). Then the gradients of $f$ is given by

[TABLE]

where $\phi(y)$ and $\psi(z)$ are defined as in (11). Then under Assumption 3, we know both $\frac{\partial f}{\partial y}$ and $\frac{\partial f}{\partial z}$ are Lipschitz. In addition, for any $(y,z)$ satisfying the constraints of problem (4), one must have $y_{tsa}\in[0,\rho]$ and $z_{ts}\in[0,\rho]$ , due to Assumption 1. In other words, the constraint set of problem (4) is a subset of $[0,\rho]^{T\times S\times A}\times[0,\rho]^{T\times S}$ , which is bounded.

Therefore problem (4) is minimizing a function with Lipschitz gradients over a bounded set. Hence the Frank-Wolfe method given by Algorithm 3 converges to $\epsilon$ -optimal solution in $O(\frac{1}{\epsilon})$ iterations [Bubeck et al., 2015, Thm.3.8]. The proof for Algorithm 4 is similar.

A.6 Proof of Theorem 4

We start with Algorithm 5. First, the per-iteration computation of Algorithm 5 is clearly dominated by the execution of Algorithm 1 and Algorithm 1, which together cost $O(\sigma TS^{2}A)$ arithmetical operations. From Assumption 1 and 3 we have, for any $u_{tsa},u^{\prime}_{tsa},w_{ts},w^{\prime}_{ts}\in[0,\rho]$

[TABLE]

for all $t\in[T],s\in[S],a\in[A]$ , which implies the objective function of problem (17) (in particular, the integral terms) is $\frac{1}{L}$ -strongly convex [Nesterov, 2013, Thm.2.1.10]. Let $-f(u,w)$ denote the objective function of problem (17). Then from Lemma 2 we know that the subgradients of $f$ is given by

[TABLE]

where $\phi^{-1}(u),\psi^{-1}(w)$ are defined as in (11), $(\hat{y},\hat{z})$ is a solution to problem (13). From Assumption 1 we know that $\phi^{-1}(u)\in[0,\rho]^{T\times S\times A}$ and $\psi^{-1}(w)\in[0,\rho]^{T\times S}$ . Further, $(\hat{y},\hat{z})$ must satisfy the constraints in problem (13), which implies that $\hat{y}\in[0,\rho]^{T\times S\times A}$ and $\hat{z}\in[0,\rho]^{T\times S}$ . Hence the elements in $\partial_{u}f$ and $\partial_{w}f$ are bounded.

Therefore, problem (17) is minimizing a strongly convex function whose subgradients have bounded elements. hence the projected subgradient method given by Algorithm 5 converges to an $\epsilon$ -optimal solution in $O(\frac{1}{\epsilon})$ iterations [Bubeck et al., 2015, Th,.3.9]. The proof for Algorithm 6 is similar.

Bibliography36

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1[Ahipaşaoğlu et al., 2019] Ahipaşaoğlu, S. D., Arıkan, U., and Natarajan, K. (2019). Distributionally robust markovian traffic equilibrium. Transp. Sci. , 53(6):1546–1562.
2[Baillon and Cominetti, 2008] Baillon, J.-B. and Cominetti, R. (2008). Markovian traffic equilibrium. Math. Prog. , 111(1-2):33–56.
3[Beckmann et al., 1956] Beckmann, M., Mc Guire, C. B., and Winsten, C. B. (1956). Studies in the Economics of Transportation . Yale University Press.
4[Bertsekas, 1996] Bertsekas, D. P. (1996). Neuro-Dynamic Programming . Athena scientific Belmont.
5[Bertsekas, 1998] Bertsekas, D. P. (1998). Network Optimization: Continuous and Discrete Models . Citeseer.
6[Bubeck et al., 2015] Bubeck, S. et al. (2015). Convex optimization: Algorithms and complexity. Found. Trends Mach. Learn. , 8(3-4):231–357.
7[Bürger et al., 2014] Bürger, M., Zelazo, D., and Allgöwer, F. (2014). Duality and network theory in passivity-based cooperative control. Automatica , 50(8):2051–2061.
8[Calderone and Sastry, 2017 a] Calderone, D. and Sastry, S. (2017 a). Infinite-horizon average-cost markov decision process routing games. In Proc. Int. Conf. Intell. Transp. Syst. , pages 1–6. IEEE.

TL;DR

Contribution

Findings

Abstract

Peer Reviews

Videos

Taxonomy

Variable Demand and Multi-commodity Flow in Markovian Network Equilibrium

Abstract

keywords:

1 Introduction

2 Preliminaries and background

Lemma 1** ([Puterman, 1994]).**

3 Markovian network equilibrium

3.1 Variable demand

Game 1**.**

Remark 1**.**

Remark 2**.**

Assumption 1**.**

Theorem 1**.**

3.2 Multicommodity flow

Game 2**.**

Remark 3**.**

Assumption 2**.**

Theorem 2**.**

4 Efficient algorithms via linearization

4.1 Linearization and dynamic programming

Lemma 2**.**

Lemma 3**.**

Remark 4**.**

4.2 Iterative algorithms using linearization

Definition 1**.**

Assumption 3**.**

Assumption 4**.**

Remark 5**.**

Theorem 3**.**

Theorem 4**.**

5 Numerical examples

5.1 Multicommodity ride-sharing game

5.2 Computation experiments

6 Conclusion

Appendix A Appendix

A.1 Proof of Theorem 1

A.2 Proof of Theorem 2

A.3 Proof of Lemma 2

A.4 Proof of Lemma 3

A.5 Proof of Theorem 3

A.6 Proof of Theorem 4

Lemma 1 ([Puterman, 1994]).

Game 1.

Remark 1.

Remark 2.

Assumption 1.

Theorem 1.

Game 2.

Remark 3.

Assumption 2.

Theorem 2.

Lemma 2.

Lemma 3.

Remark 4.

Definition 1.

Assumption 3.

Assumption 4.

Remark 5.

Theorem 3.

Theorem 4.