Optimal Forward Trading and Battery Control Under Renewable Electricity Generation

Juri Hinz; Jeremy Yee

arXiv:1706.03310·math.OC·June 13, 2017

TL;DR

This paper develops an algorithmic strategy for controlling battery storage and forward trading positions to manage the variability of renewable energy output, and analyzes how battery technology influences trading behavior in the electricity spot market.

Contribution

It introduces a novel algorithmic approach for battery and trading control under renewable energy fluctuations and examines the impact of battery technology on trading strategies.

Findings

1. Effective control algorithms for battery and trading management.
2. Battery technology significantly influences trading behavior.
3. Potential for reducing electricity price volatility through optimized control.

Abstract

The increased market penetration of renewable energy sources and the rapid development of electric battery storage technologies yield a potential for reducing electricity price volatility while maintaining stability of the power grid. This work presents an algorithmic approach to control battery levels and forward positions to optimally manage power output fluctuations caused by intermittent renewable energy generation. This paper will also explore the effect of battery technology on the firm's optimal trading behaviour in the electricity spot market.

Tables

Table 1: Solution diagnostics for $z^{(2)}=0$. Standard errors in parentheses.

Battery Level (MWh)	Lower Bound	Upper Bound
0	-1679.759 (0.042)	-1679.756 (0.042)
5	-1629.759 (0.042)	-1629.756 (0.042)
10	-1579.759 (0.042)	-1579.756 (0.042)
15	-1529.759 (0.042)	-1529.756 (0.042)
20	-1480.069 (0.042)	-1480.066 (0.042)
25	-1433.475 (0.041)	-1433.472 (0.041)
30	-1389.587 (0.041)	-1389.583 (0.041)
35	-1348.411 (0.041)	-1348.408 (0.041)
40	-1310.032 (0.041)	-1310.028 (0.041)
45	-1274.505 (0.041)	-1274.502 (0.041)
50	-1241.857 (0.040)	-1241.853 (0.040)
55	-1212.091 (0.040)	-1212.088 (0.040)
60	-1185.201 (0.040)	-1185.197 (0.040)
65	-1161.168 (0.040)	-1161.165 (0.040)
70	-1139.971 (0.039)	-1139.968 (0.039)
75	-1121.586 (0.039)	-1121.583 (0.039)
80	-1105.989 (0.039)	-1105.986 (0.039)
85	-1093.160 (0.039)	-1093.157 (0.039)
90	-1083.071 (0.039)	-1083.068 (0.039)
95	-1075.638 (0.039)	-1075.634 (0.039)
100	-1070.639 (0.039)	-1070.636 (0.039)

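Table 2: Bounds on the estimated cumulated rewards for different parameters. Standard errors in parentheses.

$\phi$	Small battery (5 MWh): lower bound	Small battery (5 MWh): upper bound	Large battery (100 MWh): lower bound	Large battery (100 MWh): upper bound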
0.9	-18904.06 (0.151)	-18904.06 (0.151)	-1679.759 (0.042)	-1679.756 (0.042)
0.6	-19004.19 (0.073)	-19004.19 (0.073)	-1682.616 (0.037)	-1682.609 (0.037)
0.3	-19017.53 (0.060)	-19017.52 (0.059)	-1679.807 (0.038)	-1679.799 (0.039)
0.1	-19019.21 (0.057)	-19019.20 (0.057)	-1676.744 (0.042)	-1676.732 (0.042)

Table 3: Bounds on cumulated rewards under $\phi=0.9$ and $z^{(2)}=0$.

Capacity (MWh)	Non-zero scrap value: lower bound	Non-zero scrap value: upper bound	Zero scrap value: lower bound	Zero scrap value: upper bound
10	-14068.958 (0.115)	-14068.957 (0.115)	-14124.612 (0.115)	-14124.611 (0.115)
20	-8762.276 (0.078)	-8762.275 (0.077)	-8879.116 (0.078)	-8879.115 (0.078)
30	-6114.388 (0.049)	-6114.388 (0.049)	-6292.050 (0.049)	-6292.049 (0.049)
40	-4629.497 (0.039)	-4629.496 (0.039)	-4866.371 (0.039)	-4866.370 (0.039)
50	-3685.724 (0.033)	-3685.723 (0.033)	-3980.018 (0.033)	-3980.017 (0.033)
60	-3033.977 (0.030)	-3033.977 (0.030)	-3384.379 (0.029)	-3384.379 (0.029)
70	-2559.781 (0.028)	-2559.781 (0.028)	-2965.728 (0.027)	-2965.728 (0.027)
80	-2198.558 (0.031)	-2198.557 (0.031)	-2660.035 (0.027)	-2660.034 (0.027)
90	-1912.817 (0.035)	-1912.815 (0.035)	-2430.169 (0.029)	-2430.168 (0.029)
100	-1679.759 (0.042)	-1679.756 (0.042)	-2253.495 (0.033)	-2253.493 (0.033)

Equations

$$\boxed{\textbf{Consumer}}\Longleftrightarrow\begin{array}{c}\boxed{\textbf{Optimal Control}}\\ \Updownarrow\\ \boxed{\textbf{Forward Contracts}}\end{array}\Longleftrightarrow\boxed{\textbf{Grid}}$$

$$\begin{array}{c}\boxed{\textbf{Consumer}}\\ \ominus\\ \boxed{\begin{array}{c}\textbf{Renewable}\\ \textbf{Energy}\end{array}}\end{array}\Longleftrightarrow\begin{array}{c}\boxed{\textbf{Optimal Control}}\\ \Updownarrow\\ \boxed{\begin{array}{c}\textbf{Battery Storage \&}\\ \textbf{Forward Contracts}\end{array}}\end{array}\Longleftrightarrow\boxed{\textbf{Grid}}$$

$$F_{t}(a)-Q_{t}=q_{t}+l(a)-(q_{t}+\varepsilon_{t})=l(a)-\varepsilon_{t},\qquad a\in\mathbf{A}.$$

$$0\leq\underline{\Pi}_{t}\leq\overline{\Pi}_{t},\qquad t=0,\dots,T$$

$$(q_{t}+l(a))\Pi_{t}=q_{t}\Pi_{t}+l(a)\Pi_{t}.$$

$$r_{t}(p,(q_{t},\Pi_{t}),a)=-q_{t}\Pi_{t}-l(a)\Pi_{t}-\overline{e}^{a}_{p}\overline{\Pi}_{t}+\underline{e}^{a}_{p}\underline{\Pi}_{t}.$$

$$r_{t}(p,\Pi_{t},a)=-l(a)\Pi_{t}-\overline{e}^{a}_{p}\overline{\Pi}_{t}+\underline{e}^{a}_{p}\underline{\Pi}_{t}.$$

$${\mathbb{P}}^{x_{0},\pi}(X_{t+1}\in B\mid X_{0},\dots,X_{t})=K^{\pi_{t}(X_{t})}_{t}(X_{t},B)$$

$$({\cal K}^{a}_{t}v)(x)=\int_{E}v(x^{\prime})K^{a}_{t}(x,dx^{\prime}),\qquad x\in E,$$

$$v^{\pi}_{0}(x_{0})={\mathbb{E}}^{x_{0},\pi}\Big(\sum_{t=0}^{T-1}r_{t}(X_{t},\pi_{t}(X_{t}))+r_{T}(X_{T})\Big),$$

$$\pi^{*}(x_{0})=\operatorname{argmax}_{\pi}v^{\pi}_{0}(x_{0}),\qquad x_{0}\in E.$$

$${\cal T}_{t}v(x)=\sup_{a\in\mathbf{A}}\big(r_{t}(x,a)+{\cal K}^{a}_{t}v(x)\big),\qquad x\in E$$

$$v^{*}_{T}=r_{T},\qquad v^{*}_{t}={\cal T}_{t}v^{*}_{t+1}\quad\text{for } t=T-1,\dots,0.$$

$$\pi^{*}_{t}(x)=\operatorname{argmax}_{a\in\mathbf{A}}\big(r_{t}(x,a)+{\cal K}^{a}_{t}v^{*}_{t+1}(x)\big),\qquad x\in E$$

$$(\alpha^{a}_{p,p^{\prime}})_{p,p^{\prime}\in\mathbf{P}},\qquad a\in\mathbf{A}$$

$$Z_{t+1}=W_{t+1}Z_{t}$$

$${\cal K}^{a}_{t}v(p,z)$$

$${\cal T}_{t}v(p,z)$$

$$z\mapsto r_{t}(p,z,a),\qquad z\mapsto r_{T}(p,z),\qquad t=0,\dots,T-1,\ p\in\mathbf{P},\ a\in\mathbf{A}$$

$$v^{*}_{T}(p,z)$$

$$v^{*}_{t}(p,z)$$

$$\pi^{*}_{t}(p,z)=\operatorname{argmax}_{a\in\mathbf{A}}\Big(r_{t}(p,z,a)+\sum_{p^{\prime}\in\mathbf{P}}\alpha^{a}_{p,p^{\prime}}{\mathbb{E}}\big(v^{*}_{t+1}(p^{\prime},W_{t+1}z)\big)\Big).$$

$$v^{*}_{T}=r_{T},\qquad v^{*}_{t}={\cal T}_{t}v^{*}_{t+1},\qquad t=T-1,\dots,0$$

$${\cal T}_{t}v(p,z)=\max_{a\in\mathbf{A}}\Big(r_{t}(p,z,a)+\sum_{p^{\prime}\in\mathbf{P}}\alpha^{a}_{p,p^{\prime}}{\mathbb{E}}\big(v(p^{\prime},W_{t+1}z)\big)\Big).$$

$$\sum_{n=1}^{N}\nu^{N}_{t+1}(n)\,v(p^{\prime},W_{t+1}(n)z)$$

$$(W_{t+1}(n))_{n=1}^{N}$$

$$v_{T}=r_{T},\qquad v_{t}={\cal T}^{N}_{t}v_{t+1},\qquad t=T-1,\dots,0$$

$${\cal S}_{G}f=\vee_{g\in G}(\triangledown_{g}f)$$

$${\cal T}^{G,N}_{t}v(p,\cdot)={\cal S}_{G}{\cal T}^{N}_{t}v(p,\cdot),$$

$$v_{T}(p,\cdot)$$

$$v_{t}(p,\cdot)$$

$$f\sim F\Rightarrow{\cal S}_{G}f\sim\Upsilon_{G}[F]$$

Taxonomy

Topics: Smart Grid Energy Management · Electric Vehicles and Infrastructure · Electric Power System Optimization

Full text

1 Introduction

The recent proliferation of renewable energies and technological progress in electric battery storage systems create an increasing demand for sound algorithmic solutions to optimal control problems arising in the dispatch optimization of power supply, given a storage facility and uncertainty caused by market prices and/or weather conditions. Such problems are numerically challenging due to the high dimensionality of the state spaces involved. This work suggests a quantitative approach to address the growing need for efficient decentralized electricity dispatch and storage.

Let us describe the typical framework. Traditional electricity market players satisfy consumers’ energy demand by purchasing electricity in advance, usually taking positions in the so-called day-ahead market (also called the spot market), such that any energy imbalances must be compensated in real time as they occur. This real-time balancing can be achieved either through complex over-the-counter trading or, more realistically, by transferring supply to or from the electricity grid at the so-called real-time grid prices. Figure 1 provides a simplified illustration of this optimal control problem. However, in the presence of storage and renewable generation facilities, the problem changes. Under this new structure, the agent’s control problem requires simultaneously taking optimal positions and setting optimal energy storage levels, as shown in Figure 2. The decision optimization problem becomes significantly more complex due to the uncertainty stemming from future battery capacity levels, electricity prices, and the output of renewable energy.

Many renewable energy sources such as wind and solar are notoriously intermittent and unreliable. The potential of energy storage devices to address the highly erratic nature of renewable energy generation [6, 11] and energy demand has been discussed extensively in the literature (see [3, 14, 15, 21, 33]). Their incorporation into a modern energy grid will encourage more environmentally friendly policies, which will also have a significant impact on investor attitudes towards firms [27, 9, 28]. The authors of [24] studied the possible use of battery storage systems to defer costly modifications to the energy grid by addressing peak loads in the power grid. An extensive recent review of available energy storage technologies is given in [25], and the outlook for further innovation is promising. While there exist numerous types of energy storage systems, [3] found that no single storage system consistently outperforms all the others for all types of renewable energy sources and applications. So, for the sake of simplicity, this paper assumes that the energy retailer pictured in Figure 2 uses a battery device for storing energy. However, the methods and results contained in this paper can easily be extended to other types of storage technologies or even to the use of multiple types of storage devices. From a real options analysis point of view, the incorporation of energy storage devices into the energy grid also poses interesting investment questions. The work in [1, 5, 23] examined the profitability of investing in energy storage devices. However, [31] questions the suitability of the current real options approach, stating that the risk neutrality assumption may not be appropriate for risk-averse investors. The introduction of batteries also gives rise to important optimal stochastic control problems. The optimal dynamic storage and discharge of energy from battery devices has been examined in [13, 22, 26, 32].

Rather than focusing on the capacity investment decision, the present contribution focuses on optimal operational management in terms of energy purchase and dispatch optimization, given a storage device of a fixed capacity. We suppose that the storage facility is used only to compensate any imbalance between consumers’ demand, renewable energy generation, and an existing financial position. This issue is connected to the market behaviour addressed in [8] in terms of reducing risk in the sense of [4], since storage acts as a safety buffer. The present work helps to investigate the effect of battery storage on forward energy trading.

This paper is organized as follows: Section 2 introduces the model, while Section 3 frames the main question as a Markov Decision Problem whose numerical solution is outlined in Section 4. Section 5 provides a numerical study of prices which exhibit distinct mean reversion and seasonality. Finally, Section 6.1 examines the impact of the battery on optimal forward energy trading, with conclusions provided in Section 7.

2 Problem Setting

Within a given time horizon $t=0,\dots,T-1$, the net energy demand $Q_{t}$ within each period is the difference between the consumer’s demand and the renewable energy output. Given an existing financial position $F_{t}$, the energy imbalance $F_{t}-Q_{t}$ is compensated using energy from the battery storage, followed by a possible offset through real-time energy from the power grid. That is, in the case of an energy surplus $F_{t}-Q_{t}\geq 0$, the electricity is first used to charge the battery up to the maximal level and the remaining energy is then supplied to the grid. Similarly, if there is an energy shortage $F_{t}-Q_{t}<0$, the required electricity is taken from the battery down to a minimal battery level before the remainder is taken from the grid.
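The compensation rule described above can be sketched in a few lines. This is a minimal illustration that assumes a lossless battery with no charge or discharge rate limits; the function name `settle_imbalance` and its argument names are hypothetical and do not come from the paper.

```python
def settle_imbalance(level, imbalance, p_min, p_max):
    """Balance the imbalance F_t - Q_t with the battery first, then the grid.

    Assumption: lossless battery, no rate limits (not stated in the paper).
    Returns (new_level, sold_to_grid, bought_from_grid), all in MWh.
    """
    target = level + imbalance                   # level if the battery had no limits
    new_level = min(max(target, p_min), p_max)   # clip to the admissible range
    sold_to_grid = max(target - p_max, 0.0)      # surplus beyond a full battery
    bought_from_grid = max(p_min - target, 0.0)  # shortage beyond an empty battery
    return new_level, sold_to_grid, bought_from_grid


# Surplus of 15 MWh with the battery at 90 of 100 MWh: 10 MWh stored, 5 MWh sold.
print(settle_imbalance(90.0, 15.0, 0.0, 100.0))
# Shortage of 8 MWh with only 5 MWh stored: battery emptied, 3 MWh bought.
print(settle_imbalance(5.0, -8.0, 0.0, 100.0))
```

At most one of the two grid quantities is non-zero in any period, which mirrors the surplus/shortage case distinction in the text.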

Let us assume that the net demand realization is given by $Q_{t}=q_{t}+\varepsilon_{t}$, with a zero-mean random variable $\varepsilon_{t}$ describing the deviation of the net demand from its predicted level $q_{t}$, and suppose that the financial position is given by $F_{t}=q_{t}+l$, where the quantity $l$ describes a safety margin: it represents the retailer’s decision to buy/sell in the spot market an energy amount $q_{t}+l$ which deviates from the predicted net demand $q_{t}$ by $l$. We model the retailer’s decision as the choice $F_{t}=F_{t}(a)$ of a financial position through an action $a\in\mathbf{A}$ from a finite set $\mathbf{A}$ of all possible actions, each characterized by its specific safety margin $l(a)$. With the assumptions above, given the action $a\in\mathbf{A}$, the realized net energy to be balanced is given by

$$F_{t}(a)-Q_{t}=q_{t}+l(a)-(q_{t}+\varepsilon_{t})=l(a)-\varepsilon_{t},\qquad a\in\mathbf{A}.$$

That is, the action $a\in\mathbf{A}$ determines a certain distribution $\nu_{t}(a)$ of the energy volume which must be balanced and is determined as $\nu_{t}(a)\sim l(a)-\varepsilon_{t}$ for all $a\in\mathbf{A}$ . In order to describe the battery storage control, we suggest discretizing the storage levels by a finite set $\mathbf{P}$ . Having chosen the action $a\in\mathbf{A}$ , the imbalance energy $F_{t}(a)-Q_{t}=l(a)-\varepsilon_{t}$ follows a distribution $\nu_{t}(a)$ which determines for each battery storage level $p\in\mathbf{P}$ a probability $\alpha_{p,p^{\prime}}^{a}$ that the storage reaches its next-day level $p^{\prime}\in\mathbf{P}$ . Furthermore, the expected energy excess $\underline{e}^{a}_{p}$ and shortage $\overline{e}^{a}_{p}$ are uniquely determined by the current battery level $p\in\mathbf{P}$ and the action $a\in\mathbf{A}$ through the imbalance distribution $\nu_{t}(a)$ .

Now let us turn to the costs of energy imbalance. For this, we introduce the random variables

$$0\leq\underline{\Pi}_{t}\leq\overline{\Pi}_{t},\qquad t=0,\dots,T,$$

which stand for the sell/buy real time grid prices expected at time $t$ , when the financial position $F_{t}(a)$ is taken. With these definitions, the revenue/costs associated with energy imbalance for the action $a\in\mathbf{A}$ are modeled by $-\overline{e}^{a}_{p}\overline{\Pi}_{t}+\underline{e}^{a}_{p}\underline{\Pi}_{t}.$ Finally, let us denote by $\Pi_{t}$ the energy price at time $t=0,\dots,T$ . Since we assume that all feasible financial positions are given as $q_{t}+l(a)$ with $a\in\mathbf{A}$ , the position costs for the action $a\in\mathbf{A}$ are

$$(q_{t}+l(a))\Pi_{t}=q_{t}\Pi_{t}+l(a)\Pi_{t}.$$

With assumptions and notations as above, the revenue/loss associated with the action $a\in\mathbf{A}$ depends on the current price $\Pi_{t}$ , the expected demand $q_{t}$ , and the recent battery level $p\in\mathbf{P}$ as

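$$r_{t}(p,(q_{t},\Pi_{t}),a)=-q_{t}\Pi_{t}-l(a)\Pi_{t}-\overline{e}^{a}_{p}\overline{\Pi}_{t}+\underline{e}^{a}_{p}\underline{\Pi}_{t}.$$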

Observe that the term $-q_{t}\Pi_{t}$ neither depends on the action $a$ nor on battery level $p$ . Thus, we agree that the choice $a\in\mathbf{A}$ of the optimal safety margin $l(a)$ will depend only on electricity price and re-define the reward as

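$$r_{t}(p,\Pi_{t},a)=-l(a)\Pi_{t}-\overline{e}^{a}_{p}\overline{\Pi}_{t}+\underline{e}^{a}_{p}\underline{\Pi}_{t}.\tag{2.1}$$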

Note that we also do not consider other revenues associated with income streams due to delivery commitments at a fixed price. On this account, the revenue (2.1) serves as a vehicle to optimize trading activity in terms of optimal safety margins and does not reflect actual cash flows.
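As a small illustration, the re-defined reward (the negative position cost of the safety margin, minus the expected cost of buying a shortage from the grid, plus the expected revenue from selling an excess to the grid) can be written as a plain function. The function and argument names are hypothetical, and the numbers in the usage line are illustrative only.

```python
def reward(price, margin, exp_shortage, exp_excess, buy_price, sell_price):
    """Reward r_t(p, Pi_t, a) for one battery level p and one action a.

    price        -- spot price Pi_t
    margin       -- safety margin l(a)
    exp_shortage -- expected shortage (overlined e) bought from the grid
    exp_excess   -- expected excess (underlined e) sold to the grid
    buy_price    -- real-time buy price (overlined Pi)
    sell_price   -- real-time sell price (underlined Pi), at most buy_price
    """
    position_cost = margin * price
    imbalance_cost = exp_shortage * buy_price - exp_excess * sell_price
    return -position_cost - imbalance_cost


# Illustrative numbers only (not taken from the paper).
print(reward(price=30.0, margin=2.0, exp_shortage=1.0, exp_excess=0.5,
             buy_price=50.0, sell_price=10.0))
```

The dependence on the battery level $p$ and the action $a$ enters only through the margin and the two expected imbalance volumes.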

The revenue optimization from battery storage management is a typical sequential decision problem under uncertainty. Having chosen at time $t=0,\dots,T-1$ an action $a\in\mathbf{A}$ in the state $(p,\Pi_{t})$, a certain revenue or cost $r_{t}(p,\Pi_{t},a)$ is incurred immediately. However, the action $a\in\mathbf{A}$ also changes the probability of transition to the subsequent states (next battery levels), which influences all future revenues and decisions. Problems of this type are naturally formulated and solved in terms of Markov decision theory. In what follows, we formulate our storage control problem within this standard framework.

3 Markov Decision Theory for Battery Control

Let us review the classical finite-horizon Markov decision theory following [2]. On a finite time horizon $0,\dots,T$, consider a random dynamics whose state $x$ evolves in a (measurable) space $E$ and is controlled by actions $a$ from a finite action set $\mathbf{A}$. For each $a\in\mathbf{A}$, we assume that $K^{a}_{t}(x,dx^{\prime})$ is a stochastic transition kernel on $E$. A mapping $\pi_{t}:E\to\mathbf{A}$ which describes the action that the controller takes at time $t$ is called a decision rule. A sequence of decision rules $\pi=(\pi_{t})_{t=0}^{T-1}$ is called a policy. For each initial point $x_{0}\in E$ and each policy $\pi$, there exists a probability measure ${\mathbb{P}}^{x_{0},\pi}$ and a stochastic process $(X_{t})_{t=0}^{T}$ such that ${\mathbb{P}}^{x_{0},\pi}(X_{0}=x_{0})=1$ and

$${\mathbb{P}}^{x_{0},\pi}(X_{t+1}\in B\mid X_{0},\dots,X_{t})=K^{\pi_{t}(X_{t})}_{t}(X_{t},B)\tag{3.1}$$

holds for each (measurable) $B\subset E$ at all times $t=0,\dots,T-1$. That is, given the system state $X_{t}$ at time $t$, the action $a=\pi_{t}(X_{t})$ selects the transition probability $K^{\pi_{t}(X_{t})}_{t}(X_{t},\,\cdot\,)$, which is the distribution that randomly drives the system from $X_{t}$ to $X_{t+1}$. Let us use ${\cal K}^{a}_{t}$ to denote the one-step transition operator associated with the transition kernel $K^{a}_{t}$ when the action $a\in\mathbf{A}$ is chosen. In other words, for each action $a\in\mathbf{A}$ the operator ${\cal K}^{a}_{t}$ acts on functions $v$ by

$$({\cal K}^{a}_{t}v)(x)=\int_{E}v(x^{\prime})K^{a}_{t}(x,dx^{\prime}),\qquad x\in E,$$

whenever the above integrals are well-defined. Now, let us turn to the definition of the control costs. For each time $t$, we are given the $t$-step reward function $r_{t}:E\times\mathbf{A}\to{\mathbb{R}}$, where $r_{t}(x,a)$ represents the reward for applying an action $a\in\mathbf{A}$ when the state of the system is $x\in E$ at time $t$. At the end of the time horizon, at time $T$, it is assumed that no action can be taken. Here, if the system is in a state $x$, a scrap value $r_{T}(x)$, which is described by a pre-specified scrap function $r_{T}:E\to{\mathbb{R}}$, is collected.

Given an initial point $x_{0}$ , the goal is to maximize the expected finite-horizon total reward

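$$v^{\pi}_{0}(x_{0})={\mathbb{E}}^{x_{0},\pi}\Big(\sum_{t=0}^{T-1}r_{t}(X_{t},\pi_{t}(X_{t}))+r_{T}(X_{T})\Big)\tag{3.3}$$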

over all possible policies $\pi=(\pi_{t})_{t=0}^{T-1}$ , where ${\mathbb{E}}^{x_{0},\pi}$ denotes the expectation over the controlled Markov chain defined by (3.1). In other words, to find the argument $\pi^{*}=(\pi^{*}_{t})_{t=0}^{T-1}$ such that

$$\pi^{*}(x_{0})=\operatorname{argmax}_{\pi}v^{\pi}_{0}(x_{0}),\qquad x_{0}\in E.\tag{3.4}$$

The maximization (3.4) is well-defined under additional assumptions (see [2], p. 199).

The calculation of the optimal policy is addressed in the following setting. For $t=0,\dots,T-1$ , introduce the Bellman operator

$${\cal T}_{t}v(x)=\sup_{a\in\mathbf{A}}\big(r_{t}(x,a)+{\cal K}^{a}_{t}v(x)\big),\qquad x\in E,$$

which acts on each measurable function $v:E\to\mathbb{R}$ where the integrals ${\cal K}_{t}^{a}v$ for all $a\in\mathbf{A}$ exist. Further, consider the Bellman recursion

$$v^{*}_{T}=r_{T},\qquad v^{*}_{t}={\cal T}_{t}v^{*}_{t+1}\quad\text{for } t=T-1,\dots,0.$$

Under appropriate assumptions, there exists a recursive solution $(v^{*}_{t})_{t=0}^{T}$ to the Bellman recursion, which gives the so-called value functions and determines an optimal policy $\pi^{*}$ via

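$$\pi^{*}_{t}(x)=\operatorname{argmax}_{a\in\mathbf{A}}\big(r_{t}(x,a)+{\cal K}^{a}_{t}v^{*}_{t+1}(x)\big),\qquad x\in E,$$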

for all $t=0,\dots,T-1$ .
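For intuition, the Bellman recursion and the induced policy can be sketched for the special case of a finite state space, where each transition kernel is a stochastic matrix and the supremum over actions is a maximum. The two-state toy problem below (a "stay" and a "switch" action) is purely illustrative and not taken from the paper; all names are hypothetical.

```python
import numpy as np


def backward_induction(rewards, kernels, scrap):
    """Finite-state Bellman recursion.

    rewards[t][a] -- array (S,)   with r_t(x, a) for every state x
    kernels[t][a] -- array (S, S) with the transition matrix K_t^a
    scrap         -- array (S,)   with r_T(x)
    Returns the value functions v_0..v_T and decision rules pi_0..pi_{T-1}.
    """
    T = len(rewards)
    values = [None] * (T + 1)
    policy = [None] * T
    values[T] = np.asarray(scrap, dtype=float)
    for t in range(T - 1, -1, -1):
        # q[a, x] = r_t(x, a) + (K_t^a v_{t+1})(x)
        q = np.stack([rewards[t][a] + kernels[t][a] @ values[t + 1]
                      for a in range(len(rewards[t]))])
        values[t] = q.max(axis=0)      # Bellman operator applied to v_{t+1}
        policy[t] = q.argmax(axis=0)   # maximizing action in each state
    return values, policy


# Toy problem: action 0 keeps the state, action 1 swaps it at a cost of 0.5.
identity = np.eye(2)
swap = np.array([[0.0, 1.0], [1.0, 0.0]])
T = 2
rewards = [[np.array([0.0, 1.0]), np.array([-0.5, -0.5])] for _ in range(T)]
kernels = [[identity, swap] for _ in range(T)]
scrap = np.array([0.0, 2.0])

values, policy = backward_induction(rewards, kernels, scrap)
print(values[0], policy[0])
```

In the toy problem, the controller leaves the low-reward state at a small cost and stays in the high-reward state, which the computed decision rules reflect.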

Consider now a Markov decision model whose state evolution consists of one discrete and one continuous component. To be more specific, we assume that the state space $E=\mathbf{P}\times\mathbb{R}^{d}$ is the product of a finite space $\mathbf{P}$ and the Euclidean space $\mathbb{R}^{d}$. We suppose that the discrete component $p\in\mathbf{P}$ is driven by a finite number of actions $a\in\mathbf{A}$ in terms of stochastic matrices

$$(\alpha^{a}_{p,p^{\prime}})_{p,p^{\prime}\in\mathbf{P}},\qquad a\in\mathbf{A},$$

where $\alpha^{a}_{p,p^{\prime}}\in[0,1]$ stands for the transition probability from $p\in\mathbf{P}$ to $p^{\prime}\in\mathbf{P}$ if the action $a\in\mathbf{A}$ was taken. Furthermore, we assume that the continuous state component evolves as an uncontrolled Markov process $(Z_{t})_{t=0}^{T}$ on $\mathbb{R}^{d}$, realized on a probability space $(\Omega,{\cal F},{\mathbb{P}})$, whose evolution is driven by random linear transformations

$$Z_{t+1}=W_{t+1}Z_{t}\tag{3.7}$$

with pre-specified independent and integrable disturbance matrices $(W_{t})_{t=1}^{T}$ . In this setting, the transition and the Bellman operators are given by

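$${\cal K}^{a}_{t}v(p,z)=\sum_{p^{\prime}\in\mathbf{P}}\alpha^{a}_{p,p^{\prime}}{\mathbb{E}}\big(v(p^{\prime},W_{t+1}z)\big),\qquad{\cal T}_{t}v(p,z)=\max_{a\in\mathbf{A}}\Big(r_{t}(p,z,a)+\sum_{p^{\prime}\in\mathbf{P}}\alpha^{a}_{p,p^{\prime}}{\mathbb{E}}\big(v(p^{\prime},W_{t+1}z)\big)\Big)$$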

for $t=0,\dots,T-1$ , and $a\in\mathbf{A}$ . Finally, let us assume that the reward and scrap functions

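$$z\mapsto r_{t}(p,z,a),\qquad z\mapsto r_{T}(p,z),\qquad t=0,\dots,T-1,\ p\in\mathbf{P},\ a\in\mathbf{A},$$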

are convex and globally Lipschitz continuous in the continuous component $z\in\mathbb{R}^{d}$ of the state variable $(p,z)$. Such Markov decision problems are referred to as convex switching systems (see [17]). For such systems, the backward induction for $p\in\mathbf{P}$, $z\in\mathbb{R}^{d}$

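$$v^{*}_{T}(p,z)=r_{T}(p,z),\qquad v^{*}_{t}(p,z)=\max_{a\in\mathbf{A}}\Big(r_{t}(p,z,a)+\sum_{p^{\prime}\in\mathbf{P}}\alpha^{a}_{p,p^{\prime}}{\mathbb{E}}\big(v^{*}_{t+1}(p^{\prime},W_{t+1}z)\big)\Big)$$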

for $t=T-1,\dots,0$ yields value functions $(v^{*}_{t})_{t=0}^{T}$ which provide an optimal policy $(\pi^{*}_{t})_{t=0}^{T-1}$ via

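$$\pi^{*}_{t}(p,z)=\operatorname{argmax}_{a\in\mathbf{A}}\Big(r_{t}(p,z,a)+\sum_{p^{\prime}\in\mathbf{P}}\alpha^{a}_{p,p^{\prime}}{\mathbb{E}}\big(v^{*}_{t+1}(p^{\prime},W_{t+1}z)\big)\Big).$$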

4 Numerical Solution and Diagnostics

This paper will use the numerical approaches studied in [17, 18, 20] to solve Markov Decision problems of convex switching type introduced in the previous section. We refer interested readers to those works for a more detailed explanation. However, for the sake of this paper’s completeness, this section will briefly outline these methods. The first step in obtaining a numerical solution to the backward induction

$$v^{*}_{T}=r_{T},\qquad v^{*}_{t}={\cal T}_{t}v^{*}_{t+1},\qquad t=T-1,\dots,0$$

is an appropriate discretization of the Bellman operator

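$${\cal T}_{t}v(p,z)=\max_{a\in\mathbf{A}}\Big(r_{t}(p,z,a)+\sum_{p^{\prime}\in\mathbf{P}}\alpha^{a}_{p,p^{\prime}}{\mathbb{E}}\big(v(p^{\prime},W_{t+1}z)\big)\Big).$$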

For this reason, we consider a modified Bellman operator ${\cal T}^{N}_{t}$ instead of ${\cal T}_{t}$, with the expectation ${\mathbb{E}}(v(p^{\prime},W_{t+1}z))$ replaced by its numerical counterpart

$$\sum_{n=1}^{N}\nu^{N}_{t+1}(n)\,v(p^{\prime},W_{t+1}(n)z)$$

defined in terms of an appropriate distribution sampling

$$(W_{t+1}(n))_{n=1}^{N}.$$
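One simple way to build such a distribution sampling is sketched below. It is an assumption, not the paper's construction: a standard normal disturbance is represented by $N$ equiprobable midpoint quantiles with equal weights, and the expectation is replaced by the corresponding weighted sum. The disturbance matrices and the test function `v` are illustrative choices, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np


def normal_quantile_sample(n):
    """N equiprobable points of N(0, 1) (midpoint quantiles), each with weight 1/N."""
    probs = (np.arange(n) + 0.5) / n
    points = np.array([NormalDist().inv_cdf(float(p)) for p in probs])
    weights = np.full(n, 1.0 / n)
    return points, weights


def sampled_expectation(v, disturbances, weights, z):
    """Weighted sum over the sampled disturbances, replacing E[v(W z)]."""
    return sum(w * v(W @ z) for W, w in zip(disturbances, weights))


# Illustrative disturbance: second state component becomes x + phi * z2.
phi = 0.9
points, weights = normal_quantile_sample(200)
disturbances = [np.array([[1.0, 0.0], [x, phi]]) for x in points]

v = lambda z: abs(z[1])  # a convex test function of the state
estimate = sampled_expectation(v, disturbances, weights, np.array([1.0, 0.0]))

# For z = (1, 0) the exact value is E|N(0,1)| = sqrt(2 / pi).
print(estimate, math.sqrt(2.0 / math.pi))
```

Other samplings (for example, Monte Carlo draws with equal weights) fit the same interface; the quantile-based choice merely keeps the example deterministic.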

In the resulting modified backward induction

$$v_{T}=r_{T},\qquad v_{t}={\cal T}^{N}_{t}v_{t+1},\qquad t=T-1,\dots,0,$$

the functions $(v_{t})_{t=0}^{T}$ need to be described by algorithmically tractable objects. We may then approximate these convex functions in terms of piecewise linear and convex functions in the following manner. First, we introduce the so-called sub-gradient envelope ${\cal S}_{G}f$ of a convex function $f:\mathbb{R}^{d}\to\mathbb{R}$ on a grid $G\subset\mathbb{R}^{d}$ as

$${\cal S}_{G}f=\vee_{g\in G}(\triangledown_{g}f),$$

which is a maximum of the sub-gradients $\triangledown_{g}f$ of $f$ on all grid points $g\in G$. Using the sub-gradient envelope operator, we define the double-modified Bellman operator as

$${\cal T}^{G,N}_{t}v(p,\cdot)={\cal S}_{G}{\cal T}^{N}_{t}v(p,\cdot),$$

where the operator ${\cal S}_{G}$ stands for the sub-gradient envelope on the grid $G$ . The corresponding backward induction

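$$v_{T}(p,\cdot)={\cal S}_{G}r_{T}(p,\cdot),\qquad v_{t}(p,\cdot)={\cal T}^{G,N}_{t}v_{t+1}(p,\cdot),\qquad t=T-1,\dots,0,\ p\in\mathbf{P},$$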

yields the so-called double-modified value functions $(v_{t})_{t=0}^{T}$, which enjoy excellent algorithmic properties. Namely, since the functions $(v_{t})_{t=0}^{T}$ are piecewise linear and convex, they can be expressed using matrix representations. Note that any piecewise linear convex function $f$ of this kind can be described by a matrix in which each linear functional is represented by one of the matrix’s rows. To denote this relation, let us agree on the following notation: given a function $f$ and a matrix $F$, we write $f\sim F$ whenever $f(z)=\max(Fz)$ holds for all $z\in\mathbb{R}^{d}$. It turns out that the sub-gradient envelope operation ${\cal S}_{G}$ on a grid $G$ corresponds to a specific row-rearrangement operator in the following sense

$$f\sim F\Rightarrow{\cal S}_{G}f\sim\Upsilon_{G}[F],$$

where the row-rearrangement $\Upsilon_{G}$ associated with grid $G=\{g^{1},\dots,g^{m}\}\subset\mathbb{R}^{d}$ acts on matrix $F$ with $d$ columns as follows:

[TABLE]
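The matrix representation and the row-rearrangement can be sketched as follows. The definition used here is an assumption inferred from the surrounding description (the envelope is a maximum of sub-gradients taken at the grid points): for each grid point $g^{i}$, the $i$-th row of $\Upsilon_{G}[F]$ is taken to be a row of $F$ that attains $\max(Fg^{i})$. Function names are hypothetical, and states are written as $(1,x)$ so that affine functions of $x$ become linear functionals.

```python
import numpy as np


def evaluate(F, z):
    """f(z) = max(F z) for a matrix representative F of f."""
    return float(np.max(F @ z))


def row_rearrange(F, grid):
    """Assumed form of Upsilon_G[F]: row i is a row of F attaining max(F g_i)."""
    best_rows = np.argmax(grid @ F.T, axis=1)  # one maximizing row per grid point
    return F[best_rows]


# f(1, x) = max(x, -x, -5) = |x|; the third row is never active.
F = np.array([[0.0, 1.0], [0.0, -1.0], [-5.0, 0.0]])
grid = np.array([[1.0, -1.0], [1.0, 2.0]])

U = row_rearrange(F, grid)
print(U)
for g in grid:
    # the envelope coincides with f at every grid point
    print(evaluate(U, g), evaluate(F, g))
```

Because every row of the rearranged matrix is also a row of `F`, the envelope never exceeds `f` and agrees with it on the grid, while the number of rows is fixed by the grid size rather than growing during the recursion.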

For piecewise linear convex functions, the result of maximization, summation, and composition with a linear mapping, followed by the sub-gradient envelope, can be obtained using their matrix representatives. More precisely, if

[TABLE]

holds, it follows that

[TABLE]

where the operator ${\sqcup}$ stands for binding matrices by rows, which yields a matrix whose rows contain all rows from each participating matrix. Using these relations, it turns out that the double-modified backward induction can be rewritten in terms of the row-rearrangement operator $\Upsilon=\Upsilon_{G}$ , binding operator $\sqcup$ and summations, applied to matrix representatives of the double-modified value functions. Let us describe the resulting algorithm.
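Before turning to the algorithm, the three matrix operations just mentioned can be checked numerically. The sketch below assumes the following forms, which are inferred from the description and not quoted from the paper: the envelope of a sum is the sum of the rearranged matrices, the envelope of a maximum is the rearrangement of the row-bound matrices, and the envelope of a composition with a linear map $W$ is the rearrangement of $FW$. All function names are hypothetical.

```python
import numpy as np


def evaluate(F, z):
    """f(z) = max(F z) for a matrix representative F of f."""
    return float(np.max(F @ z))


def row_rearrange(F, grid):
    """Assumed Upsilon_G[F]: for each grid point, a row of F attaining max(F g)."""
    return F[np.argmax(grid @ F.T, axis=1)]


def envelope_of_sum(F1, F2, grid):
    return row_rearrange(F1, grid) + row_rearrange(F2, grid)


def envelope_of_max(F1, F2, grid):
    return row_rearrange(np.vstack([F1, F2]), grid)  # binding by rows


def envelope_of_composition(F, W, grid):
    return row_rearrange(F @ W, grid)  # f(W z) = max(F W z)


F1 = np.array([[0.0, 1.0], [0.0, -1.0]])   # f1(1, x) = |x|
F2 = np.array([[1.0, 0.0], [0.0, 2.0]])    # f2(1, x) = max(1, 2x)
W = np.array([[1.0, 0.0], [0.5, 0.9]])     # (1, x) -> (1, 0.5 + 0.9 x)
grid = np.array([[1.0, -1.0], [1.0, 0.0], [1.0, 1.5]])

for g in grid:
    print(evaluate(envelope_of_sum(F1, F2, grid), g),
          evaluate(envelope_of_max(F1, F2, grid), g),
          evaluate(envelope_of_composition(F1, W, grid), g))
```

At each grid point the three results agree with $f_{1}+f_{2}$, $\max(f_{1},f_{2})$ and $f_{1}(W\,\cdot)$ respectively, which is the property the backward induction relies on.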

Given a finite grid $G\subset\mathbb{R}^{d}$ , implement the row-rearrangement operator $\Upsilon=\Upsilon_{G}$ and the matrix binding operator $\sqcup$ . Determine a distribution sampling $(W_{t}(n))_{n=1}^{N}$ of each disturbance $W_{t}$ with the corresponding weights $(\nu^{N}_{t}(n))_{n=1}^{N}$ for $t=1,\dots,T$ . Given reward functions $(r_{t})_{t=0}^{T-1}$ and scrap value $r_{T}$ , determine the matrix representative of their sub-gradient envelopes

[TABLE]

for $t=0,\dots,T-1$ , $p\in\mathbf{P}$ and $a\in\mathbf{A}$ . Introduce matrix representatives of each value function

[TABLE]

which are obtained via Algorithm 4.1.
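The statement of Algorithm 4.1 is not reproduced in this text. The following is a hedged sketch of how a backward induction in matrix form might look, assembled from the relations above: expected value matrices are formed as weighted sums of rearranged matrices $V_{t+1}(p^{\prime})W_{t+1}(n)$, and each value matrix is the rearrangement of the row-bound candidates over all actions. The data layout and all names are assumptions; the toy problem (one battery level, one action, zero rewards, deterministic halving of the state) is chosen only because its exact solution is known.

```python
import numpy as np


def row_rearrange(F, grid):
    """Assumed Upsilon_G[F]: for each grid point, a row of F attaining max(F g)."""
    return F[np.argmax(grid @ F.T, axis=1)]


def matrix_backward_induction(R, R_T, alpha, W, nu, grid):
    """Sketch of a double-modified backward induction on matrix representatives.

    R[t][p][a] -- (m, d) matrix of the envelope of r_t(p, ., a), rows aligned with grid
    R_T[p]     -- (m, d) matrix of the envelope of r_T(p, .)
    alpha[a]   -- (|P|, |P|) stochastic matrix for action a
    W[t], nu[t]-- sampled disturbance matrices W_{t+1}(n) and their weights
    """
    T, n_p = len(R), len(R_T)
    V = [None] * (T + 1)
    V[T] = [R_T[p] for p in range(n_p)]
    for t in range(T - 1, -1, -1):
        # expected value matrices: sum_n nu(n) * Upsilon[V_{t+1}(p') W_{t+1}(n)]
        VE = [sum(w * row_rearrange(V[t + 1][q] @ Wn, grid)
                  for Wn, w in zip(W[t], nu[t])) for q in range(n_p)]
        V[t] = []
        for p in range(n_p):
            candidates = [R[t][p][a] + sum(alpha[a][p, q] * VE[q] for q in range(n_p))
                          for a in range(len(alpha))]
            V[t].append(row_rearrange(np.vstack(candidates), grid))
    return V


# Toy problem: states (1, x), scrap |x|, no running reward, x halves each step.
grid = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
T = 2
R_T = [row_rearrange(np.array([[0.0, 1.0], [0.0, -1.0]]), grid)]
R = [[[np.zeros((4, 2))]] for _ in range(T)]
alpha = [np.array([[1.0]])]
W = [[np.array([[1.0, 0.0], [0.0, 0.5]])] for _ in range(T)]
nu = [[1.0] for _ in range(T)]

V = matrix_backward_induction(R, R_T, alpha, W, nu, grid)
# Exact value at time 0 is |x| / 4; evaluate at x = 2.
print(np.max(V[0][0] @ np.array([1.0, 2.0])))
```

Every matrix in the recursion has one row per grid point, so the memory footprint stays constant across time steps.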

Having calculated matrix representatives $(V_{t})_{t=0}^{T}$ , the approximations $(v_{t})_{t=0}^{T}$ , $(v^{E}_{t})_{t=0}^{T}$ of the value functions and their expectations are given by

[TABLE]

for all $z\in\mathbb{R}^{d}$ , $t=1,\dots,T$ , and $p\in\mathbf{P}$ . Furthermore, an approximately optimal strategy $(\tilde{\pi}_{t})_{t=0}^{T-1}$ is obtained for $t=0,\dots,T-1$ as

[TABLE]

We utilize an adaptation of the duality techniques developed by C. Rogers [30] (see also [29], [16], [10]) to assess the quality of our numerical solution. In its original formulation, the duality approach provides an upper bound estimate for the unknown value function. This technique has been further developed in the discrete-time setting, giving a promising view on the duality of stochastic control in terms of the so-called information relaxation dual, pioneered in the seminal paper by Brown, Smith, and Sun [7].

Here, we follow the diagnostics method described in [20], whose proofs are found in [18]. Suppose that a candidate $(\pi_{t})_{t=0}^{T-1}$ for an approximately optimal policy is given. To estimate its distance to optimality, we address the performance gap $[v^{\pi}_{0}(p_{0},z_{0}),v^{\pi^{*}}_{0}(p_{0},z_{0})]$ in policy values (3.3) at a given starting point $z_{0}=Z_{0}$. For this, we construct random variables $\underline{v}^{\pi,\varphi}_{0}(p_{0},z_{0})$, $\overline{v}^{\varphi}_{0}(p_{0},z_{0})$ satisfying

[TABLE]

The calculation of the expectations

[TABLE]

is realized through an efficient recursive Monte-Carlo scheme, which yields approximations to (4.10) along with appropriate confidence intervals.

For a practical application of this bound estimation, we assume that an approximate solution yields a candidate $(\pi_{t})_{t=0}^{T-1}$ for an optimal strategy, as in (4.8), based on approximations of the value and of the expected value functions as in (4.6) and (4.7). Further, choose a path number $K$ and a nesting number $I\in{\mathbb{N}}$ to obtain, for each $k=1,\dots,K$ and $i=0,\dots,I$, independent realizations $(w^{i,k}_{t})_{t=1}^{T}$ of the random variables $(W_{t})_{t=1}^{T}$, and define for $k=1,\dots,K$ the state trajectories $(z_{t}^{k})_{t=0}^{T}$ recursively

[TABLE]

Estimators for the bounds in (4.9) can be obtained using Algorithm 4.2 below.

Remark: It is important to note that the convexity and Lipschitz continuity of the reward and scrap functions presented in (2.1) are essential for the strong convergence properties of Algorithm 4.1. However, these assumptions are not required for the results from Algorithm 4.2 to be valid. Using an appropriate embedding of the state vector $Z_{t}$, the state dynamics presented in (3.7) are flexible enough to encompass a wide range of state evolution processes, including geometric Brownian motion, autoregression of order one, and GARCH-like features. In addition, the dynamics presented in (3.7) can be extended to include a more general function specification.

5 Battery Control for Auto-Regressive State Dynamics

As a demonstration, let us consider a model based on auto-regressive state dynamics. To cover this process under our framework, we introduce $Z_{t}=(Z^{(1)}_{t},Z^{(2)}_{t})=(1,Z^{(2)}_{t})$, where the first component equals one for $t=0,\dots,T$, and define the linear state dynamics

[TABLE]

with constants $\mu\in\mathbb{R}$ , $\sigma\in\mathbb{R}_{+}$ and $\phi\in[0,1]$ , driven by independent standard normally distributed random variables $(N_{t})_{t=1}^{T}$ . Further, we assume that the electricity price $(\Pi_{t})_{t=0}^{T}$ is governed by the function $f:\mathbb{N}_{+}\times\mathbb{R}\mapsto\mathbb{R}$ applied to the state process as

[TABLE]

In this work, we restrict ourselves to deterministic affine functions $(f(t,\cdot))_{t=0}^{T}$ to describe the seasonal patterns of the electricity price that are frequently observed in practice.
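To make the embedding concrete, the sketch below assumes the standard first-order auto-regression $Z^{(2)}_{t+1}=\mu+\phi Z^{(2)}_{t}+\sigma N_{t+1}$ and writes it as a random linear transformation of the vector $(1,Z^{(2)}_{t})$. The specific matrix form and the affine price map with coefficient arrays `u` and `v` are assumptions made for illustration; all function names are hypothetical.

```python
import numpy as np


def disturbance(mu, sigma, phi, n):
    """Matrix W with W @ (1, z2) = (1, mu + phi * z2 + sigma * n)."""
    return np.array([[1.0, 0.0], [mu + sigma * n, phi]])


def simulate(z0, mu, sigma, phi, T, rng):
    """Simulate Z_{t+1} = W_{t+1} Z_t for t = 0, ..., T-1."""
    path = [np.asarray(z0, dtype=float)]
    for _ in range(T):
        W = disturbance(mu, sigma, phi, rng.standard_normal())
        path.append(W @ path[-1])
    return np.array(path)


def price(t, z, u, v):
    """Hypothetical affine price map f(t, z2) = u[t] + v[t] * z2."""
    return u[t] + v[t] * z[1]


rng = np.random.default_rng(0)
path = simulate([1.0, 0.0], mu=0.0, sigma=1.0, phi=0.9, T=5, rng=rng)
print(path[:, 0])  # the first component stays equal to one
print(price(1, path[1], u=[10.0, 20.0], v=[1.0, 3.0]))
```

Keeping the constant component in the state is what lets an affine recursion, and affine price functions of the state, fit into the purely linear dynamics of the convex switching framework.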

To model the consumer’s energy demand $Q_{t}=q_{t}+\varepsilon_{t}$ realized at time $t$ , we suppose that, conditioned on the information at time $t$ , the deviation $\varepsilon_{t}$ of the realized demand from its predicted value $q_{t}$ follows a centered normal distribution with a given variance $\varsigma^{2}\in\mathbb{R}_{+}$ . To describe the evolution of the battery storage levels, let us assume that a finite set $\mathbf{P}$ describes the storage levels of the battery, which are equidistantly spaced between the minimal $\underline{p}=\min\mathbf{P}$ and the maximal $\overline{p}=\max\mathbf{P}$ level with a step size $\Delta>0$ . Furthermore, consider a finite set $\mathbf{A}$ of actions along with the function $l:\mathbf{A}\to\mathbb{R}$ prescribing the safety margin $l(a)$ chosen by the retailer’s action $a\in\mathbf{A}$ . According to our assumptions, let us agree that having chosen the action $a\in\mathbf{A}$ at the current battery level $P_{t}$ , the next level $P_{t+1}$ is modeled as

[TABLE]

from which the transition probabilities between storage levels induced by the action $a$ are

[TABLE]

where $\mathcal{N}(p+l(a),\varsigma)$ stands for the probability measure associated with the normal distribution with mean $p+l(a)$ and variance $\varsigma^{2}$ . In a similar manner, the expected excess $\underline{e}^{a}_{p}$ and shortage $\overline{e}^{a}_{p}$ of the imbalance energy can be written as

[TABLE]
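The transition probabilities and the expected shortage can be sanity-checked numerically. The R sketch below follows the convention of the script in Appendix A, which is an assumption about the exact form of the displayed formulas: each storage level owns a bin of width $\Delta$ centered at it, the lowest and highest levels absorb the tails, and the shortage is counted below $\underline{p}-\Delta/2$. The helper names and the example values $p=10$, $l(a)=10$ are hypothetical.

```r
## Sketch only: case-study values, hypothetical helper names.
storage_levels <- seq(0, 100, by = 5)  # storage levels P
delta  <- 5                            # step size
sd_eps <- 10                           # demand-error standard deviation

## Probability of each next level when the normal law is centered at m = p + l(a)
transition_row <- function(m) {
  n <- length(storage_levels)
  upper <- c(storage_levels[-n] + delta / 2, Inf)   # right bin edges
  lower <- c(-Inf, storage_levels[-1] - delta / 2)  # left bin edges
  pnorm(upper, m, sd_eps) - pnorm(lower, m, sd_eps)
}

## Closed form of E[(0 - X) 1{X < -delta/2}] for X ~ N(m, sd_eps^2)
expected_shortage <- function(m) {
  z <- (-delta / 2 - m) / sd_eps
  sd_eps * dnorm(z) - m * pnorm(z)
}

m <- 10 + 10                           # level p = 10 MWh, margin l(a) = 10 MWh
probs <- transition_row(m)

## Same expectation by numerical integration
numeric_shortage <- integrate(function(x) -x * dnorm(x, m, sd_eps),
                              lower = -Inf, upper = -delta / 2,
                              rel.tol = 1e-8)$value

sum(probs)                             # each row is a probability vector
c(closed_form = expected_shortage(m), numeric = numeric_shortage)
```

Agreement between the closed form and the numerical integral is a cheap check before such quantities are fed into the reward functions.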

With these definitions, the reward functions are given as above in (2.1). More specifically, having introduced the rewards

[TABLE]

and assuming that at maturity date $T$ the entire energy from storage capacity can be sold at the forward market, the scrap value is given by

[TABLE]

Finally, let us assume that the buy/sell grid prices are constant and deterministic

[TABLE]

to define the reward functions by

[TABLE]

for all $a\in\mathbf{A}$ , $p\in\mathbf{P}$ and $(z^{(1)},z^{(2)})\in\mathbb{R}^{2}$ . The scrap value is then given by

[TABLE]

Note that with the definitions (5.5), (5.6) and (5.1), our battery storage control problem is uniquely determined. Its numerical solution is demonstrated in the next section.

6 A Case Study

Let us suppose that the battery levels are equidistantly discretized with $\underline{p}=0$ MWh, $\overline{p}=100$ MWh, and $\Delta=5$ MWh. Furthermore, we assume that the energy retailer chooses actions $a\in\mathbf{A}=\{1,2,\dots,11\}$ with corresponding safety margins $l(a)=5(a-1)$ MWh for all $a\in\mathbf{A}$ . Further, consider a time horizon of one week at half-hourly frequency, i.e. $t=0,1,\dots,335=T$ , and define the auto-regressive state dynamics as above, with $\mu=0$ , $\sigma=0.5$ and $\phi=0.9$ . To describe seasonality, we assume that the affine linear functions are given by

$$f(t,z^{(2)})=u_{t}+v_{t}z^{(2)},\qquad t=0,\dots,T,$$

with deterministic coefficients $u_{t}=10+\cos(\frac{2\pi}{48}t+\frac{3\pi}{2})$ , $v_{t}=1+\sin(\frac{2\pi}{48}t+\frac{3\pi}{2})/2$ for $t=0,\dots,T$ . Figure 3 depicts trajectories of the corresponding price evolution started at $Z^{(2)}_{0}=z^{(2)}_{0}=0$ .
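To make the seasonal pattern concrete, the following R sketch evaluates the coefficients $u_{t}$ and $v_{t}$ on the half-hourly grid and builds the affine price map $z^{(2)}\mapsto u_{t}+v_{t}z^{(2)}$. The helper name `price` is introduced here for illustration only.

```r
## Sketch only: seasonal coefficients on the half-hourly grid t = 0, ..., 335.
t_idx <- 0:335
u_t <- 10 + cos(2 * pi / 48 * t_idx + 3 * pi / 2)
v_t <- 1 + sin(2 * pi / 48 * t_idx + 3 * pi / 2) / 2

## Hypothetical helper: price at time t (0-based) for state component z2
price <- function(t, z2) {
  u_t[t + 1] + v_t[t + 1] * z2
}

price(0, 0)      # price at t = 0 when the state component is zero
range(u_t)       # level of the seasonal component
range(v_t)       # sensitivity of the price to the state component
```

Both coefficients have a period of 48 half-hours, that is one day, and $v_{t}$ remains strictly positive, so the price is increasing in $z^{(2)}$ at every time step.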

Assume that the standard deviation of the consumer’s demand prediction error is $\varsigma=10$ and define grid prices by $\overline{\Pi}=20$ and $\underline{\Pi}=0$ , respectively. With these quantities, we apply Algorithm 4.1 to a state space grid $G$ containing $501$ points equally distributed on the line connecting the points $(1,-15)$ and $(1,15)$ . Furthermore, we discretize the normal distribution in the disturbance matrices in terms of $10000$ equidistant quantiles. The solution diagnostics in Table 1 are generated by Algorithm 4.2 with $100$ sample paths for the price and $100$ subsimulations for each path at each non-terminal decision epoch. All results are obtained using the authors’ R package [19]. We obtained tight bounds (the gap between the lower and upper bound estimates in Table 1 is of order $10^{-3}$ ) and low standard errors (roughly $0.04$ ), which certify that our results are sufficiently close to the true solution. Figure 4 illustrates the policy values and the decision structure of our nearly-optimal policy.
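The quantile discretization of the normal distribution mentioned above can be sketched in R as follows. The construction mirrors the one used in the script of Appendix A (equally weighted nodes at the probabilities $i/(n+1)$ , $i=1,\dots,n$ ); the variable names are chosen here for illustration.

```r
## Sketch only: discretize N(0, 1) by n equally weighted quantile nodes.
n <- 10000
probs   <- seq(0, 1, length.out = n + 2)[-c(1, n + 2)]  # i / (n + 1), i = 1..n
nodes   <- qnorm(probs)                                 # quantile nodes
weights <- rep(1 / n, n)                                # equal probability weights

## Moments of the discretized distribution
disc_mean <- sum(weights * nodes)
disc_var  <- sum(weights * nodes^2) - disc_mean^2

c(mean = disc_mean, variance = disc_var)
```

The discretized distribution is symmetric around zero, and its variance is close to one because only the far tails beyond the extreme quantile nodes are not represented.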

Finally, let us discuss a typical economic application. Having solved the problem of optimal operational management, the questions of investment and capacity allocation can be addressed. Here, the first step is an estimation of the random revenue profile, followed in the second step by a detailed risk analysis of potential side effects of operational flexibility on the entire portfolio of existing physical and financial assets. Thereby, the hedging value of flexibility (see [12]) is essential. In our study, we illustrate this first step using Monte-Carlo simulations. Having calculated the revenue of our approximately optimal strategy on 10000 randomly generated path scenarios, we plot its density histogram in the left plot of Figure 5 for different starting storage levels. In line with our expectations, we observe in this figure that a higher initial battery level yields a higher cumulated reward. This is also seen from the left plot of Figure 4, which shows that the value function at any price is increasing in the storage level (different curves correspond to different storage levels). The right graph of Figure 4 indicates that a higher initial storage level also yields a lower safety margin in the optimal strategy at the initial time.

Remark: The right plot in Figure 5 shows the dependence of the cumulated rewards on the battery size. In this graph, we gradually increase the capacity from 5 MWh to 100 MWh and determine the value function for an initially empty storage (for $\phi=0.9$ , $z^{(2)}_{0}=0$ ). The concave curve depicted in this graph shows that the value grows with the capacity at a rate which is steadily decreasing. Such insight may be very valuable in practice. For instance, the optimal size for battery deployment would typically result from equating the marginal value of the storage to its marginal cost. Notice, however, that the true practical value of such optimization usually stems from risk hedging effects within the agent’s energy generation portfolio, whose analysis now becomes possible, using a reliable strategy optimization provided by our concepts. (The authors thank an anonymous referee for pointing to this analysis.)

6.1 Optimal Trading and Storage

In this section, we discuss the impact of a storage facility on optimal energy trading. Recall that in our model, the agent indirectly controls the storage level in terms of energy trading. The point is that the storage absorbs some (if not all) of any unexpected demand, which adds some flexibility in the price choice when energy is purchased. To this end, we investigate the impact of battery size, comparing a small (5 MWh) against a large (100 MWh) storage.

In our state dynamics (5.1), the parameter $\phi$ controls the speed of mean reversion for the energy price. Thereby, lower levels of $\phi$ lead to stronger mean reversion with more frequent returns to the seasonal component of the price. Table 2 compares the expected cumulated rewards for a small battery with 5 MWh capacity against a large battery of size 100 MWh under the assumption that both batteries are initially empty. This table shows that there is a very substantial benefit of extra storage capacity for all levels of mean reversion. However, mean reversion seems to have very little impact on the expected cumulated rewards. Further, Figure 6 illustrates the role of storage capacity for energy purchase. Here, we depict the difference between the safety margins optimally entered at time $t=0$ for empty storage. We observe that having a large battery capacity allows buying more energy in advance. Note also that this effect decreases with increasing price. Again, the impact of the mean reversion parameter is low.

Finally, we examine in Figure 7 the averaged behavior (over $10000$ scenarios) of safety margins as a function of storage capacity for different mean reversion parameters. In line with the previous observations, the speed of mean reversion seems to have only little impact. Remarkably, there is a clear seasonal pattern in the difference of optimal safety margins. This is caused by the seasonal nature of prices as shown in Figure 3. Namely, for large storage capacity, the optimal trading follows price seasonality more strongly than for small storage. This is also apparent from Figure 8, which shows that for large capacity it is optimal to keep a certain intermediate level, whereas small storage must be filled right at the beginning to hedge against unexpected demand fluctuations. We also observe that close to maturity there is an attempt to fill the storage in order to benefit from the price at the end of the time period.

Remark: In our finite horizon setting, three phases are observable from Figure 8. After an initial charge to an “optimal intermediate level”, the battery is used to absorb unexpected demand/supply fluctuations while remaining close to this level. However, closer to maturity, the storage is further filled to take advantage of the price at the end of the time horizon. On this account, it is important to compare the portion of the cumulative reward which results from intermediate balancing to that earned from selling energy at maturity. (The authors thank an anonymous referee for raising this issue.) To investigate this problem, alter the scrap value definition (5.6) to

[TABLE]

With this change, any energy remaining at the end is worthless. Table 3 compares the expected cumulative rewards of the original problem with (5.6) to those of the problem with (6.2) for different storage capacities. It is not surprising that a larger battery allows exploiting the remaining energy to a greater extent. However, comparing both columns in this table, we observe that a very significant part of the battery value results from the energy balancing.

7 Future Research and Conclusion

Electrical storage has the potential to essentially change the nature of electricity trading and may have a profound impact on energy price dynamics. This paper provides quantitative concepts to better understand and analyze this development. We demonstrate that, using our algorithmic approach to battery storage management, a detailed and accurate strategy optimization is possible. Further details, such as modelling uncertainties in grid prices, costs of deep discharge affecting the battery’s lifetime, and stochastic futures price dynamics, can be incorporated. The authors will address these exciting topics in future research.

8 References

Appendix A R Script for Table 1

The following code was used for Table 1. On Linux Ubuntu 16.04 with Intel i5-5300U CPU @2.30GHz and 16GB of RAM, the script below takes less than 20 seconds to run and requires the installation of the ’rcss’ package [19].

The first part of the script returns the value function approximations using Algorithm 4.1. It takes roughly 5 seconds to run.


    ## Remove existing objects and load R package
    rm(list = ls()); gc(); library(rcss)
    ## Grid
    grid <- cbind(rep(1, 501), seq(-15, 15, length = 501))
    ## Battery
    battery <- seq(0, 100, by = 5)
    ## Standard deviation for the consumer demand
    std <- 10
    ## Safety margins
    safety <- seq(0, 50, length = 11)
    ## Transition probabilities for controlled Markov chain
    control <- array(data = 0, dim = c(21, 11, 21))
    for (p in 1:21) {
      for (a in 1:11) {
        temp <- battery[p] + safety[a] ## center of normal distribution
        control[p,a,1] <- pnorm(0 + 5/2, temp, std)
        control[p,a,21] <- 1 - pnorm(100 - 5/2, temp, std)
        for (pp in 2:(21-1)) {
          control[p,a,pp] <- pnorm(battery[pp] + 5/2, temp, std) -
            pnorm(battery[pp] - 5/2, temp, std)
        }
      }
    }
    ## Functions to calculate expected excess and shortage energy demand
    erf <- function(x){ ## error function
      return(2 * pnorm(x * sqrt(2)) - 1)
    }
    Excess <- function(pos, act) {
      temp1 <- 100 + 5/2
      temp2 <- pos + act
      result <- std/sqrt(2 * pi) * exp(-(temp1 - temp2)^2/(2 * std^2)) +
        (temp2 - 100)/2 * (1 - erf(1/sqrt(2 * std^2) * (temp1 - temp2)))
      return(result)
    }
    Shortage <- function(pos, act) {
      temp1 <- 0 - 5/2
      temp2 <- pos + act
      result <- std/sqrt(2 * pi) * exp(-(temp1 - temp2)^2/(2 * std^2)) +
        (0 - temp2)/2 * (erf(1/sqrt(2 * std^2) * (temp1 - temp2)) + 1)
      return(result)
    }
    ## Expected excess and shortage energy demand
    excess <- matrix(data = NA, nrow = 21, ncol = 11)
    shortage <- matrix(data = NA, nrow = 21, ncol = 11)
    for (p in 1:21) {
      for (a in 1:11) {
        excess[p,a] <- Excess(battery[p], safety[a])
        shortage[p,a] <- Shortage(battery[p], safety[a])
      }
    }
    ## Subgradient representation of reward functions
    u_t <- 10 + cos((0:335) * 2 * pi/48 + 3 * pi/2)
    v_t <- 1 + (sin((0:335) * 2 * pi/48 + 3 * pi/2))/2
    reward <- array(0, dim = c(501, 2, 21, 11, 336))
    for (p in 1:21) {
      for (a in 1:11) {
        for (t in 1:335) {
          ## Intercept and slope of the reward (grid buy price is 20)
          reward[,1,p,a,t] <- -safety[a] * u_t[t] - shortage[p, a] * 20
          reward[,2,p,a,t] <- -safety[a] * v_t[t]
        }
      }
      ## Scrap reward
      reward[,1,p,,336] <- battery[p] * u_t[336]
      reward[,2,p,,336] <- battery[p] * v_t[336]
    }
    ## Parameters for AR(1) process (Z_t)
    mu <- 0
    sigma <- 0.5
    phi <- 0.9
    ## Disturbances (W_t)
    disturb_weight <- rep(1 / 10000, 10000) ## probability weights
    disturb <- array(matrix(c(1, 0, 0, phi), ncol = 2, byrow = TRUE), dim = c(2, 2, 10000))
    quantile <- qnorm(seq(0, 1, length = (10000 + 2))[c(-1, -(10000 + 2))])
    disturb[2, 1,] <- mu + sigma * quantile
    r_index <- matrix(c(2, 1), ncol = 2) ## randomness index
    ## Fast bellman recursion
    bellman <- FastBellman(grid, reward, control, disturb, disturb_weight, r_index)

The second part of the script computes the lower and upper bound estimates according to Algorithm 4.2. It takes roughly 10 seconds to run.


    ## Exact reward function
    Reward <- function(state, time) {
      output <- array(0, dim = c(nrow(state), 21 * 11))
      if (time == 336) {
        for (p in 1:21) {
          output[,(p-1) * 11 + (1:11)] <-
            battery[p] * (u_t[time] + v_t[time] * state[,2])
        }
        return(output)
      }
      for (p in 1:21) {
        for (a in 1:11) {
          output[,(p-1) * 11 + a] <-
            -safety[a] * (u_t[time] + v_t[time] * state[,2]) -
            shortage[p,a] * 20
        }
      }
      return(output)
    }
    ## Generate sample path disturbances
    set.seed(12345)
    path_disturb <- array(matrix(c(1, 0, 0, phi), ncol = 2, byrow = TRUE),
                          dim = c(2, 2, 335, 100))
    rand <- rnorm(335 * 100 / 2)
    rand <- c(rand, -rand)
    path_disturb[2, 1,,] <- mu + sigma * rand
    ## Specifying subsimulation disturbances
    subsim_weight <- rep(1 / 100, 100)
    subsim_disturb <- array(matrix(c(1, 0, 0, phi), ncol = 2, byrow = TRUE),
                            dim = c(2, 2, 100, 100, 335))
    rand <- rnorm(100 * 100 * 335 / 2)
    rand <- as.vector(rbind(rand, -rand))
    subsim_disturb[2, 1,,,] <- mu + sigma * rand
    ## Generate sample paths for uncontrolled process
    start <- c(1, 0) ## z_0
    path <- Path(start, path_disturb)
    path_nn <- Neighbour(matrix(path, ncol = 2), grid, 1, "kdtree", 0, 1)$indices
    ## Candidate policy for sample paths
    path_action <- PathPolicy(path, path_nn, control, Reward, bellman$expected, grid)
    ## Computing martingale increments
    time2 <- proc.time()
    mart <- FastMartingale(bellman$value, path, path_nn, subsim_disturb,
                           subsim_weight, grid, control = control)
    ## Calculating the primal and dual values
    duality <- Duality(path, control, Reward, mart, path_action)
    time2 <- proc.time() - time2
    ## Printing the results
    diagnostics <- matrix(data = NA, nrow = 21, ncol = 4)
    for (p in 1:21) {
      diagnostics[p, 1] <- mean(duality$primal[1, p,])
      diagnostics[p, 2] <- sd(duality$primal[1, p,])/sqrt(100)
      diagnostics[p, 3] <- mean(duality$dual[1, p,])
      diagnostics[p, 4] <- sd(duality$dual[1, p,])/sqrt(100)
    }
    print(round(diagnostics,3))

Bibliography


[1] I. Bakke, S. Fleten, L. Hagfors, V. Hagspiel, B. Norheim, and S. Wogrin. Investment in electric energy storage under uncertainty: a real options approach. Computational Management Science, 13(3):483–500, 2016.
[2] N. Bäuerle and U. Rieder. Markov Decision Processes with Applications to Finance. Springer, Heidelberg, 2011.
[3] M. Beaudin, H. Zareipour, A. Schellenberglabe, and W. Rosehart. Energy storage for mitigating the variability of renewable electricity sources: An updated review. Energy for Sustainable Development, 14(4):302–314, 2010.
[4] F. Benth, A. Cartea, and R. Kiesel. Pricing forward contracts in power markets by the certainty equivalence principle: Explaining the sign of the market risk premium. Journal of Banking & Finance, 32(10):2006–2021, 2008.
[5] K. Bradbury, L. Pratson, and D. Patiño-Echeverri. Economic viability of energy storage systems based on price arbitrage potential in real-time U.S. electricity markets. Applied Energy, 114:512–519, 2014.
[6] S. Breton and G. Moe. Status, plans and technologies for offshore wind turbines in Europe and North America. Renewable Energy, 34(3):646–654, 2009.
[7] D. B. Brown, J. E. Smith, and P. Sun. Information relaxations and duality in stochastic dynamic programs. Operations Research, 57(10):785–851, 2011.
[8] A. Cartea and P. Villaplana. Spot price modeling and the valuation of electricity forward contracts: The role of demand and capacity. Journal of Banking & Finance, 32(12):2502–2519, 2008.