Designing reactive power control rules for smart inverters using support vector machines

Mana Jalali; Vassilis Kekatos; Nikolaos Gatsis; Deepjyoti Deka

arXiv:1903.01016·math.OC·September 26, 2019·IEEE Trans. Smart Grid

TL;DR

This paper introduces a machine learning approach using support vector machines to design nonlinear inverter control rules for voltage regulation, aiming to improve upon preset local rules with reduced communication overhead.

Contribution

It formulates inverter control rule design as a multi-task learning problem, enabling customized, nonlinear control rules that enhance voltage regulation in distribution grids.

Findings

01 Nonlinear control rules outperform preset local rules.

02 The approach reduces communication overhead.

03 Trade-offs between voltage regulation and rule sparsity are demonstrated.

Abstract

Smart inverters have been advocated as a fast-responding mechanism for voltage regulation in distribution grids. Nevertheless, optimal inverter coordination can be computationally demanding, and preset local control rules are known to be subpar. Leveraging tools from machine learning, the design of customized inverter control rules is posed here as a multi-task learning problem. Each inverter control rule is modeled as a possibly nonlinear function of local and/or remote control inputs. Given the electric coupling, the function outputs interact to yield the feeder voltage profile. Using an approximate grid model, inverter rules are designed jointly to minimize a voltage deviation objective based on anticipated load and solar generation scenarios. Each control rule is described by a set of coefficients, one for each training scenario. To reduce the communication overhead between the grid…

Tables

Table I: Running time for solving (16) with $T=30$

	C4)	C5)	C6)	C7)
Running time [min]	0.21	0.45	0.96	1.99

Equations

$$\mathbf{p}=\mathbf{p}^{g}-\mathbf{p}^{c}\quad\text{and}\quad\mathbf{q}=\mathbf{q}^{g}-\mathbf{q}^{c}.$$

$$|q_{n}^{g}|\leq\bar{q}_{n}^{g}:=\sqrt{(\bar{s}_{n}^{g})^{2}-(p_{n}^{g})^{2}}$$

$$\mathbf{v}\simeq\mathbf{R}\mathbf{p}+\mathbf{X}\mathbf{q}+v_{0}\mathbf{1}$$

$$\mathbf{v}-v_{0}\mathbf{1}=\mathbf{X}\mathbf{q}^{g}+\mathbf{y}$$

$$\tilde{\mathbf{q}}^{g}:=\arg\min_{\mathbf{q}^{g}\in\mathcal{Q}}~\Delta(\mathbf{q}^{g};\mathbf{y})$$

$$\Delta_{s}(\mathbf{q}^{g};\mathbf{y}):=\sum_{n=1}^{N}(v_{n}-v_{0})^{2}=\|\mathbf{X}\mathbf{q}^{g}+\mathbf{y}\|_{2}^{2}.$$

$$\Delta_{\epsilon}(\mathbf{q}^{g};\mathbf{y}):=\sum_{n=1}^{N}[v_{n}-v_{0}]_{\epsilon}=\sum_{n=1}^{N}\left[\mathbf{e}_{n}^{\top}(\mathbf{X}\mathbf{q}^{g}+\mathbf{y})\right]_{\epsilon}$$

$$[x]_{\epsilon}:=\begin{cases}0, & |x|\leq\epsilon\\ |x|-\epsilon, & \text{otherwise.}\end{cases}$$

$$\mathcal{H}_{\mathcal{K}}:=\left\{f(z)=\sum_{s=1}^{\infty}K(z,z_{s})a_{s},~a_{s}\in\mathbb{R}\right\}.$$

$$\min_{f\in\mathcal{H}_{\mathcal{K}},\,b}~\frac{1}{S}\sum_{s=1}^{S}L(f(z_{s}),b;y_{s})+\mu\|f\|_{\mathcal{K}}$$

$$f(z)=\sum_{s=1}^{S}K(z,z_{s})a_{s}.$$

$$\min_{\mathbf{a},\,b}~\frac{1}{S}\|\mathbf{y}-\mathbf{K}\mathbf{a}-b\mathbf{1}\|_{2}^{2}+\mu\|\mathbf{K}^{1/2}\mathbf{a}\|_{2}$$

$$q_{n}^{g}(\mathbf{z}_{n})=f_{n}(\mathbf{z}_{n})+b_{n}$$

$$\mathbf{z}_{n}:=[\bar{q}_{n}^{g}~~(p_{n}^{c}-p_{n}^{g})~~q_{n}^{c}]^{\top}$$

$$\mathcal{H}_{\mathcal{K}_{n}}:=\left\{f_{n}(\mathbf{z}_{n})=\sum_{s=1}^{\infty}K_{n}(\mathbf{z}_{n},\mathbf{z}_{n,s})a_{n,s},~a_{n,s}\in\mathbb{R}\right\}$$

$$\min~\ldots\quad\text{over}~\{f_{n}\in\mathcal{H}_{\mathcal{K}_{n}}\},~\mathbf{b}:=[b_{1}~\cdots~b_{N}]^{\top}\quad\text{s.to}~\ldots$$

$$f_{n}(\mathbf{z}_{n})=\sum_{s=1}^{S}K_{n}(\mathbf{z}_{n},\mathbf{z}_{n,s})a_{n,s}.$$

$$\mathbf{f}_{n}=\mathbf{K}_{n}\mathbf{a}_{n},\quad\forall n$$

$$\|f_{n}\|_{\mathcal{K}_{n}}^{2}=\mathbf{a}_{n}^{\top}\mathbf{K}_{n}\mathbf{a}_{n},\quad\forall n.$$

$$f_{n}(\mathbf{z}_{n})=\mathbf{z}_{n}^{\top}\mathbf{w}_{n},\quad\forall n.$$

$$q_{n}^{g}(\mathbf{z}_{n,s})=f_{n}(\mathbf{z}_{n,s})+b_{n}=\mathbf{z}_{n,s}^{\top}\mathbf{Z}_{n}\mathbf{a}_{n}+b_{n}.$$

$$f_{n}(\mathbf{z}_{n})=\boldsymbol{\phi}_{n}^{\top}\mathbf{w}_{n}$$

$$P_{\bar{q}_{n,t}^{g}}\left[q_{n,t}^{g}\right]:=\max\left\{\min\left\{q_{n,t}^{g},\bar{q}_{n,t}^{g}\right\},-\bar{q}_{n,t}^{g}\right\}.$$

$$P_{\bar{q}_{n,t^{\prime}}^{g}}\left[\sum_{s=1}^{S}K_{n}(\mathbf{z}_{n,t^{\prime}},\mathbf{z}_{n,s})a_{n,s}+b_{n}\right].$$

$$-\bar{\mathbf{q}}_{n}^{g}\leq\mathbf{K}_{n}\mathbf{a}_{n}+b_{n}\mathbf{1}\leq\bar{\mathbf{q}}_{n}^{g},\quad\forall n$$

$$\mathbf{X}\mathbf{q}_{s}^{g}+\mathbf{y}_{s}=\sum_{n=1}^{N}\mathbf{x}_{n}\mathbf{e}_{s}^{\top}\mathbf{K}_{n}\mathbf{a}_{n}+\sum_{n=1}^{N}b_{n}\mathbf{x}_{n}+\mathbf{y}_{s}$$

$$\Delta_{\tau}(\mathbf{q}^{g};\mathbf{y}):=\left[\|\mathbf{X}\mathbf{q}^{g}+\mathbf{y}\|_{2}\right]_{\tau}$$

$$\Delta_{\tau}(\mathbf{q}^{g};\mathbf{y}):=\min_{d\geq 0}\left\{d:\|\mathbf{X}\mathbf{q}^{g}+\mathbf{y}\|_{2}\leq d+\tau\right\}.$$

Full text

Designing Reactive Power Control Rules for Smart Inverters using Support Vector Machines

Mana Jalali, Vassilis Kekatos, Nikolaos Gatsis, and Deepjyoti Deka

Manuscript received March 1, 2019; revised June 4, 2019, and August 15, 2019; accepted September 16, 2019. Date of publication DATE; date of current version DATE. Paper no. TSG.00318.2019. This work was supported in part by the NSF-CAREER grant 1751085.

M. Jalali and V. Kekatos are with the Bradley Dept. of ECE, Virginia Tech, Blacksburg, VA 24061, USA. N. Gatsis is with the ECE Dept., University of Texas at San Antonio, San Antonio, TX 78249, USA. D. Deka is with the Theoretical Division at Los Alamos National Laboratory, NM 87545, USA. Emails: {manaj2,kekatos}@vt.edu; [email protected]; [email protected]

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier XXXXXX

Abstract

Smart inverters have been advocated as a fast-responding mechanism for voltage regulation in distribution grids. Nevertheless, optimal inverter coordination can be computationally demanding, and preset local control rules are known to be subpar. Leveraging tools from machine learning, the design of customized inverter control rules is posed here as a multi-task learning problem. Each inverter control rule is modeled as a possibly nonlinear function of local and/or remote control inputs. Given the electric coupling, the function outputs interact to yield the feeder voltage profile. Using an approximate grid model, inverter rules are designed jointly to minimize a voltage deviation objective based on anticipated load and solar generation scenarios. Each control rule is described by a set of coefficients, one for each training scenario. To reduce the communication overhead between the grid operator and the inverters, we devise a voltage regulation objective that is shown to promote parsimonious descriptions for inverter control rules. Numerical tests using real-world data on a benchmark feeder demonstrate the advantages of the novel nonlinear rules and explore the trade-off between voltage regulation and sparsity in rule descriptions.

Index Terms:

Support vector machines; multi-kernel learning; voltage regulation; linearized distribution flow model.

I Introduction

Several electric utilities in the US currently experience issues while integrating residential- and commercial-scale solar generation. A solar farm connected at the end of a long rural feeder can incur voltage excursions along the feeder, while frequent power flow reversals strain the apparent power capabilities of substation transformers [1]. Solar generation from residential photovoltaics (PVs) can fluctuate by up to 15% of their rating within one-minute intervals [2]. Utility-owned voltage control equipment, such as load-tap-changing transformers, capacitor banks, and step-voltage regulators, involves discrete control actions, and its lifespan is related to the number of switching operations [3]. Regulating voltage under increasing renewable generation may require more frequent switching and further installations, thus critically challenging reactive power control in distribution grids.

On the other hand, PVs are interfaced by inverters featuring advanced communication, metering, and control functionalities. Using inverters for reactive power control has been advocated as a fast-responding solution [1]. The amended IEEE 1547 standard allows inverters to operate at non-unity power factors [4]. Nonetheless, coordinating hundreds of inverters distributed over a feeder in real time is a formidable task. In a typical setup, the values of instantaneous loads and solar generation are communicated to a utility controller; the controller minimizes ohmic losses subject to voltage regulation constraints; and the computed setpoints are sent back to inverters. The problem of finding the optimal reactive injection setpoints for inverters is an instance of the optimal power flow (OPF) task, which is non-convex in general. Different convex relaxations have been proposed; see [5] for a survey. The uncertainty in loads and solar generation over the next control period is usually accounted for through stochastic and robust formulations [6], [7]. To reduce complexity, approximate grid models have also been employed [8], [9], though heavy two-way utility-inverter communication is still needed.

Alternatively, decentralized solvers where inverters decide their setpoints upon communicating with neighboring inverters have been devised [10], [11], [12]. At the other extreme, localized schemes have inverters implement Volt-VAR and/or Watt-VAR curves given only local measurements [1]. Although such rules have been analytically shown to be stable and fast-converging, their equilibria unfortunately do not coincide with the sought OPF minimizers [13], [8], [14], [15]. In fact, there exist cases where local rules perform worse than the no-reactive support option [16].

The previous literature review indicates that centralized schemes incur high computational complexity; decentralized solvers require multiple communication exchanges among inverters; and local schemes have no performance guarantees. As a middle-ground solution, inverter setpoints can be designed in a quasi-static fashion via control rules. A rule expresses each setpoint as an affine function of given inputs, such as generation, load, or voltage. Although the related weights are optimized periodically in a centralized fashion, control rules are applied in real time. Controlling inverters via affine rules has been accomplished using chance-constrained [17]; robust [16], [18]; and closed-loop formulations [19]. Optimal rules however are not necessarily linear: If an apparent power constraint becomes active, reactive injections can become nonlinear functions of solar generation. To capture this nonlinearity, recent approaches engage learning models that are trained to optimize: Given pairs of grid conditions (load and solar generation) and their precomputed optimal inverter dispatches, these approaches learn dispatch rules using linear or kernel-based regression [20], [21].

This work combines machine learning tools with physical grid models, and advocates a kernel-based approach for designing inverter control rules. The contribution is on two fronts: First, the design of inverter control rules is posed as a multi-task learning problem. Each inverter rule is modeled as a nonlinear function of control inputs. Rules are coupled through the electric grid to yield a system voltage profile. Using an approximate grid model, inverter rules are learned jointly so that they minimize a voltage regulation cost using anticipated load and solar generation scenarios. Each rule is described by a set of coefficients, one for each scenario. As a second contribution, we engineer the voltage regulation objective, so that the optimal rules are described by a few scenario coefficients. Such parsimonious representation of inverter rules saves communications. Numerical tests on a benchmark feeder showcase the advantages of nonlinear rules and explore the trade-off between voltage regulation and sparse rules.

Regarding notation, lower- (upper-) case boldface letters denote column vectors (matrices), while calligraphic symbols are reserved for sets. Symbol ⊤ stands for transposition and $\|\mathbf{x}\|_{2}$ denotes the $\ell_{2}$ -norm of $\mathbf{x}$ .

II Reactive Power Control

This section formulates the task of voltage regulation using inverters. Consider a distribution grid having $N+1$ buses served by the substation indexed by $n=0$ . Let $v_{n}$ denote the voltage magnitude, and $p_{n}+jq_{n}$ the complex power injection at bus $n$ . The active injection $p_{n}$ is decomposed into $p_{n}=p_{n}^{g}-p_{n}^{c}$ , where $p_{n}^{g}$ is the solar generation and $p_{n}^{c}$ the inelastic load at bus $n$ . Reactive injections can be similarly expressed as $q_{n}=q_{n}^{g}-q_{n}^{c}$ . Collect injections in $N$ -length vectors:

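$$\mathbf{p}=\mathbf{p}^{g}-\mathbf{p}^{c}\quad\text{and}\quad\mathbf{q}=\mathbf{q}^{g}-\mathbf{q}^{c}. \tag{1}$$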

The reactive power injected by inverter $n$ is constrained as

$$|q_{n}^{g}|\leq\bar{q}_{n}^{g}:=\sqrt{(\bar{s}_{n}^{g})^{2}-(p_{n}^{g})^{2}} \tag{2}$$

where $\bar{s}_{n}^{g}$ is the apparent power limit for inverter $n$ ; see [1].

Given loads $(\mathbf{p}^{c},\mathbf{q}^{c})$ and solar generation $\mathbf{p}^{g}$ , voltage regulation aims at optimally setting $\mathbf{q}^{g}$ such that voltage deviations are kept minimal. To formally describe this task, one has to deal with the nonlinear power flow equations relating voltages to power injections. Trading modeling accuracy for computational tractability, we resort to the linearized model [22]

$$\mathbf{v}\simeq\mathbf{R}\mathbf{p}+\mathbf{X}\mathbf{q}+v_{0}\mathbf{1} \tag{3}$$

where $\mathbf{v}:=[v_{1}~{}\ldots~{}v_{N}]^{\top}$ and matrices $(\mathbf{R},\mathbf{X})$ depend on the feeder. Model (3) can be derived by linearizing the power flow equations around the flat voltage profile. In fact, the linearization can be performed at any system state $\mathbf{v}_{0}$ , yet matrices $\mathbf{R}$ and $\mathbf{X}$ would then depend on the state $\mathbf{v}_{0}$ ; see [19]. From (1) and (3), the vector of voltage deviations from its nominal value can be approximated as

$$\mathbf{v}-v_{0}\mathbf{1}=\mathbf{X}\mathbf{q}^{g}+\mathbf{y} \tag{4}$$

where $\mathbf{y}:=\mathbf{R}(\mathbf{p}^{g}-\mathbf{p}^{c})-\mathbf{X}\mathbf{q}^{c}$ and $\mathbf{1}$ is a vector of all ones.
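To make the bookkeeping in (1)–(4) concrete, the following Python sketch evaluates the approximate voltage deviations for a small example. The 3-bus matrices `R` and `X` and all injection values are hypothetical numbers chosen only for illustration; they are not taken from the benchmark feeder used later in the paper.

```python
import numpy as np

# Hypothetical 3-bus feeder data in per unit (illustrative values only).
R = np.array([[0.02, 0.02, 0.02],
              [0.02, 0.05, 0.05],
              [0.02, 0.05, 0.09]])
X = np.array([[0.04, 0.04, 0.04],
              [0.04, 0.08, 0.08],
              [0.04, 0.08, 0.13]])

def voltage_deviation(R, X, p_g, p_c, q_c, q_g):
    """Return (v - v0*1, y) under the linearized model (3)-(4)."""
    y = R @ (p_g - p_c) - X @ q_c   # part not controlled by the inverters
    return X @ q_g + y, y

p_g = np.array([0.0, 0.3, 0.5])     # solar generation
p_c = np.array([0.2, 0.1, 0.1])     # active load
q_c = np.array([0.05, 0.03, 0.02])  # reactive load
q_g = np.zeros(3)                   # inverters inject no reactive power

dev, y = voltage_deviation(R, X, p_g, p_c, q_c, q_g)
```

With idle inverters ($\mathbf{q}^{g}=\mathbf{0}$) the deviation vector reduces to $\mathbf{y}$, that is, the part of the voltage profile that reactive control has to counteract.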

The goal here is to design the inverter injections $\mathbf{q}^{g}$ so that bus voltage magnitudes remain within regulation limits. The ANSI C84.1 standard dictates that service (load) voltages should remain within $\pm 5\%$ per unit (pu). However, our grid model of (4) stops at the level of distribution transformers. A distribution (pole or pad-mounted) transformer may be serving several residential customers. Each customer is typically connected to the distribution transformer through a triplex cable, which incurs a voltage drop between the transformer and the service point. As an example, suppose a customer is connected to a 50 kVA, 7200-240/120 V center-tapped transformer via a 1/0 AA 100-ft triplex cable. The customer runs a constant-current load of 10 kVA at the nominal voltage of 120 V with 0.9 lagging power factor. If load currents are equally distributed among the three supplies (two 120 V and one 240 V), the service voltage drops by $1.5\%$ pu. If the load is distributed among supplies non-uniformly, the service voltage can drop by as much as $3.5\%$ pu. For this reason, the current practice is to maintain voltages at distribution transformers within $\pm 3\%$ pu, to ensure that service voltages remain within $\pm 5\%$ pu; see exercises of [23].

Given loads, solar generation, and grid parameters, the goal is to decide $\mathbf{q}^{g}$ to regulate voltage while satisfying the apparent power constraints of (2). The setpoints for reactive power injections from inverters can be found as the minimizer

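$$\tilde{\mathbf{q}}^{g}:=\arg\min_{\mathbf{q}^{g}\in\mathcal{Q}}~\Delta(\mathbf{q}^{g};\mathbf{y}). \tag{5}$$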

The set $\mathcal{Q}\subseteq\mathbb{R}^{N}$ captures the constraints in (2) for all $n$ ; and $\Delta(\mathbf{q}^{g};\mathbf{y})$ is a voltage regulation objective. A typical choice for $\Delta$ is the sum of squared voltage deviations [13], [16], [19]

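$$\Delta_{s}(\mathbf{q}^{g};\mathbf{y}):=\sum_{n=1}^{N}(v_{n}-v_{0})^{2}=\|\mathbf{X}\mathbf{q}^{g}+\mathbf{y}\|_{2}^{2}. \tag{6}$$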

Alternatively, the utility may want to maintain voltages within the range of $(1\pm\epsilon)v_{0}$ for say $\epsilon=0.03$ . Then, a pertinent objective is [14]

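$$\Delta_{\epsilon}(\mathbf{q}^{g};\mathbf{y}):=\sum_{n=1}^{N}[v_{n}-v_{0}]_{\epsilon}=\sum_{n=1}^{N}\left[\mathbf{e}_{n}^{\top}(\mathbf{X}\mathbf{q}^{g}+\mathbf{y})\right]_{\epsilon} \tag{7}$$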

where $\mathbf{e}_{n}$ is the $n$ -th canonical vector of length $N$ , and the operator $[\cdot]_{\epsilon}$ is defined as

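$$[x]_{\epsilon}:=\begin{cases}0, & |x|\leq\epsilon\\ |x|-\epsilon, & \text{otherwise.}\end{cases} \tag{8}$$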

Function $\Delta_{\epsilon}$ returns zero when all voltages are within limits. Otherwise, it increases linearly with voltage excursions; see [14] for distributed solvers of (5) with $\Delta=\Delta_{\epsilon}$ .
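The two objectives can be compared numerically with a short Python sketch. The function names are hypothetical, and the toy inputs in the usage note below are illustrative assumptions rather than data from the paper.

```python
import numpy as np

def eps_insensitive(x, eps):
    """Operator [x]_eps of (8), applied entrywise."""
    return np.maximum(np.abs(x) - eps, 0.0)

def delta_s(X, q_g, y):
    """Sum of squared voltage deviations, as in (6)."""
    return float(np.sum((X @ q_g + y) ** 2))

def delta_eps(X, q_g, y, eps):
    """Sum of eps-insensitive voltage deviations, as in (7)."""
    return float(np.sum(eps_insensitive(X @ q_g + y, eps)))
```

For instance, with `X = np.eye(2)`, `q_g = np.zeros(2)`, `y = np.array([0.02, -0.05])` and `eps = 0.03`, only the second bus violates the band: `delta_eps` charges the excess of 0.02, whereas `delta_s` penalizes both deviations.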

It is worth noticing that $\mathbf{X}$ depends only on the network and the linearization point, whereas the set $\mathcal{Q}$ and vector $\mathbf{y}$ depend on the variable loads and solar generation, collectively denoted as $\boldsymbol{\chi}:=[(\mathbf{p}^{c})^{\top}~{}(\mathbf{q}^{c})^{\top}~{}(\mathbf{p}^{g})^{\top}]^{\top}$ .

Ideally, the reactive control process entails three steps:

S1) Each bus communicates its $(p_{n}^{g},p_{n}^{c},q_{n}^{c})$ to the operator.

S2) The operator solves (5) knowing the current $\boldsymbol{\chi}$ .

S3) The operator sends the optimal setpoints $\tilde{\mathbf{q}}^{g}$ to inverters.

Under variable solar generation, the process has to be repeated on a per-minute basis. Observe that S1) establishes $N$ inverter-to-utility communication links, and S3) requires another $N$ utility-to-inverter links. Running this process for multiple feeders can become a computationally and communication-wise challenging task.

To adaptively adjust inverter setpoints based on $\boldsymbol{\chi}_{t}$ , affine control rules in the form of $\mathbf{q}^{g}(\boldsymbol{\chi}_{t})$ have been suggested in [16], [17], [18]. Based on these rules, the reactive injection of inverter $n$ is expressed as an affine function over a subvector of $\boldsymbol{\chi}_{t}$ . The premise is to design the rule in a quasi-stationary fashion, but apply it in real-time. We extend linear to nonlinear control rules enjoying varying cyber requirements after briefly reviewing the toolbox of kernel-based learning.

III Preliminaries on Kernel-based Learning

Given pairs $\{(z_{s},y_{s})\}_{s=1}^{S}$ of features $z_{s}$ belonging to a measurable space $\mathcal{Z}$ and target values $y_{s}\in\mathbb{R}$ , kernel-based learning aims at finding a function or mapping $f:\mathcal{Z}\rightarrow\mathbb{R}$ . From all possible options of arbitrarily complex functions, one needs to select a specific family where $f$ belongs. Kernel-based learning postulates that $f$ lies in the function space [24]

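$$\mathcal{H}_{\mathcal{K}}:=\left\{f(z)=\sum_{s=1}^{\infty}K(z,z_{s})a_{s},~a_{s}\in\mathbb{R}\right\}. \tag{9}$$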

This is the space of functions that can be expressed as linear combinations of a given kernel (basis) function $K:\mathcal{Z}\times\mathcal{Z}\rightarrow\mathbb{R}$ evaluated at arbitrary points $z_{s}$ . When $K(\cdot,\cdot)$ is a symmetric positive definite function, then $\mathcal{H}_{\mathcal{K}}$ becomes a reproducing kernel Hilbert space (RKHS) whose members have finite norm $\|f\|_{\mathcal{K}}^{2}:=\sum_{s=1}^{\infty}\sum_{s^{\prime}=1}^{\infty}K(z_{s},z_{s^{\prime}})a_{s}a_{s^{\prime}}$ ; see [25]. Some options for the kernel function $K$ are provided under Examples 1–2 in Section IV-A.

Learning $f$ from data $\{(z_{s},y_{s})\}_{s=1}^{S}$ can be formulated as the regularization task [24], [26]

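$$\min_{f\in\mathcal{H}_{\mathcal{K}},\,b}~\frac{1}{S}\sum_{s=1}^{S}L(f(z_{s}),b;y_{s})+\mu\|f\|_{\mathcal{K}} \tag{10}$$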

where $b$ is an intercept term. When it comes to regression, typical choices for the data-fitting loss $L$ include the least-squares (LS) fit $\left(y_{s}-f(z_{s})-b\right)^{2}$ , or the $\epsilon$ -insensitive loss $\left[y_{s}-f(z_{s})-b\right]_{\epsilon}$ . The second term in (10) ensures $f\in\mathcal{H}_{\mathcal{K}}$ and facilitates generalization over unseen data [25]. Parameter $\mu>0$ balances fitting versus generalization, and is tuned via cross-validation: i) problem (10) is solved for a specific $\mu$ using $4/5$ of the data; ii) the learned function is validated on the unused $1/5$ of the data; iii) the process is repeated $5$ times to calculate the average fitting error for this $\mu$ ; and iv) the $\mu$ attaining the best fit is selected; see [24] for details.
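The cross-validation steps i)–iv) can be sketched in Python as follows. To keep the example short, the learner is a swapped-in variant and not problem (10) itself: it uses an LS loss, a squared RKHS-norm penalty, a Gaussian kernel, and no intercept, so that each fit has a closed form. All function names and default values are assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(A, B, gamma=1.0):
    """K[i, j] = exp(-||A[i] - B[j]||^2 / gamma) for row-wise samples."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / gamma)

def fit(Z, y, mu, gamma=1.0):
    """Minimize (1/S)||y - K a||^2 + mu * a'K a (assumed variant, no intercept)."""
    S = len(y)
    K = gaussian_kernel(Z, Z, gamma)
    return np.linalg.solve(K + mu * S * np.eye(S), y)

def cv_error(Z, y, mu, folds=5, gamma=1.0, seed=0):
    """Average validation error over `folds` train/validate splits."""
    idx = np.random.default_rng(seed).permutation(len(y))
    errs = []
    for held in np.array_split(idx, folds):
        train = np.setdiff1d(idx, held)          # the remaining 4/5 of the data
        a = fit(Z[train], y[train], mu, gamma)
        pred = gaussian_kernel(Z[held], Z[train], gamma) @ a
        errs.append(np.mean((y[held] - pred) ** 2))
    return float(np.mean(errs))

def select_mu(Z, y, grid):
    """Return the value of mu in `grid` with the smallest average error."""
    return min(grid, key=lambda mu: cv_error(Z, y, mu))
```

Here `Z` holds one sample per row, and `select_mu(Z, y, [1e-4, 1e-2, 1.0])` would return the grid value attaining the best average fit on the held-out fifths.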

The advantage of confining $f$ to lie in the RKHS $\mathcal{H}_{\mathcal{K}}$ is that the functional optimization of (10) can be equivalently posed as a minimization problem over a finite-dimensional vector: The celebrated Representer’s Theorem asserts that the solution to (10) admits the form [24]

$$f(z)=\sum_{s=1}^{S}K(z,z_{s})a_{s}. \tag{11}$$

In other words, the minimizer of (10) is described only by $S$ rather than infinitely many $a_{s}$ ’s. Based on (11), evaluating $f(z)$ at the given data provides $\mathbf{f}=\mathbf{K}\mathbf{a}$ , where $\mathbf{f}:=[f(z_{1})~{}\ldots~{}f(z_{S})]^{\top}$ ; matrix $\mathbf{K}\in\mathbb{S}_{++}^{S}$ is the kernel matrix with entries $[\mathbf{K}]_{s,s^{\prime}}:=K(z_{s},z_{s^{\prime}})$ ; and $\mathbf{a}:=[a_{1}~{}\ldots~{}a_{S}]^{\top}$ .

From properties of the RKHS’s, it holds that $\|f\|_{\mathcal{K}}^{2}=\mathbf{a}^{\top}\mathbf{K}\mathbf{a}$ ; see [25]. For regression under an LS loss, the functional minimization in (10) becomes the vector optimization

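$$\min_{\mathbf{a},\,b}~\frac{1}{S}\|\mathbf{y}-\mathbf{K}\mathbf{a}-b\mathbf{1}\|_{2}^{2}+\mu\|\mathbf{K}^{1/2}\mathbf{a}\|_{2} \tag{12}$$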

where $\mathbf{K}^{1/2}$ is the square root of $\mathbf{K}$ and $\mathbf{y}:=[y_{1}~{}\cdots~{}y_{S}]^{\top}$ .

It is worth stressing that (11) applies not only to the given data $\{z_{s}\}_{s=1}^{S}$ , but to any $z_{s^{\prime}}\in\mathcal{Z}$ . Evaluating $f(z)$ requires knowing the $(\mathbf{a},b)$ minimizing (12), and being able to evaluate the kernel $K(z,z_{s})$ for $s=1,\ldots,S$ . We next use kernel-based learning to develop nonlinear inverter control rules.
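As a minimal sketch of how a learned function is stored and evaluated, the Python code below fits $(\mathbf{a},b)$ and then applies (11) at a new point. Note the assumption: the penalty is the squared norm $\mu\,\mathbf{a}^{\top}\mathbf{K}\mathbf{a}$ rather than the non-squared term of (12), because the squared version reduces to a linear system. The Gaussian kernel and all names are likewise illustrative choices.

```python
import numpy as np

def gaussian_kernel(A, B, gamma=1.0):
    """K[i, j] = exp(-||A[i] - B[j]||^2 / gamma) for row-wise samples."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / gamma)

def fit_with_intercept(K, y, mu):
    """Minimize (1/S)||y - K a - b 1||^2 + mu * a'K a (assumed squared penalty).

    For invertible K the optimality conditions reduce to
    (K + mu*S*I) a + b 1 = y  and  1'a = 0.
    """
    S = len(y)
    A = np.zeros((S + 1, S + 1))
    A[:S, :S] = K + mu * S * np.eye(S)
    A[:S, S] = 1.0
    A[S, :S] = 1.0
    sol = np.linalg.solve(A, np.append(y, 0.0))
    return sol[:S], sol[S]

def evaluate(z_new, Z, a, b, gamma=1.0):
    """f(z) + b with f(z) = sum_s K(z, z_s) a_s, as in (11)."""
    return gaussian_kernel(np.atleast_2d(z_new), Z, gamma) @ a + b
```

Only the $S$ coefficients, the intercept, and the training inputs are needed at evaluation time, which is what makes the rule cheap to communicate and apply.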

IV Kernel-based Control Policies

The reactive injection by inverter $n$ is modeled by the rule

$$q_{n}^{g}(\mathbf{z}_{n})=f_{n}(\mathbf{z}_{n})+b_{n} \tag{13}$$

whose ingredients $(f_{n},\mathbf{z}_{n},b_{n})$ are explained next.

Control inputs: Vector $\mathbf{z}_{n}\in\mathcal{Z}_{n}\subseteq\mathbb{R}^{M_{n}}$ is the input to the control rule for inverter $n$ . This vector may include load, solar generation, and/or line flow measurements collected locally or remotely. For a purely local rule, this input can be selected as

$$\mathbf{z}_{n}:=[\bar{q}_{n}^{g}~~(p_{n}^{c}-p_{n}^{g})~~q_{n}^{c}]^{\top} \tag{14}$$

where the first entry $\bar{q}_{n}^{g}$ relates to the apparent power constraint and has been defined in (2). The voltage $v_{n}$ could also be appended in $\mathbf{z}_{n}$ ; however the stability of the resultant control loop is hard to analyze even when $f_{n}$ is linear; see e.g., [27], [14], [15], [19], [20].

Selecting the controller structure, i.e., the content for each $\mathbf{z}_{n}$ , can affect critically the performance of this control scheme. Ideally, each inverter rule can be fed all uncertain quantities, that is the three numbers in the right-hand side of (14) across all buses. In that case, the input vectors $\mathbf{z}_{n}$ become all equal and of size $3N$ . However, this incurs the communication burden of broadcasting $3N$ values in real time. Hybrid setups with $\mathbf{z}_{n}$ ’s carrying a combination of local and remote data can be envisioned. To eliminate the effect of this trade-off between communications and performance, this work assumes that the content of $\mathbf{z}_{n}$ ’s is prespecified. The task of input selection could be possibly pursued along the lines of sparse linear or polynomial regression [19], [20], [28]; and automatic relevance determination [29, Sec. 6.4].

Control function: Selecting the form of $f_{n}$ is the second design task. To leverage kernel-based learning, the inverter rule $f_{n}$ is postulated to lie in the RKHS

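$$\mathcal{H}_{\mathcal{K}_{n}}:=\left\{f_{n}(\mathbf{z}_{n})=\sum_{s=1}^{\infty}K_{n}(\mathbf{z}_{n},\mathbf{z}_{n,s})a_{n,s},~a_{n,s}\in\mathbb{R}\right\} \tag{15}$$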

determined by the kernel function $K_{n}:\mathcal{Z}_{n}\times\mathcal{Z}_{n}\rightarrow\mathbb{R}$ .

Linear rules can be designed by selecting the linear kernel $K_{n}(\mathbf{z}_{n,s},\mathbf{z}_{n,s^{\prime}})=\mathbf{z}_{n,s}^{\top}\mathbf{z}_{n,s^{\prime}}$ . Nonlinear rules can be designed by selecting for example a polynomial kernel $K_{n}(\mathbf{z}_{n,s},\mathbf{z}_{n,s^{\prime}})=\left(\mathbf{z}_{n,s}^{\top}\mathbf{z}_{n,s^{\prime}}+\gamma\right)^{\beta}$ or a Gaussian kernel $K_{n}(\mathbf{z}_{n,s},\mathbf{z}_{n,s^{\prime}})=\exp\left(-\|\mathbf{z}_{n,s}-\mathbf{z}_{n,s^{\prime}}\|_{2}^{2}/\gamma\right)$ with design parameters $\beta>0$ and $\gamma>0$ ; see [24].
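The three kernels can be written down directly. The Python sketch below uses hypothetical function names and default parameter values; it builds the $S\times S$ kernel matrix from scenario inputs stored as rows. The polynomial kernel is shown with an integer degree $\beta=2$, the case for which positive semidefiniteness is standard.

```python
import numpy as np

def linear_kernel(z, zp):
    return float(z @ zp)

def polynomial_kernel(z, zp, gamma=1.0, beta=2):
    return float((z @ zp + gamma) ** beta)

def gaussian_kernel(z, zp, gamma=1.0):
    return float(np.exp(-np.sum((z - zp) ** 2) / gamma))

def kernel_matrix(kernel, Z, **params):
    """S x S matrix with entries K(z_s, z_s'); rows of Z are the S scenarios."""
    S = Z.shape[0]
    return np.array([[kernel(Z[i], Z[j], **params) for j in range(S)]
                     for i in range(S)])
```

A quick sanity check on any candidate kernel is that the resulting matrix is symmetric with nonnegative eigenvalues, for example via `np.linalg.eigvalsh(kernel_matrix(gaussian_kernel, Z))`.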

Intercept $b_{n}\in\mathbb{R}$ : Although it could be incorporated into $f_{n}$ by augmenting $\mathbf{z}_{n}$ with a constant entry of $1$ , it is kept separate to avoid its penalization through $\|f_{n}\|_{\mathcal{K}_{n}}$ [24].

IV-A Learning rules from scenario data

The rules of (13) can be learned from scenario data indexed by $s\in\mathcal{S}$ with $\mathcal{S}:=\{1,\ldots,S\}$ . Scenario $s$ consists of the control inputs $\mathbf{z}_{n,s}$ for $n\in\mathcal{N}$ , and the associated vector $\mathbf{y}_{s}:=\mathbf{R}(\mathbf{p}^{g}_{s}-\mathbf{p}^{c}_{s})-\mathbf{X}\mathbf{q}^{c}_{s}$ defined in (4). Evaluating rule $n$ of (13) under scenario $s$ yields the inverter response $q_{n,s}^{g}:=q_{n}^{g}(\mathbf{z}_{n,s})$ . Let us collect the outputs $q_{n,s}^{g}$ from all inverters into vector $\mathbf{q}^{g}_{s}$ . Note that the goal is not to fit $\mathbf{y}_{s}$ by $\mathbf{q}^{g}_{s}$ , but to minimize the voltage deviations $\mathbf{X}\mathbf{q}^{g}_{s}+\mathbf{y}_{s}$ . The control functions $\{f_{n}\}_{n=1}^{N}$ and the intercepts $\{b_{n}\}_{n=1}^{N}$ accomplishing this goal can be found via the functional minimization

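$$\min~\ldots\quad\text{over}~\{f_{n}\in\mathcal{H}_{\mathcal{K}_{n}}\},~\mathbf{b}:=[b_{1}~\cdots~b_{N}]^{\top}\quad\text{s.to}~\ldots \tag{16}$$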

where $\Delta$ is a voltage regulation objective [cf. (6)–(7)].

Remark 1.

The proposed approach is related to [20]–[21], where inverter rules are also trained using machine learning. However, the aforementioned works proceed in two steps: They first solve a sequence of OPF problems similar to (5) to find the optimal inverter setpoints $\tilde{\mathbf{q}}^{g}$ under different scenarios. Second, they learn the mapping between controller inputs $\{\mathbf{z}_{n,s}\}_{s\in\mathcal{S}}$ and optimal setpoints $\{\tilde{q}_{n,s}^{g}\}$ decided by the OPF problems. During this process, they also select which inputs are more effective to be communicated to inverters. The mapping is learned via linear or kernel-based regression. On the other hand, the approach proposed here consolidates the OPF and the learning steps into a single step: The advantage is that the OPF decisions of (16) are taken under the explicit practical limitation that $q_{n}^{g}$ can only be a function of $\mathbf{z}_{n}$ , since inverter $n$ will not have access to the complete grid conditions. To get some intuition, suppose one designs linear control rules of known input structure using the single-step approach of (16) with $\mu=0$ and the two-step approach of [20]–[21]. The single-step approach yields rules $R_{1}$ , and the two-step approach yields rules $R_{2}$ . Let us evaluate $R_{1}$ and $R_{2}$ on the training scenarios. Rules $R_{2}$ are not necessarily feasible per scenario $s\in\mathcal{S}$ , whereas rules $R_{1}$ are. Moreover, rules $R_{2}$ do not necessarily coincide with the minimizers of (5). For the sake of comparison, let us assume that rules $R_{2}$ turn out to be feasible per scenario, and hence feasible for (16). Being the minimizers of (16), rules $R_{1}$ attain equal or smaller voltage deviation cost compared to $R_{2}$ over the training data. Numerical tests in Section VI corroborate the advantage of $R_{1}$ over $R_{2}$ for $\mu>0$ and during the operational phase as well.

Different from (10), the optimization in (16) entails learning multiple functions (one per inverter). Since inverter injections affect voltages feeder-wise, inverter rules are naturally coupled through $\Delta$ in (16). Similar multi-function setups can be found in collaborative filtering or multi-task learning [25], [30].

Fortunately, Representer’s Theorem can be applied successively over $n$ in (16). Therefore, each rule $n$ is written as

$$f_{n}(\mathbf{z}_{n})=\sum_{s=1}^{S}K_{n}(\mathbf{z}_{n},\mathbf{z}_{n,s})a_{n,s}. \tag{17}$$

Once the coefficients $\{a_{n,s}\}$ have been found, rule $f_{n}$ can be evaluated for any $\mathbf{z}_{n}$ . Similar to (11), evaluating rule $f_{n}$ over the scenario data $\{\mathbf{z}_{n,s}\}_{s=1}^{S}$ gives

$$\mathbf{f}_{n}=\mathbf{K}_{n}\mathbf{a}_{n},\quad\forall n \tag{18}$$

where $[\mathbf{K}_{n}]_{s,s^{\prime}}=K_{n}(\mathbf{z}_{n,s},\mathbf{z}_{n,s^{\prime}})$ for $s,s^{\prime}=1,\ldots,S$ , and $\mathbf{a}_{n}:=[a_{n,1}~{}\cdots~{}a_{n,S}]^{\top}$ . The RKHS norms can be written as

$$\|f_{n}\|_{\mathcal{K}_{n}}^{2}=\mathbf{a}_{n}^{\top}\mathbf{K}_{n}\mathbf{a}_{n},\quad\forall n.$$

In this way, the functional minimization in (16) is cast as a vector minimization over $\{\mathbf{a}_{n}\}_{n=1}^{N}$ and $\mathbf{b}$ . The exact form of this minimization and its properties for different $\Delta$ are discussed later in Section V. For now, let us clarify how the kernel functions $K_{n}(\cdot,\cdot)$ effect different rule forms.
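To illustrate how the rules become coupled once they are expressed through $\{\mathbf{a}_{n}\}$ and $\mathbf{b}$ , the Python sketch below solves a deliberately simplified joint design. Everything about the cost is an assumption made for this example: it takes $\Delta=\Delta_{s}$ averaged over scenarios, penalizes the squared norms $\mathbf{a}_{n}^{\top}\mathbf{K}_{n}\mathbf{a}_{n}$ , and ignores the reactive power limits, so that the problem becomes a linear least-squares fit. It is not the formulation analyzed in Section V.

```python
import numpy as np

def design_rules(X, Ks, Y, mu):
    """Jointly fit (a_n, b_n) for all inverters under an assumed cost.

    Assumed cost: (1/S) * sum_s ||X q_s + y_s||^2 + mu * sum_n a_n' K_n a_n,
    with q_{n,s} = [K_n a_n]_s + b_n and no limits on q.
    X: N x N, Ks: list of N kernel matrices (S x S), Y: S x N with rows y_s.
    """
    N, S = X.shape[0], Y.shape[0]

    def deviations(theta):
        a, b = theta[:N * S].reshape(N, S), theta[N * S:]
        Q = np.stack([Ks[n] @ a[n] + b[n] for n in range(N)], axis=1)  # S x N
        return (Q @ X.T).ravel()          # row s holds X q_s, then flattened

    dim = N * S + N
    # The map theta -> deviations is linear, so build its matrix column by column.
    A = np.stack([deviations(e) for e in np.eye(dim)], axis=1)
    B = np.zeros((dim, dim))              # block-diagonal penalty on the a_n only
    for n in range(N):
        B[n * S:(n + 1) * S, n * S:(n + 1) * S] = Ks[n]
    lhs = A.T @ A / S + mu * B
    rhs = -A.T @ Y.ravel() / S
    theta = np.linalg.lstsq(lhs, rhs, rcond=None)[0]   # handles singular K_n
    cost = lambda t: np.sum((A @ t + Y.ravel()) ** 2) / S + mu * t @ B @ t
    return theta[:N * S].reshape(N, S), theta[N * S:], cost(theta), cost(np.zeros(dim))
```

The function returns the coefficients, the intercepts, the attained cost, and the cost of the no-reactive-support option $\mathbf{q}^{g}=\mathbf{0}$ for comparison. Because every inverter's output enters every bus voltage through $\mathbf{X}$ , the coefficients of all inverters must be solved for together rather than one inverter at a time.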

Example 1: Affine rules. The linear kernel $K_{n}(\mathbf{z}_{n,s},\mathbf{z}_{n,s^{\prime}})=\mathbf{z}_{n,s}^{\top}\mathbf{z}_{n,s^{\prime}}$ yields affine rules. The sought functions can be written as

$$f_{n}(\mathbf{z}_{n})=\mathbf{z}_{n}^{\top}\mathbf{w}_{n},\quad\forall n.$$

Given scenario data $\mathbf{z}_{n,s}$ and $\mathbf{y}_{s}$ for $n\in\mathcal{N}$ and $s\in\mathcal{S}$ , we would like to find $\{\mathbf{w}_{n},b_{n}\}_{n}$ through (16). Collect the input data for inverter $n$ in the $M_{n}\times S$ matrix $\mathbf{Z}_{n}:=\left[\mathbf{z}_{n,1}~{}\cdots~{}\mathbf{z}_{n,S}\right]$ . According to Representer’s Theorem, the optimal $\mathbf{w}_{n}$ can be expressed as $\mathbf{w}_{n}=\mathbf{Z}_{n}\mathbf{a}_{n}$ for some $\mathbf{a}_{n}$ . Evaluating the control rule for any input $\mathbf{z}_{n,s}$ yields

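$$q_{n}^{g}(\mathbf{z}_{n,s})=f_{n}(\mathbf{z}_{n,s})+b_{n}=\mathbf{z}_{n,s}^{\top}\mathbf{Z}_{n}\mathbf{a}_{n}+b_{n}.$$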

Evaluating the rule at the input data yields (18) with $\mathbf{K}_{n}=\mathbf{Z}_{n}^{\top}\mathbf{Z}_{n}$ . The squared function norm is $\|f_{n}\|_{\mathcal{K}_{n}}^{2}=\|\mathbf{w}_{n}\|_{2}^{2}=\mathbf{a}_{n}^{\top}\mathbf{Z}_{n}^{\top}\mathbf{Z}_{n}\mathbf{a}_{n}=\mathbf{a}_{n}^{\top}\mathbf{K}_{n}\mathbf{a}_{n}$ .
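The equivalence between the primal weights $\mathbf{w}_{n}$ and the scenario coefficients $\mathbf{a}_{n}$ can be verified numerically. In the Python sketch below, the dimensions, the random inputs, and the values of `a` and `b` are arbitrary placeholders rather than designed rules.

```python
import numpy as np

rng = np.random.default_rng(0)
M, S = 3, 5                    # input dimension and number of scenarios
Z = rng.normal(size=(M, S))    # columns are z_{n,1}, ..., z_{n,S}
a = rng.normal(size=S)         # placeholder coefficients a_n
b = 0.1                        # placeholder intercept b_n

K = Z.T @ Z                    # linear kernel matrix K_n = Z_n' Z_n
w = Z @ a                      # primal weights w_n = Z_n a_n

q_primal = Z.T @ w + b         # z_{n,s}' w_n + b_n for every scenario
q_kernel = K @ a + b           # K_n a_n + b_n 1, cf. (18)

norm_primal = w @ w            # ||w_n||_2^2
norm_kernel = a @ K @ a        # a_n' K_n a_n
```

Both routes give the same setpoints and the same squared norm; the kernel form simply never needs $\mathbf{w}_{n}$ explicitly.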

Example 2: Non-linear rules. For non-linear rules, transform the input $\mathbf{z}_{n,s}$ to vector $\boldsymbol{\phi}_{n,s}:=\phi_{n}(\mathbf{z}_{n,s})$ via a non-linear mapping $\phi_{n}:\mathbb{R}^{M_{n}}\rightarrow\mathbb{R}^{\Phi_{n}}$ . The entries of $\boldsymbol{\phi}_{n,s}$ could be for example all the first- and second-order monomials formed by the entries of $\mathbf{z}_{n,s}$ . The dimension $\Phi_{n}$ of $\boldsymbol{\phi}_{n,s}$ can be finite (e.g., polynomial kernels) or infinite (Gaussian kernels) [29]. Then, the control function

$$f_{n}(\mathbf{z}_{n})=\boldsymbol{\phi}_{n}^{\top}\mathbf{w}_{n}$$

with $\mathbf{w}_{n}\in\mathbb{R}^{\Phi_{n}}$ is non-linear in $\mathbf{z}_{n}$ . The developments of Example 1 carry over to Example 2 by using $\mathbf{K}_{n}=\mathbf{\Phi}_{n}^{\top}\mathbf{\Phi}_{n}$ and replacing $\mathbf{Z}_{n}$ by $\mathbf{\Phi}_{n}:=[\boldsymbol{\phi}_{n,1}~{}\cdots~{}\boldsymbol{\phi}_{n,S}]$ . Depending on the mapping $\phi_{n}$ , the vectors $\boldsymbol{\phi}_{n,s}$ may be of finite or infinite length [24]. The critical point is that $f_{n}$ does not depend on $\boldsymbol{\phi}_{n,s}$ ’s directly, but only on their inner products $\boldsymbol{\phi}_{n,s}^{\top}\boldsymbol{\phi}_{n,s^{\prime}}$ for any $s$ and $s^{\prime}$ . These products can be easily calculated through the kernel function as $\boldsymbol{\phi}_{n,s}^{\top}\boldsymbol{\phi}_{n,s^{\prime}}=K_{n}(\mathbf{z}_{n,s},\mathbf{z}_{n,s^{\prime}})$ ; see [24].
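The kernel trick can be checked on a small case. The Python sketch below assumes a two-dimensional input and the polynomial kernel with $\beta=2$ , for which an explicit feature map with six entries is known; the function names and test values are illustrative.

```python
import numpy as np

def phi(z, gamma):
    """Explicit feature map of the kernel (z'z + gamma)^2 for a 2-entry input."""
    z1, z2 = z
    return np.array([z1 ** 2, z2 ** 2, np.sqrt(2.0) * z1 * z2,
                     np.sqrt(2.0 * gamma) * z1, np.sqrt(2.0 * gamma) * z2,
                     gamma])

def poly_kernel(z, zp, gamma):
    """Same inner product, computed without forming the features."""
    return (z @ zp + gamma) ** 2

z, zp, gamma = np.array([0.3, -1.2]), np.array([2.0, 0.5]), 0.7
lhs = phi(z, gamma) @ phi(zp, gamma)
rhs = poly_kernel(z, zp, gamma)
```

Expanding $(\mathbf{z}^{\top}\mathbf{z}^{\prime}+\gamma)^{2}$ term by term shows that `lhs` and `rhs` agree, which is why the rule can be learned and evaluated using kernel values alone.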

Since the constraints in (16) are enforced for the scenario data, the learned rules do not necessarily satisfy these constraints for all $\mathbf{z}_{n,s}$ with $s\notin\{1,\ldots,S\}$ . This limitation appears also in scenario-based and chance-constrained designs [17]. Once a control rule is learned, in real-time $t$ , it can be heuristically projected within $[-\bar{q}_{n,t}^{g},+\bar{q}_{n,t}^{g}]$ as

[TABLE]

IV-B Implementing reactive control rules

Our control scheme involves four steps; see also Fig. 1:

T1) The utility collects scenario data $\mathbf{z}_{n,s}$ for all $n$ and $s$.

T2) The utility designs rules by solving (16); see Section V.

T3) Each inverter $n$ receives $S+1$ data $(\mathbf{a}_{n},b_{n})$ from the utility, which describe $f_{n}$.

T4) Over the next $30$ minutes and at real time $t$, each inverter $n$ will be collecting $\mathbf{z}_{n,t^{\prime}}$ and applying the rule

[TABLE]
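Step T4) can be sketched as follows: the inverter evaluates the kernel between its current input and the stored scenarios, forms the weighted sum with $\mathbf{a}_{n}$, adds $b_{n}$, and projects the result onto its reactive power limits. The Gaussian kernel parameterization `exp(-gamma * ||z - z_s||^2)`, the function names, and the numerical values are assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(z, z_train, gamma=1.0):
    """Kernel values between input z and every stored scenario (rows of z_train)."""
    diff = z_train - np.asarray(z, dtype=float)
    return np.exp(-gamma * np.sum(diff ** 2, axis=1))

def apply_rule(z, z_train, a, b, q_max, gamma=1.0):
    """Evaluate the kernel-based rule and clip it to [-q_max, +q_max]."""
    k = gaussian_kernel(z, z_train, gamma)
    q = float(k @ a + b)
    return float(np.clip(q, -q_max, q_max))

z_train = np.array([[0.0, 0.0], [1.0, 0.0]])   # two stored scenarios (toy data)
a = np.array([0.5, -0.2])
b = 0.1
q_unclipped = apply_rule([0.0, 0.0], z_train, a, b, q_max=1.0)
q_clipped = apply_rule([0.0, 0.0], z_train, a, b, q_max=0.3)
```

Only scenarios with non-zero coefficients contribute to the sum, so an inverter needs to store just those scenarios.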

The aforesaid process is explicated next. Regarding T1), scenario data should be as representative as possible of the grid conditions anticipated over the following $30$-min control period. One option would be to use load and solar generation forecasts. A second option would be to use historical data from the previous day and same time, if they are representative of today’s conditions. A third alternative would be to use the most recent grid conditions known to the utility. For example, if smart meter data are collected every $30$ min anyway, they can be used in lieu of forecasts for the next control period.

The numerical tests of Section VI adopt the third option and use the minute-based grid conditions observed over the last $30$ minutes as $S=30$ scenarios to train the inverter rules for the upcoming $30$-minute interval. Obviously, the number of training scenarios $S$ does not have to coincide with the length of the control period measured in minutes. The choice of these two parameters depends on loading conditions; feeder details; availability and quality of scenario data; and communication and computational resources. Selecting their optimal values goes beyond the scope of this work.
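The third option amounts to a sliding window over minute-sampled data. A minimal sketch of the indexing is given below; the 480-minute horizon corresponds to the 8:00–16:00 window of Section VI, while the function name and the convention that the first $30$ minutes serve only as training data are assumptions.

```python
def control_periods(total_minutes=480, period=30, num_scenarios=30):
    """Yield (training minutes, control minutes) for each control period."""
    for start in range(num_scenarios, total_minutes, period):
        train = list(range(start - num_scenarios, start))          # last S minutes
        control = list(range(start, min(start + period, total_minutes)))
        yield train, control

periods = list(control_periods())
first_train, first_control = periods[0]
```

Setting `period` to $45$ or $60$ while keeping `num_scenarios=30` decouples the control period length from the number of training scenarios.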

By step T4), inverter $n$ has already received $(\mathbf{a}_{n},b_{n})$ and $\{\mathbf{z}_{n,s}\}_{s=1}^{S}$ during T3). Each $\mathbf{z}_{n}$ may consist of local data and a few active flow readings collected from major lines or transformers. If the entries of $\mathbf{z}_{n}$ are all local, the rule can be applied with no communication. Otherwise, the non-local entries of $\mathbf{z}_{n}$ have to be sent to inverter $n$. If non-local inputs are shared among inverters, broadcasting protocols can reduce the communication overhead.

Remark 2.

Suppose each inverter $n$ knows the training data $\mathbf{z}_{n,s}$ for $s\in\mathcal{S}$ . Function $f_{n}$ can be described in two ways: Either through (17) using the data described under T3); or through (20)–(21) via $\mathbf{w}_{n}$ . For the second way, vector $\mathbf{w}_{n}$ has $M_{n}$ entries in the linear case and $\Phi_{n}$ entries in the nonlinear case. For the linear case, if $M_{n}<S+1$ , representing $f_{n}$ through (20) by $\mathbf{w}_{n}$ is more parsimonious. Representation (17) becomes advantageous only when $\Phi_{n}\gg S+1$ under the nonlinear case.

V Support Vector Reactive Power Control

This section converts (16) to a vector minimization and explores different options for $\Delta$ . From (18), the output of inverter $n$ across all $S$ scenarios is $\mathbf{K}_{n}\mathbf{a}_{n}+b_{n}\mathbf{1}$ . Then, the apparent power constraints in (16) can be written as

[TABLE]

where $\bar{\mathbf{q}}_{n}^{g}:=[\bar{q}_{n,1}^{g}~{}\cdots~{}\bar{q}_{n,S}^{g}]^{\top}$ . Moreover, the vector of voltage deviations can be expressed as

[TABLE]

where $\mathbf{x}_{n}$ is the $n$-th column of $\mathbf{X}$. Substituting (19), (23), and the preceding expression for the voltage deviations, the optimization in (16) can be posed as a second-order cone program (SOCP) over $\{\mathbf{a}_{n}\}_{n\in\mathcal{N}}$ and $\mathbf{b}$.

Nonetheless, solving (16) with $\Delta=\Delta_{\epsilon}$ yields optimal $\mathbf{a}_{n}$’s with several non-zero entries. This means that to describe rule $n$ by (22), the utility needs to communicate the entire vector $\mathbf{a}_{n}$ during T3). If the scenarios $\{\mathbf{z}_{n,s}\}_{s=1}^{S}$ are not known by the inverter, they have to be communicated along with $\mathbf{a}_{n}$ as well. The number of scenarios $S$ may be large when learning rules under complex feeder setups. A related approach for minimizing a convex combination of $\Delta_{s}$ and power losses has been suggested in the conference precursor of this work [31], but inherits the same difficulty of non-sparse $a_{n,s}$’s.

Inspired by support vector machines (SVM), we engineer $\Delta$ to obtain inverter rules described by possibly fewer scenarios: Promoting sparse $\mathbf{a}_{n}$ ’s alleviates the communication overhead during step T3). To this end, we put forth the cost

[TABLE]

for some $\tau>0$ . If scenario $s$ yields a vector of voltage deviations $\mathbf{X}\mathbf{q}^{g}_{s}+\mathbf{y}_{s}$ with $\ell_{2}$ -norm smaller than $\tau$ , this scenario incurs no cost. If $\|\mathbf{X}\mathbf{q}^{g}_{s}+\mathbf{y}_{s}\|_{2}>\tau$ , the voltage regulation penalty grows with $\|\mathbf{X}\mathbf{q}_{s}^{g}+\mathbf{y}_{s}\|_{2}$ . The cost in (25) can be expressed as an SOCP over the slack variable $d$

[TABLE]
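The $\tau$-insensitive cost can be evaluated directly from the per-scenario voltage deviations. In the sketch below, the cost is taken as the sum over scenarios of $\max\{0,\|\mathbf{X}\mathbf{q}_{s}^{g}+\mathbf{y}_{s}\|_{2}-\tau\}$; summing rather than averaging over scenarios, as well as the function name and toy data, are assumptions.

```python
import numpy as np

def delta_tau(X, Q, Y, tau):
    """Q and Y hold q_s^g and y_s as columns, one column per scenario."""
    dev = X @ Q + Y                        # voltage deviations, one column per scenario
    norms = np.linalg.norm(dev, axis=0)    # l2-norm of each scenario's deviations
    d = np.maximum(norms - tau, 0.0)       # smallest slack d_s satisfying the cone constraint
    return d.sum(), d

X = np.eye(2)
Q = np.zeros((2, 2))                       # no reactive injections (toy case)
Y = np.array([[3.0, 0.1],
              [4.0, 0.0]])
total_cost, slacks = delta_tau(X, Q, Y, tau=1.0)
```

The second scenario has a deviation norm below $\tau$ and therefore incurs zero cost, which is the mechanism behind the scenario sparsity studied in Proposition 1.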

Applying the same epigraph trick for the function norms, problem (16) can be solved as the SOCP

[TABLE]

where $\mathbf{d}:=[d_{1}~{}\cdots~{}d_{S}]^{\top}$ and $\boldsymbol{\gamma}:=[\gamma_{1}~{}\cdots~{}\gamma_{N}]^{\top}$. The variables $\mathbf{q}_{s}^{g}$ can be eliminated by expressing the inverter outputs through $\mathbf{K}_{n}\mathbf{a}_{n}+b_{n}\mathbf{1}$, as in the voltage-deviation expression following (23). Solving (26) takes $\mathcal{O}\left(N^{3.5}S^{3}\right)$ operations with interior point-based solvers [32]. However, the advantage of inverter control rules is that (26) is not solved in real time. If standard interior point-based solvers do not scale to larger grids, one may resort to (distributed) first-order algorithms; warm-start initializations; and cutting-plane methods.

The coefficients $\mathbf{a}_{n}$ ’s minimizing (26) enjoy two types of sparsity, across inverters and across scenarios. To explain the first type of sparsity, express the second summand in the cost of (26) as $\mu\boldsymbol{\gamma}^{\top}\mathbf{1}=\mu\sum_{n=1}^{N}\|\mathbf{K}_{n}^{1/2}\mathbf{a}_{n}\|_{2}$ . Having these non-squared $\ell_{2}$ -norms in the objective promotes block sparsity across $n$ , in the sense that for larger $\mu$ , some vectors $\mathbf{a}_{n}$ may be set to zero. This effect is a direct consequence of block-sparse solutions encountered in group Lasso (G-Lasso)-formulations; see [33], [26], [30]. All inverters receive a reactive power setpoint $b_{n}$ , but if the optimal $\mathbf{a}_{n}$ becomes zero, inverter $n$ will not be changing its reactive injection in real-time. One may drop the intercept $b_{n}$ from the control rule of (13) and the optimization of (26), and modify the feature vector as

[TABLE]

Thus, obtaining $\mathbf{a}_{n}=\mathbf{0}$ from (26) enables inverter selection.
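The group penalty $\mu\sum_{n}\|\mathbf{K}_{n}^{1/2}\mathbf{a}_{n}\|_{2}$ can be computed without forming a symmetric matrix square root, since $\|\mathbf{K}_{n}^{1/2}\mathbf{a}_{n}\|_{2}=\sqrt{\mathbf{a}_{n}^{\top}\mathbf{K}_{n}\mathbf{a}_{n}}=\|\mathbf{L}_{n}^{\top}\mathbf{a}_{n}\|_{2}$ for any factorization $\mathbf{K}_{n}=\mathbf{L}_{n}\mathbf{L}_{n}^{\top}$. The following sketch uses a Cholesky factor and assumes each kernel matrix is positive definite; the function name and the toy matrices are hypothetical.

```python
import numpy as np

def group_penalty(K_list, a_list, mu):
    """mu * sum_n ||K_n^{1/2} a_n||_2, computed through Cholesky factors."""
    total = 0.0
    for K, a in zip(K_list, a_list):
        L = np.linalg.cholesky(K)          # K = L L^T (K must be positive definite)
        total += np.linalg.norm(L.T @ a)   # equals sqrt(a^T K a)
    return mu * total

K_list = [np.diag([2.0, 8.0]), np.eye(2)]
a_list = [np.array([1.0, 1.0]), np.zeros(2)]   # second inverter is deselected
penalty = group_penalty(K_list, a_list, mu=0.5)
active = [n for n, a in enumerate(a_list) if np.any(a != 0)]
```

An inverter with $\mathbf{a}_{n}=\mathbf{0}$ contributes nothing to the penalty, which is how larger $\mu$ drives entire coefficient vectors to zero.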

The next proposition studies the second type of sparsity; see the appendix for a proof.

Proposition 1.

Consider (16) with $\Delta=\Delta_{\tau}$ and its minimizer in (17). If $\|\mathbf{X}\mathbf{q}_{s}^{g}+\mathbf{y}_{s}\|_{2}<\tau$ for scenario $s$ at the optimum, then $a_{n,s}=0$ for every inverter $n$ with $|q_{n,s}^{g}|<\overline{q}_{n,s}^{g}$ .

Proposition 1 explains how $\Delta_{\tau}$ promotes block sparsity across $s$ : If scenario $s$ does not experience severe voltage violations, the corresponding coefficients $a_{n,s}$ will be zero for all inverters $n$ that have not reached their apparent power limit. Block sparsity across time identifies non-critical scenarios. Phrased in the SVM context, the so-termed ‘support vectors’ here correspond to scenarios with significant voltage deviations. Larger values of $\tau$ effect fewer critical scenarios.

These two forms of sparsity offer communication savings since the related $(a_{n,s},\mathbf{z}_{n,s})$ do not need to be communicated to inverters. This enables training the rules for a larger number of scenarios $S$ at the same communication overhead. Note that for fixed $(\mu,\tau)$, the sparsity of $\mathbf{a}_{n}$’s depends on the training data $\mathbf{y}_{s}$’s as well. If a particular sparsity goal is to be met, the utility has to solve (26) repeatedly for various values of $\mu$ and $\tau$. Such computations can be significantly sped up by initializing an optimization algorithm for one value of $\tau$ to the minimizer obtained using the previous value of $\tau$ [24, Sec. 18.4]; however, such techniques will not be pursued here.

Different from $\Delta_{\tau}$ , cost $\Delta_{\epsilon}$ is not expected to yield as sparse $\mathbf{a}_{n}$ ’s. The next claim (proved in the appendix) explains that even if a single bus experiences voltage deviation larger than $\epsilon$ for scenario $s$ , then $a_{n,s}\neq 0$ for all $n$ . In other words, a voltage violation at a single bus for scenario $s$ renders this scenario critical for all inverter rules.

Proposition 2.

Consider (16) with $\Delta=\Delta_{\epsilon}$ and its minimizer in (17). If $\|\mathbf{X}\mathbf{q}_{s}^{g}+\mathbf{y}_{s}\|_{\infty}>\epsilon$ for scenario $s$ at the optimum, then $a_{n,s}\neq 0$ for all $n$ .

VI Numerical Tests

The novel inverter rules were tested on the IEEE 123-bus feeder [34], converted to a single-phase grid as described in [35]. Residential load and solar data were extracted from the Pecan Street dataset as delineated next [2]. Minute-sampled active load and solar generation data were collected for June 1, 2013 between 8:00–16:00. We downloaded data from the first 123 Pecan Street nodes, after excluding nodes with empty data records. Regarding solar generation, unless stated otherwise, $75\%$ of the buses had solar generation by excluding nodes with bus indexes that are multiples of $4$ .

Load data were scaled on a per bus basis so that their daily peak values matched $150\%$ of the benchmark load. Since the Pecan Street data included only active power, we drew lagging power factors uniformly at random within $[0.9,0.95]$ for each bus and kept them fixed across time. The scaling factors for active loads were also used for scaling solar data. To allow for reactive power compensation even at peak solar irradiance, inverters were over-sized by $10\%$ providing an apparent power capacity of $\bar{s}_{n}^{g}=1.1\bar{p}_{n}^{g}$ for all $n$ ; see [1].
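The data preparation above involves two small computations: deriving reactive loads from a lagging power factor, and finding the reactive power available to an over-sized inverter. The sketch below assumes the standard apparent power relation $\bar{q}=\sqrt{\bar{s}^{2}-p^{2}}$ for the reactive limit; the function names and per-unit values are illustrative.

```python
import math

def reactive_load(p_load, power_factor):
    """Reactive demand of a load with the given lagging power factor."""
    return p_load * math.tan(math.acos(power_factor))

def reactive_capacity(p_solar, p_rated, oversize=1.1):
    """Reactive limit left after the inverter delivers p_solar of active power."""
    s_max = oversize * p_rated
    return math.sqrt(max(s_max ** 2 - p_solar ** 2, 0.0))

q_at_peak = reactive_capacity(p_solar=1.0, p_rated=1.0)    # inverter at full solar output
q_at_night = reactive_capacity(p_solar=0.0, p_rated=1.0)   # no solar output
q_load = reactive_load(p_load=1.0, power_factor=0.9)
```

With $10\%$ over-sizing, roughly $46\%$ of the rated active power remains available as reactive power even at peak irradiance.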

Our numerical tests included seven control schemes:

C1) The optimal reactive injections computed by (5) on a per-minute basis;

C2) The optimal reactive injections computed by (5) on a per-minute basis assuming a $2$ -minute communication delay;

C3) The fixed Watt-VAR control rules of [1, (12)–(14)];

C4) The rules of (16) for linear kernels and $\Delta=\Delta_{\tau}$ ;

C5) The rules of (16) for Gaussian kernels and $\Delta=\Delta_{\tau}$ ;

C6) The rules of (16) for linear kernels and $\Delta=\Delta_{\epsilon}$ ; and

C7) The rules of (16) for Gaussian kernels and $\Delta=\Delta_{\epsilon}$ .

The input $\mathbf{z}_{n}$ to inverter $n$ consisted of local data as in (14). Each entry of $\mathbf{z}_{n}$ was centered by its daily mean and normalized by its daily standard deviation. To avoid rank deficiency, we added $10^{-3}\cdot\mathbf{I}_{S}$ to all kernel matrices.
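The preprocessing can be sketched as below for the linear kernel. With $M_{n}=3$ local inputs and $S=30$ scenarios, the matrix $\mathbf{Z}_{n}^{\top}\mathbf{Z}_{n}$ has rank at most $3$, which is why a small multiple of the identity is added. The random data and function names are assumptions; only the $10^{-3}$ regularizer and the daily standardization come from the text.

```python
import numpy as np

def standardize(Z_day):
    """Center and scale each input (row) by its daily mean and standard deviation."""
    mean = Z_day.mean(axis=1, keepdims=True)
    std = Z_day.std(axis=1, keepdims=True)
    return (Z_day - mean) / std

def regularized_linear_kernel(Z, ridge=1e-3):
    """Linear kernel matrix with a small ridge added to avoid rank deficiency."""
    return Z.T @ Z + ridge * np.eye(Z.shape[1])

rng = np.random.default_rng(1)
Z_day = rng.standard_normal((3, 480)) * 5.0 + 2.0   # toy minute-sampled inputs
Z_std = standardize(Z_day)
Z_window = Z_std[:, :30]                            # S = 30 training scenarios
K_raw = Z_window.T @ Z_window
K_reg = regularized_linear_kernel(Z_window)
```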

Schemes C1) and C2) were solved using SDPT3 and YALMIP with MATLAB [36, 37]. Schemes C4)–C7) were solved by invoking the MOSEK solver directly through MATLAB [38]. Tests were run on a 2.4 GHz Intel Core i5 laptop with 8 GB RAM. The average running time for solving (16) with $S=30$ scenarios is given in Table I, where the number of scenarios is denoted by $T$. It should be emphasized that although the control rules were designed using the LDF grid model, the voltage deviations experienced by all control rules were tested using the full AC model.

During training, we used $S=30$ scenarios to learn the SVM-based control rules of C4)–C7). These scenarios comprised the load and solar data observed during the last $30$ minutes. During validation, the inverter control rules were tested over the following $30$ minutes. Parameters $\mu$ and $\gamma$ were selected via $5$-fold cross-validation. The ranges of $\tau$ and $\epsilon$ were empirically chosen to yield an average communication overhead similar to the one needed by the affine rule of (20) as discussed under Remark 2: An affine rule is described by $M_{n}+1=4$ data per inverter. If only $10\%$ of the entries of $\mathbf{a}_{n}$ are nonzero, then communicating $(\mathbf{a}_{n},b_{n})$ entails sending $0.1\cdot S+1=0.1\cdot 30+1=4$ data as well. The sparsity of $\mathbf{a}_{n}$’s depends on input data along with the values of $(\tau,\mu)$ or $(\epsilon,\mu)$. These parameters were set so that $\mathbf{a}_{n}$’s had $10\%$ nonzero entries on the average across time and buses.
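The communication count above is simple arithmetic, sketched here for concreteness. The tolerance used to decide whether a coefficient is non-zero and the function names are assumptions; the count covers only $(\mathbf{a}_{n},b_{n})$, as in the text.

```python
def affine_rule_payload(num_inputs):
    """Numbers needed to describe an affine rule: w_n (M_n entries) and b_n."""
    return num_inputs + 1

def kernel_rule_payload(a, tol=1e-8):
    """Numbers needed for a kernel rule: non-zero entries of a_n and b_n."""
    nonzero = sum(1 for coeff in a if abs(coeff) > tol)
    return nonzero + 1

a_sparse = [0.0] * 27 + [0.2, -0.1, 0.05]   # 10% of S = 30 entries are non-zero
affine_count = affine_rule_payload(3)
kernel_count = kernel_rule_payload(a_sparse)
```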

We next explored the trade-off between voltage deviation and the sparsity of $\mathbf{a}_{n}$ ’s for C4)–C7). The expectations from this test were two: i) voltage deviations are expected to increase for sparser $\mathbf{a}_{n}$ ’s; ii) schemes C4) and C5) should exhibit improved sparsity over C6) and C7). To validate these hypotheses, we recorded the voltage deviations for $10$ values of $\tau$ and $\epsilon$ for C4)–C7). The average absolute voltage deviation and the average percentage of non-zero coefficients were calculated over the day and across buses, and are shown in Figure 2. From Figure 2, the value of $\tau$ yielding a sparsity of roughly $11\%$ is $\tau=0.001$ . Figure 2 reveals three important points. First, voltage deviations increase as $\mathbf{a}_{n}$ ’s become sparser as expected. Second, for a given sparsity in $\mathbf{a}_{n}$ ’s, the rules obtained by $\Delta_{\tau}$ exhibit smaller voltage deviations compared to the rules obtained by $\Delta_{\epsilon}$ . Because of this, we focus on the performance of C4)–C5) for the rest of this section. Third, the Gaussian kernel-based rules attained lower voltage deviations than the related linear kernel-based rules.

We next tested the effect of $\mu$ on inverter selection and voltages. Larger values of $\mu$ are expected to set more $\mathbf{a}_{n}$ ’s to zero. To eliminate the inverters with $\mathbf{a}_{n}=\mathbf{0}$ , the parameter $b_{n}$ was appended in $\mathbf{a}_{n}$ as delineated in (27). For a fixed value of $\tau=0.001$ , for scheme C4), the values of $\mu$ were obtained using cross-validation across the day. The control rules were designed again using $4$ different values of $\mu$ . As expected, by increasing the value of $\mu$ , the number of all-zero $\mathbf{a}_{n}$ ’s and the corresponding voltage deviations were increased. Figure 3 depicts the absolute voltage deviation averaged over time for each inverter. Notice that the values of $\tau$ and $\mu$ were kept fixed, although the training data $\mathbf{y}_{s}$ ’s varied across the day. Due to this, the reported sparsity in Figure 2 is the average sparsity across time and inverters. Moreover, the number of inverters in Figure 3 is the average number of activated inverters across the day. Even though the values of $\mu$ and $\tau$ can be adjusted on a $30$ -min basis to meet specific sparsity requirements, we chose to keep them fixed to simplify the exposition. In fact, the rest of this section reports the worst-case instead of average voltage deviations across time and for each bus.

We next compared the proposed SVM-based control rules against the alternative schemes of C1)–C3). To this end, voltage deviations were calculated between 8:00–16:00 for schemes C1)–C5). Figures 4 and 5 show the average and the maximum voltage deviations over the test period. It can be observed from both figures that the Gaussian SVM-based rule performs better than C1)–C3) due to its ability to capture non-linear behaviors. Although C3) needs no communication, it violates the ANSI C84.1 standard voltage constraints. Furthermore, despite the high communication needed, scheme C2) shows no superiority in performance over C5) and corroborates the need for real-time response to system inputs.

In all previous tests, the rules were fed with locally recorded data. To evaluate the advantage of adding remote control inputs, we appended the values of active power flows on the lines feeding buses $1$ , $16$ , and $51$ , to all input vectors $\mathbf{z}_{n}$ . The daily maximum and the average voltage deviations attained by C1)–C5) are depicted in Figures 6 and 7, respectively. As expected, the results suggest that adding remote inputs to the rules improves the grid voltage profile at the expense of increased inter-network communication.

As mentioned in Section IV-B, the length of the control period (in minutes) over which rules remain constant does not have to agree with the number of scenarios $S$ used for training the rules. To evaluate how the control rules perform for longer control periods, Figure 8 compares the voltage deviations obtained by training rules using $S=30$ scenarios, but keeping them unaltered over $30$ , $45$ , and $60$ minutes. As expected, voltage regulation deteriorates as rules remain unchanged for longer periods.

All previous tests assumed solar penetration of $75\%$ . We also tested the performance of C1)–C5) under penetrations of $50\%$ and $25\%$ . To simulate $50\%$ penetration, solar generation and smart inverters were installed only in buses with even indexes. Likewise, to simulate $25\%$ penetration, we considered buses whose indexes were multiples of $4$ . Figures 9 and 10 depict the attained maximum absolute voltage deviations, which apparently decrease with decreasing solar penetration. For lower penetrations, the Gaussian-based rule preserves its superior voltage profile over the other schemes.
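The bus selection for the three penetration levels can be sketched as follows. The sketch assumes that buses are indexed consecutively from $1$ to $123$, which may differ from the actual labeling of the feeder; the function name is hypothetical.

```python
def solar_buses(penetration, num_buses=123):
    """Bus indexes hosting solar generation for a nominal penetration level (in %)."""
    buses = range(1, num_buses + 1)
    if penetration == 75:
        return [n for n in buses if n % 4 != 0]   # exclude multiples of 4
    if penetration == 50:
        return [n for n in buses if n % 2 == 0]   # even indexes only
    if penetration == 25:
        return [n for n in buses if n % 4 == 0]   # multiples of 4 only
    raise ValueError("penetration must be 25, 50, or 75")

counts = {level: len(solar_buses(level)) for level in (25, 50, 75)}
```

Under this indexing assumption, the rules place solar generation at $30$, $61$, and $93$ of the $123$ buses, so the nominal percentages are approximate.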

Schemes C4) and C5) were also tested under less communication by reducing the fraction of non-zero entries in $\mathbf{a}_{n}$’s by roughly a factor of $10$: Voltage deviations were evaluated for $\tau=0.03$, corresponding to $1.4\%$ non-zero entries for $\mathbf{a}_{n}$’s on the average. Figure 11 shows the maximum absolute voltage deviation for C1)–C5). Even with fewer coefficients communicated, the voltage constraints of ANSI C84.1 were still satisfied.

The last set of numerical tests compares the developed single-step approach with the two-step approach of [20]–[21]; see also Remark 1. We used both approaches to design local linear control rules for the IEEE 13-bus feeder [23], under the voltage deviation cost $\Delta=\Delta_{\epsilon}$ with $\epsilon=0.001$ . The top and center panels of Fig. 12 show respectively the maximum and average voltage deviation per bus computed across time. The bottom panel shows the voltage deviation cost $\Delta_{\epsilon}$ , time-averaged per control period. The bottom panel also shows the voltage deviation cost $\Delta_{\tau}$ with $\tau=0.01$ attained upon training both rules using $\Delta_{\tau}$ instead of $\Delta_{\epsilon}$ . Similar results were obtained for other values of $\epsilon$ and $\tau$ . According to these tests, the single-step approach achieved: 1) lower maximum per-bus voltage deviations; 2) lower average per-bus voltage deviations; and 3) smaller voltage deviation costs during the operational phase.

VII Conclusions

A novel approach for designing inverter control rules has been put forth. It relies on both data-based learning and physical grid modeling. Inverter rules are not learned independently using input/output pairs of the OPF problem. Instead, they are learned jointly by posing the related OPF problem as a multi-function learning task. Because of the way voltage deviations couple inverter outputs, the conventional support vector machine approach fails to yield sparse rule descriptions. We have engineered a voltage deviation cost to identify ‘support scenarios,’ that is, a few scenarios with non-zero coefficients for most inverter rules. The devised control rules were tested on a benchmark feeder using the exact AC model. The novel scheme attained superior voltage regulation performance compared to preset local rules, and oftentimes comparable performance to an optimal inverter dispatch delayed by $2$ minutes. The numerical tests have further corroborated the benefits of nonlinear rules with non-local inputs, and explored the trade-off between voltage regulation performance and sparsity. Finally, this work motivates several questions. On the implementation side, testing the novel formulations on multiphase grids along with capacitor banks, voltage regulators, and ZIP loads is of practical interest. On the analytical side, chance-constrained formulations; studying the stability of nonlinear rules with voltages as inputs; using kernels to learn functions with constraints; and selecting non-local control inputs are some open and interesting questions.

Proof of Proposition 1:

Consider first the linear rules of (20), for which $q_{n}^{g}(\mathbf{z}_{n})=\mathbf{z}_{n}^{\top}\mathbf{w}_{n}+b_{n}$ for all $n$ . Problem (16) with $\Delta=\Delta_{\tau}$ can be reformulated as

[TABLE]

Express voltage deviations at $s$ in terms of $\mathbf{w}_{n}$ ’s and $\mathbf{b}$

[TABLE]

Let us next introduce the Lagrange multipliers [32]:

• $\underline{\boldsymbol{\lambda}}_{n}\geq\mathbf{0}$ and $\overline{\boldsymbol{\lambda}}_{n}\geq\mathbf{0}$ corresponding to the linear inequalities in (28c) for all $n$;

• $(\mathbf{u}_{n},\rho_{n})$ related to constraint (28d) for all $n$; and

• $(\boldsymbol{\mu}_{s},\sigma_{s})$ related to constraint (28e) for all $s$.

Collect multipliers in $\mathbf{M}:=[\boldsymbol{\mu}_{1}~{}\cdots~{}\boldsymbol{\mu}_{S}]\in\mathbb{R}^{N\times S}$ , and vectors $\boldsymbol{\rho}:=[\rho_{1}~{}\cdots~{}\rho_{N}]^{\top}$ and $\boldsymbol{\sigma}:=[\sigma_{1}~{}\cdots~{}\sigma_{S}]^{\top}$ . After some algebra, the Lagrangian of (28) can be written as

[TABLE]

Minimizing $L$ over the primal variables provides

[TABLE]

From (30), the dual of (28) becomes the SOCP problem

[TABLE]

It is not hard to check that (28) and (31) are strictly feasible, so strong duality holds and both problems are solvable. The optimal primal and dual variables satisfy the complementary slackness conditions for SOCPs; see [32, Sec. 4.1]. For constraints (28d) and (31d), these conditions identify three cases:

c1) If $\|\mathbf{w}_{n}\|_{2}<\gamma_{n}$, then $\|\mathbf{u}_{n}\|_{2}=\rho_{n}=0$;

c2) If $\|\mathbf{u}_{n}\|_{2}<\rho_{n}$, then $\|\mathbf{w}_{n}\|_{2}=\gamma_{n}=0$; or

c3) If $\|\mathbf{w}_{n}\|_{2}=\gamma_{n}$ and $\|\mathbf{u}_{n}\|_{2}=\rho_{n}$, then $\gamma_{n}\mathbf{u}_{n}=-\rho_{n}\mathbf{w}_{n}$.

Recall that $\rho_{n}=\mu>0$ from (30b). Moreover, it is not hard to see that $\|\mathbf{w}_{n}\|_{2}=\gamma_{n}$ at the optimum of (28). Then, case c1) cannot occur. The other two cases entail that $\mathbf{w}_{n}=\alpha_{n}\mathbf{u}_{n}$ for some $\alpha_{n}\leq 0$ . Substituting $\mathbf{u}_{n}$ from (30c), and evaluating rule $n$ at the tested scenarios gives

[TABLE]

Here we identify $\mathbf{K}_{n}=\mathbf{Z}_{n}^{\top}\mathbf{Z}_{n}$ and the coefficients in (22) as

[TABLE]

Focus now on the complementary slackness for (28e) and (31e). The equivalent to condition c1) reads now as:

c1’) If $\|\mathbf{X}\mathbf{q}_{s}^{g}+\mathbf{y}_{s}\|_{2}<d_{s}+\tau$, then $\|\boldsymbol{\mu}_{s}\|_{2}=\sigma_{s}=0$.

Suppose the optimal primal variables satisfy $\|\mathbf{X}\mathbf{q}_{s}^{g}+\mathbf{y}_{s}\|_{2}<\tau$ . Then $d_{s}=0$ follows from (28), and c1’) gives $\|\boldsymbol{\mu}_{s}\|_{2}=\sigma_{s}=0$ . The $s$ -th entry of $\mathbf{a}_{n}$ in (32) is

[TABLE]

Complementary slackness for (28c) implies that $\overline{\lambda}_{n,s}=\underline{\lambda}_{n,s}=0$ if $|q_{n,s}^{g}|<\bar{q}_{n,s}^{g}$ at the optimum, thus proving the claim for linear rules. The result in (33) holds for nonlinear rules too. The analysis carries over upon matching the length of $\mathbf{w}_{n}$ with the length of $\boldsymbol{\phi}(\mathbf{z}_{n})$, and substituting $\mathbf{Z}_{n}^{\top}\mathbf{Z}_{n}$ by $\mathbf{K}_{n}$. ∎

Proof of Proposition 2:

Rewrite (16) for $\Delta=\Delta_{\epsilon}$ as

[TABLE]

The Lagrange multipliers of (34) are similar to those of (28), except for $(\boldsymbol{\mu}_{s},\sigma_{s})$ being replaced by $(\underline{\boldsymbol{\mu}}_{s},\overline{\boldsymbol{\mu}}_{s})$ and collected in $\underline{\mathbf{M}}:=[\underline{\boldsymbol{\mu}}_{1}~{}\cdots~{}\underline{\boldsymbol{\mu}}_{S}]$ and $\overline{\mathbf{M}}:=[\overline{\boldsymbol{\mu}}_{1}~{}\cdots~{}\overline{\boldsymbol{\mu}}_{S}]$. Minimizing the Lagrangian of (34) over the primal variables yields

[TABLE]

Similar to Prop. 1, the $s$ -th entry of $\mathbf{a}_{n}$ becomes

[TABLE]

If the optimal primal variables satisfy $\|\mathbf{X}\mathbf{q}_{s}^{g}+\mathbf{y}_{s}\|_{\infty}>\epsilon$, then $\mathbf{d}_{s}\neq\mathbf{0}$ and accordingly, complementary slackness for (34e) implies that $\overline{\boldsymbol{\mu}}_{s}\neq\mathbf{0}$ or $\underline{\boldsymbol{\mu}}_{s}\neq\mathbf{0}$. ∎

References

[1] K. Turitsyn, P. Sulc, S. Backhaus, and M. Chertkov, “Options for control of reactive power by distributed photovoltaic generators,” Proc. IEEE, vol. 99, no. 6, pp. 1063–1073, Jun. 2011.
[2] (2018) Pecan Street Inc. Dataport. [Online]. Available: https://dataport.cloud/
[3] Y. Agalgaonkar, B. Pal, and R. Jabr, “Stochastic distribution system operation considering voltage regulation risks in the presence of PV generation,” IEEE Trans. Sustain. Energy, vol. 6, no. 4, pp. 1315–1324, Oct. 2015.
[4] IEEE 1547 Standard for Interconnecting Distributed Resources with Electric Power Systems, IEEE Std., 2018. [Online]. Available: http://grouper.ieee.org/groups/scc21/1547/1547_index.html
[5] S. Low, “Convex relaxation of optimal power flow — Part II: Exactness,” IEEE Trans. Control Netw. Syst., vol. 1, no. 2, pp. 177–189, Jun. 2014.
[6] G. Wang, V. Kekatos, A.-J. Conejo, and G. B. Giannakis, “Ergodic energy management leveraging resource variability in distribution grids,” IEEE Trans. Power Syst., vol. 31, no. 6, pp. 4765–4775, Nov. 2016.
[7] Y. Zhang, N. Gatsis, and G. B. Giannakis, “Robust energy management for microgrids with high-penetration renewables,” IEEE Trans. Sustain. Energy, vol. 4, no. 4, pp. 944–953, Oct. 2013.
[8] M. Farivar, R. Neal, C. Clarke, and S. Low, “Optimal inverter VAR control in distribution systems with high PV penetration,” in Proc. IEEE Power & Energy Society General Meeting, San Diego, CA, Jul. 2012.
