Fault Location in Power Distribution Systems via Deep Graph Convolutional Networks

Kunjin Chen; Jun Hu; Yu Zhang; Zhanqing Yu; Jinliang He

arXiv:1812.09464·cs.LG·November 19, 2019

TL;DR

This paper introduces a graph convolutional network framework for accurately locating faults in power distribution systems, effectively handling noise, data loss, and topology changes, outperforming existing machine learning methods.

Contribution

The paper presents a novel GCN-based approach that integrates system topology and multiple measurements for fault location, demonstrating superior accuracy and robustness over existing methods.

Findings

1. GCN significantly outperforms other machine learning schemes in fault location accuracy.

2. The approach is robust to measurement noise and data loss.

3. The model adapts well to topology changes and limited measurement data.

Abstract

This paper develops a novel graph convolutional network (GCN) framework for fault location in power distribution networks. The proposed approach integrates multiple measurements at different buses while taking system topology into account. The effectiveness of the GCN model is corroborated by the IEEE 123 bus benchmark system. Simulation results show that the GCN model significantly outperforms other widely-used machine learning schemes with very high fault location accuracy. In addition, the proposed approach is robust to measurement noise and data loss errors. Data visualization results of two competing neural networks are presented to explore the mechanism of GCN's superior performance. A data augmentation procedure is proposed to increase the robustness of the model under various levels of noise and data loss errors. Further experiments show that the model can adapt to topology…

Tables

Table 1. TABLE I: Fault Location Accuracies of Different Approaches

Model	Accuracy	One-hop Accuracy
PCA + SVM	$94.60$	$98.31$
PCA + RF	$94.96$	$99.28$
FCNN	$84.64$	$96.38$
GCN	$99.26$	$99.93$

Table 2. TABLE II: Fault Location Accuracies of the Models Under Various Measurement Modifications

Model	Noise (I)	Bus (II)	Random (III)	I + II	I + III	II + III	I + II + III
PCA + SVM	$89.13 / 97.30$	$58.73 / 79.97$	$61.43 / 81.78$	$57.76 / 79.26$	$60.33 / 81.09$	$45.44 / 69.64$	$44.87 / 69.20$
PCA + RF	$85.94 / 96.77$	$53.82 / 67.62$	$58.57 / 74.07$	$52.15 / 66.66$	$56.94 / 73.23$	$40.55 / 55.84$	$40.05 / 55.64$
FCNN	$85.72 / 95.95$	$62.61 / 82.93$	$69.40 / 88.09$	$61.24 / 82.47$	$69.51 / 87.96$	$53.54 / 76.42$	$54.12 / 76.83$
GCN	$97.10 / 99.72$	$92.67 / 97.44$	$89.09 / 96.67$	$90.76 / 97.83$	$87.70 / 96.31$	$83.55 / 94.51$	$80.63 / 93.86$

Table 3. TABLE III: Fault Location Accuracies of the Models Under Various Measurement Modifications When Trained With Noisy Data

Model	Noise	Noise + Bus	Noise + Random	All Combined
PCA + SVM	$85.70 / 96.21$	$55.98 / 77.74$	$58.00 / 80.17$	$44.12 / 68.51$
PCA + RF	$86.51 / 97.55$	$64.11 / 81.61$	$66.12 / 84.90$	$52.34 / 72.13$
FCNN	$86.95 / 97.19$	$61.95 / 82.58$	$70.32 / 88.52$	$53.98 / 76.55$
GCN	$97.52 / 99.73$	$92.67 / 98.26$	$88.76 / 96.44$	$84.53 / 94.77$

Table 4. TABLE IV: Fault Location Accuracies of the Models with Additional Data Generated With Changed Phases of Chosen Branches

Model	All Buses	Modified Buses
PCA + SVM	$81.87 / 94.09$	$77.47 / 92.00$
PCA + RF	$81.57 / 95.11$	$79.64 / 93.93$
FCNN	$85.38 / 96.20$	$82.85 / 95.36$
GCN	$97.65 / 99.77$	$93.66 / 99.38$

Table 5. TABLE V: Fault Location Accuracies of the GCN Model With Different Measurement Scenarios

Measurement Scenario	Noise	Noise + Bus	Noise + Random	All Combined
Voltage amplitudes	$94.66 / 99.36$	$67.51 / 82.91$	$75.55 / 87.38$	$58.02 / 75.34$
Voltage phasors	$97.43 / 99.88$	$90.38 / 97.49$	$89.82 / 97.33$	$83.54 / 94.63$
Current phasors	$91.04 / 98.43$	$82.40 / 93.64$	$83.04 / 94.44$	$76.32 / 90.44$
Voltage and current phasors	$97.52 / 99.73$	$92.67 / 98.26$	$88.76 / 96.44$	$84.53 / 94.77$

Table 6. TABLE VI: Fault Location Accuracies of the GCN Model for the IEEE 37 Bus System With Different Values of $K_{n}$ and $K$

Hyper-parameters	Zero-hop	One-hop	Two-hop
$K_{n} = 10, K = [1, 2, 3]$	$88.70$	$\underline{95.31}$	$96.94$
$K_{n} = 10, K = [2, 3, 4]$	$88.94$	$96.36$	$97.03$
$K_{n} = 10, K = [3, 4, 5]$	$89.97$	$97.14$	$97.89$
$K_{n} = 5, K = [2, 3, 4]$	$88.90$	$\underline{95.04}$	$96.81$
$K_{n} = 15, K = [2, 3, 4]$	$90.05$	$96.93$	$97.73$
$K_{n} = 20, K = [2, 3, 4]$	$89.51$	$96.20$	$97.25$
$K_{n} = 5, K = [1, 1, 1]$	$\underline{88.12}$	$95.59$	$\underline{96.48}$
$K_{n} = 10, K = [1, 1, 1]$	$\underline{88.18}$	$\underline{95.31}$	$\underline{96.25}$

Table 7. TABLE VII: Fault Location Accuracies of the GCN Model for the IEEE 37 Bus System With Different Numbers of Layers

Hyper-parameters	Zero-hop	One-hop	Two-hop
$K = [2]$ (1 layer)	$74.80$	$88.55$	$93.06$
$K = [2, 3]$ (2 layers)	$85.26$	$94.37$	$95.59$
$K = [2, 3, 4]$ (3 layers)	$88.94$	$96.36$	$97.03$

Table 8. TABLE VIII: Fault Location Accuracies of the GCN Model for the IEEE 37 Bus System With Different Sizes of Training Datasets

Dataset Size	Zero-hop	One-hop	Two-hop
100%	$88.94$	$96.36$	$97.03$
50%	$66.99$	$87.04$	$93.73$
25%	$45.46$	$73.33$	$88.69$
10%	$21.42$	$46.79$	$67.49$
5%	$13.21$	$33.27$	$52.73$

Equations

$$\mathbf{\Delta}=\mathbf{D}^{-1/2}\mathbf{\Delta}_{u}\mathbf{D}^{-1/2}=\mathbf{I}-\mathbf{D}^{-1/2}\mathbf{W}\mathbf{D}^{-1/2},$$

$$\mathbf{g}*\mathbf{f}=\mathbf{\Phi}\left((\mathbf{\Phi}^{\top}\mathbf{g})\circ(\mathbf{\Phi}^{\top}\mathbf{f})\right)=\mathbf{\Phi}\,\mathrm{diag}(\hat{g}_{1},\dots,\hat{g}_{n})\,\mathbf{\Phi}^{\top}\mathbf{f},$$

$$h_{\bm{\alpha}}(\mathbf{\Lambda})=\sum_{k=0}^{K}\alpha_{k}\mathbf{\Lambda}^{k},$$

$$h_{\bm{\alpha}}(\mathbf{\Lambda})=\sum_{k=0}^{K}\alpha_{k}\mathit{T}_{k}(\tilde{\mathbf{\Lambda}}),$$

$$\mathbf{\Phi}h_{\bm{\alpha}}(\mathbf{\Lambda})\mathbf{\Phi}^{\top}\mathbf{f}=h_{\bm{\alpha}}(\mathbf{\Delta})\mathbf{f}=\sum_{k=0}^{K}\alpha_{k}\mathit{T}_{k}(\tilde{\mathbf{\Delta}})\mathbf{f},$$

$$\mathbf{y}_{j}=\sum_{i=1}^{N_{in}}h_{\bm{\alpha}_{i,j}}(\mathbf{\Delta})\mathbf{x}_{i},$$

Code & Models

Repositories

BNN-UPC/GNNPapersPowerNets
none

Taxonomy

Methods: Graph Convolutional Network

Full text

Fault Location in Power Distribution Systems

via Deep Graph Convolutional Networks

Kunjin Chen, Jun Hu, Member, IEEE, Yu Zhang, Member, IEEE, Zhanqing Yu, Member, IEEE, and Jinliang He, Fellow, IEEE

Manuscript received May 27, 2019; revised September 16, 2019; accepted October 5, 2019. This work was supported in part by National Key R&D Program of China under Grant 2018YFB0904603, Natural Science Foundation of China under Grant 51720105004, State Grid Corporation of China under Grant 5202011600UJ, the Hellman Fellowship, and the Faculty Research Grant (FRG) of University of California, Santa Cruz. K.-J. Chen, J. Hu, Z.-Q. Yu, and J.-L. He are with the State Key Lab of Power Systems, Dept. of Electrical Engineering, Tsinghua University, Beijing 100084, P. R. of China. Y. Zhang is with the Dept. of Electrical and Computer Engineering, University of California, Santa Cruz, CA 95064, USA. (Corresponding author email: [email protected]).

Abstract

This paper develops a novel graph convolutional network (GCN) framework for fault location in power distribution networks. The proposed approach integrates multiple measurements at different buses while taking system topology into account. The effectiveness of the GCN model is corroborated by the IEEE 123 bus benchmark system. Simulation results show that the GCN model significantly outperforms other widely-used machine learning schemes with very high fault location accuracy. In addition, the proposed approach is robust to measurement noise and data loss errors. Data visualization results of two competing neural networks are presented to explore the mechanism of GCN’s superior performance. A data augmentation procedure is proposed to increase the robustness of the model under various levels of noise and data loss errors. Further experiments show that the model can adapt to topology changes of distribution networks and perform well with a limited number of measured buses.

Index Terms:

Fault location, distribution systems, deep learning, graph convolutional networks.

©2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. DOI: 10.1109/JSAC.2019.2951964

I Introduction

Distribution systems are constantly under the threat of short-circuit faults that would cause power outages. In order to enhance the operation quality and reliability of distribution systems, system operators have to deal with outages in a timely manner. Thus, it is of paramount importance to accurately locate and quickly clear faults immediately after the occurrence, so that quick restoration can be achieved.

Existing fault location techniques in the literature can be divided into several categories, namely, impedance-based methods [1, 2, 3], voltage sag-based methods [4, 5, 6], automated outage mapping [6, 7, 8], traveling wave-based methods [9, 10], and machine learning-based methods [11, 12, 13, 14]. Impedance-based fault location methods use voltage and current measurements to estimate fault impedance and fault location. Specifically, a generalized fault location method for overhead distribution systems is proposed in [1]. Substation voltage and current quantities are expressed as functions of the fault location and fault resistance, so that the fault location can be determined by solving a set of nonlinear equations. To solve the multiple estimation problem, it is proposed to use estimated fault currents in all phases including the healthy phase to find the faulty feeder and the location of the fault [2]. It is pointed out in [15] that the accuracy of impedance-based methods can be affected by factors including fault type, unbalanced loads, heterogeneity of overhead lines, measurement errors, etc.

When a fault occurs in a distribution system, voltage drops can occur at all buses. The voltage drop characteristics for the whole system vary with different fault locations. Thus, the voltage measurements on certain buses can be used to identify the fault location. For instance, calculated fault currents can be applied to each bus in the system, and the values of voltage drop on a small number of buses can be obtained by calculating the power flows. The fault location can then be determined by comparing measured and calculated values of voltage drop [4, 5]. In [6], multiple estimations of fault current at a given bus are calculated using voltage drop measurements on a small number of buses, and a bus is identified as the faulty bus if the variance of the multiple fault current estimates takes the smallest value.

Automatic outage mapping refers to locating a fault or reducing the search space of a fault using information provided by devices that can directly or indirectly indicate the fault location. For example, when a fault occurs, if an automatic recloser is disconnected, smart meters downstream of the device would experience an outage. Smart meters downstream of the fault itself will also feature a loss of power. Thus, the search space of the fault can be greatly reduced if the geographic location of each smart meter is considered [6]. Authors in [7] proposed to use fault indicators to identify the fault location. Each fault indicator can tell whether the fault current flows through itself (it may also have the ability to tell the direction of the fault current). The location of the fault can then be narrowed down to a section between any two fault indicators. An integer programming-based method is proposed in [8] to locate a fault using information from circuit breakers, automatic reclosers, fuses, and smart meters. Multiple fault scenarios, malfunctioning of protective devices, and missing notifications from smart meters are also taken into consideration.

Traveling wave-based methods use observation of original and reflected waves generated by a fault. Specifically, different types of traveling wave methods include single-ended, double-ended, injection-based, reclosing transient-based, etc. The principle and implementation of single-ended and double-ended fault location with traveling waves are discussed in [9]. The traveling wave generated by circuit breaker reclosing is used to locate faults in [10]. In general, however, traveling wave-based methods require high sampling rates and communication overhead of measurement devices [5]. Systems such as the global positioning system (GPS) are required for time synchronization across multi-terminal signals.

Machine learning models are leveraged for fault location in distribution systems [16]. Using the spectral characteristics of post-fault measurements, data with feature extraction are fed into an artificial neural network (ANN) for fault location [13]. A learning algorithm for multivariable data analysis (LAMDA) is used in [12] to obtain fault location. Descriptors are extracted from voltage and current waveforms measured at the substation. Various LAMDA nets are trained for different types of faults. In [11], the authors first use support vector machines (SVM) to classify the fault type, and then use ANN to identify the fault location. Smart meter data serves as the input of a multi-label SVM to identify the faulty lines in a distribution system [14].

The deployment of distribution system measurement devices or systems such as advanced metering infrastructure [14], micro phasor measurement units [17], and wireless sensor networks [18] improves data-driven situational awareness for distribution systems [19, 20]. There are two major challenges for fault location in distribution systems with the increased number of measurements available: first, traditional fault location methods are unable to incorporate the measurements from different buses in a flexible manner, especially when the losses of data are taken into consideration. Second, for traditional machine learning approaches, the topology of the distribution network is hard to model, let alone the possibility of topology changes.

Recent advances in the field of machine learning, especially deep learning, have attracted extensive attention from both academia and industry. One of the major developments is the successful implementation of convolutional neural networks (CNN) in a variety of image recognition-related tasks [21]. While the measurements on different buses in a power distribution system are spatially distributed, it is hard to directly implement a CNN model that uses such measurements as input. Nevertheless, when multiple buses in a distribution system become measurable, it is possible to treat the measurements as signals on a graph to which variants of traditional data analysis tools may be applicable [22, 23]. As an extension of CNNs for data on graphs, graph convolutional networks (GCN) have been designed and implemented, such that the advantages of CNNs can be exploited for data residing on graphs [24, 25, 26].

In this paper, a GCN model is proposed for fault location in distribution systems. Unlike existing machine learning models used for fault location tasks, the architecture of the proposed model preserves the spatial correlations of the buses and learns to integrate information from multiple measurement units. Features are extracted and composited in a layer-by-layer manner to facilitate the faulty bus classification task. We also design a data augmentation procedure to ensure that the model is robust to varied levels of noise and errors. In addition, the proposed model can be readily adapted or extended to various tasks concerning data processing for multiple measurements in modern smart grids.

The organization of the rest of the paper is as follows: in Section II, the fault location task is formulated and the proposed GCN model is described in detail. We also introduce the IEEE 123 bus test case used in this paper. The effectiveness of the proposed GCN model is validated in Section III with extensive comparisons and visualizations. A data augmentation procedure for training robust models is introduced. The performance of the model under topology changes and on high impedance faults is evaluated. We also implement the GCN model on another distribution network test case and discuss several practical concerns. Finally, Section IV concludes the paper and points out some future works.

II Fault Location Based on Graph Convolutional Networks

In this section, we first give a brief description of the fault location task. Next, we revisit the idea of spectral convolution on graphs and show how a GCN can be constructed based on that idea. Finally, we present the test case of the IEEE 123 bus distribution system.

II-A Formulation of the Fault Location Task

In this paper, we assume that the voltage and current phasor measurements are available at phases that are connected to loads. That is, for a given measured bus in a distribution system, we have access to its three-phase voltage and current phasors $(V_{1},\theta^{V}_{1},V_{2},\theta^{V}_{2},V_{3},\theta^{V}_{3},I_{1},\theta^{I}_{1},I_{2},\theta^{I}_{2},I_{3},\theta^{I}_{3})\in\mathbb{R}^{12}$. Values corresponding to unmeasured phases are set to zero. A data sample of measurements from the distribution system can then be represented as $\mathbf{X}\in\mathbb{R}^{n_{o}\times 12}$, where $n_{o}$ is the number of observed buses. We formulate the fault location task as a classification problem. More specifically, given a data sample matrix $\mathbf{X}_{i}$, the faulted bus $\tilde{y}_{i}$ is obtained by $\tilde{y}_{i}=f(\mathbf{X}_{i})$, where $f$ is a specific faulty bus classification model. A fault is correctly located if $\tilde{y}_{i}=y_{i}$, where $y_{i}$ indicates the true faulty bus corresponding to $\mathbf{X}_{i}$.

As the convolution operation of CNN is carried out in local regions within the input data, local features can be extracted, and complex structures within the data can be represented as the number of convolution layers increases (for a detailed description of CNN, the readers may refer to [27]). However, traditional CNN models cannot be applied to signals on a distribution network, as the inputs for CNN are supposed to be in Euclidean domains, such as images represented by values on regular two-dimensional grids and sequential data that is one-dimensional [28]. Thus, we introduce hereinafter how a convolutional network can be constructed with signals on graphs.

II-B Spectral Convolution on Graphs

To be self-contained, we first present a brief introduction to spectral graph theory [29]. Suppose we have an undirected weighted graph $\mathcal{G}=(\mathcal{V},\mathcal{E},\mathbf{W})$ , where $\mathcal{V}$ is the set of vertices with $|\mathcal{V}|=n$ , $\mathcal{E}$ is the set of edges, and $\mathbf{W}\in\mathbb{R}^{n\times n}$ is the weighted adjacency matrix. The unnormalized graph Laplacian of $\mathcal{G}$ is defined as $\mathbf{\Delta}_{u}=\mathbf{D}-\mathbf{W}$ , where $\mathbf{D}$ is the degree matrix of the graph with diagonal entries $\mathbf{D}_{ii}=\sum_{j}\mathbf{W}_{ij}$ . Then, the normalized graph Laplacian is given as

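$$\mathbf{\Delta}=\mathbf{D}^{-1/2}\mathbf{\Delta}_{u}\mathbf{D}^{-1/2}=\mathbf{I}-\mathbf{D}^{-1/2}\mathbf{W}\mathbf{D}^{-1/2},\tag{1}$$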

where $\mathbf{I}$ is the identity matrix. The eigendecomposition of the positive semi-definite symmetric matrix $\mathbf{\Delta}$ yields $\mathbf{\Delta}=\mathbf{\Phi}\mathbf{\Lambda}\mathbf{\Phi}^{\top}$, where $\mathbf{\Phi}=(\bm{\phi}_{1},\dots,\bm{\phi}_{n})$ are orthonormal eigenvectors of $\mathbf{\Delta}$, and $\mathbf{\Lambda}=\mathrm{diag}(\lambda_{1},\dots,\lambda_{n})$ is the diagonal matrix with corresponding ordered non-negative eigenvalues $0=\lambda_{1}\leq\lambda_{2}\leq\dots\leq\lambda_{n}$. Note that the smallest eigenvalue $\lambda_{1}$ equals zero with the eigenvector $\bm{\phi}_{1}=(\frac{1}{\sqrt{n}},\cdots,\frac{1}{\sqrt{n}})$. By analogy with the Fourier transform in Euclidean spaces, the graph Fourier transform (GFT) can be defined for weighted graphs using the orthonormal eigenvectors of $\mathbf{\Delta}$ [30]. For a signal $\mathbf{f}\in\mathbb{R}^{n}$ on the vertices of graph $\mathcal{G}$ (each vertex has one value in this case), GFT is performed as $\mathbf{\hat{f}}=\mathbf{\Phi}^{\top}\mathbf{f}$ while the inverse GFT is $\mathbf{f}=\mathbf{\Phi}\mathbf{\hat{f}}$. Further, we can conduct convolution on graphs in the spectral domain, also by analogy with convolution on discrete Euclidean spaces facilitated by the Fourier transform. That is, spectral convolution of two signals $\mathbf{g}$ and $\mathbf{f}$ is defined as

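$$\mathbf{g}*\mathbf{f}=\mathbf{\Phi}\left((\mathbf{\Phi}^{\top}\mathbf{g})\circ(\mathbf{\Phi}^{\top}\mathbf{f})\right)=\mathbf{\Phi}\,\mathrm{diag}(\hat{g}_{1},\dots,\hat{g}_{n})\,\mathbf{\Phi}^{\top}\mathbf{f},\tag{2}$$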

where $\circ$ indicates element-wise product between two vectors. Filtering of signal $\mathbf{f}$ by spectral filter $\mathbf{B}=\mathrm{diag}(\bm{\beta})$ with $\bm{\beta}\in\mathbb{R}^{n}$ can then be expressed as $\mathbf{\Phi}\mathbf{B}\mathbf{\Phi}^{\top}\mathbf{f}$. One major drawback of this formulation, however, is that the filters are not guaranteed to be spatially localized, which is a crucial feature of CNNs for data in Euclidean spaces, since localized filters are able to extract features from small areas of interest instead of the whole input. Using filters $h_{\bm{\alpha}}(\mathbf{\Lambda})$ that are smooth in spectral domain can bypass such an issue [31, 29]. For example, consider using a polynomial approximation

$$h_{\bm{\alpha}}(\mathbf{\Lambda})=\sum_{k=0}^{K}\alpha_{k}\mathbf{\Lambda}^{k},\tag{3}$$

where $\bm{\alpha}=(\alpha_{1},\dots,\alpha_{K})$ is the vector of coefficients to be learned for the filters and $K$ is the degree of the polynomials. Further, in order to stabilize the training of the polynomial filters, the truncated Chebyshev polynomial expansion of $h_{\bm{\alpha}}(\mathbf{\Lambda})$ is introduced [30, 25]. Specifically, expansion of $h_{\bm{\alpha}}(\mathbf{\Lambda})$ using Chebyshev polynomials $\mathit{T}_{k}(\tilde{\mathbf{\Lambda}})$ up to order $K$ can be expressed as

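$$h_{\bm{\alpha}}(\mathbf{\Lambda})=\sum_{k=0}^{K}\alpha_{k}\mathit{T}_{k}(\tilde{\mathbf{\Lambda}}),\tag{4}$$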

where $\tilde{\mathbf{\Lambda}}=2\mathbf{\Lambda}/\lambda_{n}-\mathbf{I}$. The recursive formulation of the filtering process based on Chebyshev polynomials is introduced in [25], which takes the form $\mathit{T}_{k}(x)=2x\mathit{T}_{k-1}(x)-\mathit{T}_{k-2}(x)$ with $\mathit{T}_{0}=1$ and $\mathit{T}_{1}=x$. Since $\mathbf{\Delta}^{k}=(\mathbf{\Phi}\mathbf{\Lambda}\mathbf{\Phi}^{\top})^{k}=\mathbf{\Phi}\mathbf{\Lambda}^{k}\mathbf{\Phi}^{\top}$, the filtering process $\mathbf{\Phi}h_{\bm{\alpha}}(\mathbf{\Lambda})\mathbf{\Phi}^{\top}\mathbf{f}$ can be expressed as

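$$\mathbf{\Phi}h_{\bm{\alpha}}(\mathbf{\Lambda})\mathbf{\Phi}^{\top}\mathbf{f}=h_{\bm{\alpha}}(\mathbf{\Delta})\mathbf{f}=\sum_{k=0}^{K}\alpha_{k}\mathit{T}_{k}(\tilde{\mathbf{\Delta}})\mathbf{f},\tag{5}$$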

where $\tilde{\mathbf{\Delta}}=2\mathbf{\Delta}/\lambda_{n}-\mathbf{I}$ . Consequently, with $\mathbf{d}_{0}=\mathbf{f}$ and $\mathbf{d}_{1}=\tilde{\mathbf{\Delta}}\mathbf{f}$ , we can recursively calculate $\mathbf{d}_{k}=2\tilde{\mathbf{\Delta}}\mathbf{d}_{k-1}-\mathbf{d}_{k-2}$ , and the filtering operation $h_{\bm{\alpha}}(\mathbf{\Delta})\mathbf{f}=[\mathbf{d}_{0},\cdots,\mathbf{d}_{K}]\bm{\alpha}$ has a computational complexity of $\mathcal{O}(K|\mathcal{E}|)$ considering the sparsity of $\mathbf{\Delta}$ [25]. In addition, because the Chebyshev polynomials are truncated to the $K$ th order, the filter is $K$ -hop localized with respect to the connections embodied in $\mathbf{\Delta}$ . To this end, GCN can be implemented with the aforementioned spectral convolution on graphs.
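The recursion above can be sketched in a few lines. The following is a minimal, dependency-free illustration (plain Python lists rather than an array library), not the authors' implementation; the function names `normalized_laplacian`, `matvec`, and `chebyshev_filter` are hypothetical. It uses a five-bus path graph, for which the largest eigenvalue of the normalized Laplacian is exactly 2 because the graph is bipartite, so `lam_max=2.0` is passed directly instead of being computed.

```python
import math

def normalized_laplacian(W):
    """Return I - D^{-1/2} W D^{-1/2} for a symmetric weight matrix W (list of lists)."""
    n = len(W)
    deg = [sum(row) for row in W]
    # Isolated vertices (degree 0) get a zero scaling factor to avoid division by zero.
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    return [[(1.0 if i == j else 0.0) - inv_sqrt[i] * W[i][j] * inv_sqrt[j]
             for j in range(n)] for i in range(n)]

def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def chebyshev_filter(L, f, alpha, lam_max):
    """Compute sum_k alpha[k] * T_k(L_tilde) f using d_k = 2 L_tilde d_{k-1} - d_{k-2}.

    alpha must hold at least one coefficient; alpha[k] multiplies the order-k term.
    """
    n = len(L)
    # Rescale the Laplacian so that its spectrum lies in [-1, 1].
    L_tilde = [[2.0 * L[i][j] / lam_max - (1.0 if i == j else 0.0)
                for j in range(n)] for i in range(n)]
    d_prev, d_curr = list(f), matvec(L_tilde, f)  # d_0 and d_1
    out = [alpha[0] * v for v in d_prev]
    if len(alpha) > 1:
        out = [o + alpha[1] * v for o, v in zip(out, d_curr)]
    for a in alpha[2:]:
        d_next = [2.0 * u - v for u, v in zip(matvec(L_tilde, d_curr), d_prev)]
        out = [o + a * v for o, v in zip(out, d_next)]
        d_prev, d_curr = d_curr, d_next
    return out

# Path graph 0 - 1 - 2 - 3 - 4 with unit edge weights (illustrative only).
n = 5
W = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
L = normalized_laplacian(W)
f = [1.0, 0.0, 0.0, 0.0, 0.0]  # impulse at vertex 0
y = chebyshev_filter(L, f, alpha=[0.5, 0.3, 0.2], lam_max=2.0)
print(y)
```

With coefficients up to order 2, the response to an impulse at vertex 0 is nonzero only at vertices 0, 1, and 2, which illustrates the hop-localization property stated above. The dense products are used for brevity; the $\mathcal{O}(K|\mathcal{E}|)$ cost quoted in the text relies on a sparse representation of $\mathbf{\Delta}$.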

II-C GCN Approach for Fault Location

The GCN model applied to the fault location task is illustrated in Fig. 1. The input $\mathbf{X}$ is passed through $L_{c}$ graph convolution layers and $L_{f}$ fully-connected layers followed by a softmax activation function. Specifically, the $j$ th feature map of a graph convolution layer is calculated as

$$\mathbf{y}_{j}=\sum_{i=1}^{N_{in}}h_{\bm{\alpha}_{i,j}}(\mathbf{\Delta})\mathbf{x}_{i},\tag{6}$$

where $\mathbf{x}_{i}\in\mathbb{R}^{n}$ is the $i$ th input feature map, $\bm{\alpha}_{i,j}\in\mathbb{R}^{K}$ are the trainable coefficients, and $N_{in}$ is the number of filters of the previous layer. With $N_{out}$ filters in the current layer, a total of $N_{in}N_{out}K$ parameters are trainable in this layer. In particular, $N_{in}=12$ for the first layer of the model. The output of the last graph convolution layer is flattened into a vector and passed to the fully-connected layers. The index of the predicted faulty bus, $\tilde{y}$, can be obtained as $\tilde{y}=\operatorname*{argmax}_{i}a_{i}$, where $a_{i}$ is the $i$ th activation of the last fully-connected layer.
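As an illustration of how one graph convolution layer combines its input feature maps, the sketch below evaluates the sum over input maps with one coefficient vector per (input, output) pair. It is a hedged example rather than the paper's TensorFlow code: the names `chebyshev_basis` and `graph_conv_layer` are hypothetical, biases and activation functions are omitted, and, following the statement that $\bm{\alpha}_{i,j}\in\mathbb{R}^{K}$, each filter is assumed to hold $K$ coefficients covering polynomial orders 0 to $K-1$.

```python
import math

def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def chebyshev_basis(L_tilde, x, K):
    """Return [d_0, ..., d_{K-1}] with d_0 = x, d_1 = L_tilde x, d_k = 2 L_tilde d_{k-1} - d_{k-2}."""
    basis = [list(x)]
    if K > 1:
        basis.append(matvec(L_tilde, x))
    for _ in range(2, K):
        nxt = [2.0 * u - v for u, v in zip(matvec(L_tilde, basis[-1]), basis[-2])]
        basis.append(nxt)
    return basis

def graph_conv_layer(L_tilde, X, alpha):
    """Apply y_j = sum_i sum_k alpha[i][j][k] * T_k(L_tilde) x_i.

    X: list of N_in input feature maps, each a list of n vertex values.
    alpha[i][j]: the K coefficients linking input map i to output map j.
    Returns a list of N_out output feature maps.
    """
    n_in, n_out, K = len(alpha), len(alpha[0]), len(alpha[0][0])
    n = len(X[0])
    # The Chebyshev basis of each input map is shared by all output maps.
    bases = [chebyshev_basis(L_tilde, x, K) for x in X]
    Y = [[0.0] * n for _ in range(n_out)]
    for i in range(n_in):
        for j in range(n_out):
            for k in range(K):
                a = alpha[i][j][k]
                for v in range(n):
                    Y[j][v] += a * bases[i][k][v]
    return Y

# Rescaled Laplacian of a three-bus path graph (largest eigenvalue 2, so L_tilde = L - I).
s = 1.0 / math.sqrt(2.0)
L_tilde = [[0.0, -s, 0.0], [-s, 0.0, -s], [0.0, -s, 0.0]]
X = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]                       # N_in = 2
alpha = [[[1.0, 0.5] for _ in range(3)] for _ in range(2)]  # N_out = 3, K = 2
Y = graph_conv_layer(L_tilde, X, alpha)
n_params = len(alpha) * len(alpha[0]) * len(alpha[0][0])    # N_in * N_out * K
print(n_params, Y[0])
```

In this toy setting the layer has $2\times 3\times 2=12$ trainable coefficients, matching the $N_{in}N_{out}K$ count given in the text.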

The weighted adjacency matrix is constructed based on the physical distance between the nodes. First, the distance matrix $\mathbf{S}\in\mathbb{R}^{n\times n}$ is formed with $\mathbf{S}_{ij}$ being the length of the shortest path between bus $i$ and bus $j$ . We then sort and keep the smallest $K_{n}$ values in each row of $\mathbf{S}$ to obtain $\tilde{\mathbf{S}}\in\mathbb{R}^{n\times K_{n}}$ and calculate $\sigma_{S}=\sum_{i}{\tilde{\mathbf{S}}_{iK_{n}}}/n$ (we have $\tilde{\mathbf{S}}_{ij}\leq\tilde{\mathbf{S}}_{ik}$ for $j<k$ ). Matrix $\tilde{\mathbf{W}}\in\mathbb{R}^{n\times K_{n}}$ is then constructed with $\tilde{\mathbf{W}}_{ij}=e^{-\tilde{\mathbf{S}}_{ij}^{2}/\sigma_{S}^{2}}$ . By restoring the positional correspondence of $\tilde{\mathbf{W}}_{ij}$ to bus $i$ and bus $j$ , the weighted adjacency matrix $\mathbf{W}\in\mathbb{R}^{n\times n}$ can be obtained. We can thus proceed to compute $\mathbf{D}$ and finally obtain $\mathbf{\Delta}$ according to (1).
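The construction of $\mathbf{W}$ described above can be sketched as follows. This is a minimal example under stated assumptions, not the authors' code: the helper names `shortest_paths` and `knn_adjacency` are hypothetical, each bus is assumed to be excluded from its own list of $K_{n}$ nearest buses, and the result is symmetrized by keeping the larger of the two directed weights. Neither of the last two choices is specified in the text.

```python
import heapq
import math

def shortest_paths(n, edges):
    """All-pairs shortest path lengths via Dijkstra; edges is a list of (u, v, length)."""
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    S = []
    for src in range(n):
        dist = [math.inf] * n
        dist[src] = 0.0
        heap = [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale heap entry
            for v, w in adj[u]:
                if d + w < dist[v]:
                    dist[v] = d + w
                    heapq.heappush(heap, (d + w, v))
        S.append(dist)
    return S

def knn_adjacency(S, k_n):
    """Gaussian-kernel weights on the k_n nearest buses of each bus."""
    n = len(S)
    # Assumption: a bus is not counted among its own nearest neighbours.
    nearest = [sorted((S[i][j], j) for j in range(n) if j != i)[:k_n] for i in range(n)]
    # sigma_S is the mean distance to the k_n-th nearest bus.
    sigma = sum(row[-1][0] for row in nearest) / n
    W = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(nearest):
        for d, j in row:
            w = math.exp(-d * d / (sigma * sigma))
            # Assumption: symmetrize by keeping the larger directed weight.
            W[i][j] = max(W[i][j], w)
            W[j][i] = max(W[j][i], w)
    return W, sigma

# Four buses on a line with segment lengths 1, 2, 1 (illustrative values).
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0)]
S = shortest_paths(4, edges)
W, sigma = knn_adjacency(S, k_n=2)
print(sigma, W[0])
```

With `k_n=2`, buses 0 and 3 are not among each other's nearest neighbours, so `W[0][3]` stays zero, which shows how $K_{n}$ controls the sparsity of the graph used by the filters.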

II-D The IEEE 123 Bus Distribution System Test Case

The IEEE 123 bus test case is used to carry out the task of fault location in distribution systems in this paper [32]. The overall topology of the distribution system is illustrated in Fig. 2. Note that the topology is only used to indicate the connections of the buses rather than their geometrical locations. Specifically, there are 128 buses in the system (cf. Fig. 2), 85 of which are connected to loads. Most of those loads are only connected to a single phase. Bus pairs (149, 150r), (18, 135), (13, 152), (60, 160(r)), (61, 61s), and (97, 197) are connected by normally closed switches. In addition, regulators are installed at buses 9, 25, and 160.

In order to generate the training and test datasets, faults are simulated for all buses in the system. Three types of faults are considered, namely, single phase to ground, two phase to ground, and two phase short-circuit. The fault resistance ranges from 0.05 $\Omega$ to 20 $\Omega$. The load level of the system varies between 0.316 and 1. Fig. 3 shows the discrete probability density function (PDF) with 50 equal-length load level intervals. The PDF is obtained from the annual load curve of the system. We randomly sample one value from the load level distribution and set all loads in the system to the same level. The simulations are implemented in the OpenDSS software [33]. The voltage and current phasors are measured during the fault. The resulting samples form the training and test datasets used for training and evaluating the fault location models.

We generate 20 data samples for each fault type at each bus. As a result, a total of 13520 data samples are generated for both the training and test datasets. We consider buses connected with normally closed switches or regulators as a single bus. Thus, there are a total of 119 faulty buses to be classified; i.e., 119 class labels for the classification task.

For the implementation of the GCN model (the implementation of GCN in this paper is based on the implementation in [25]; see https://github.com/mdeff/cnn_graph), instead of using $n_{o}\times 12$ as the size of the input of the model, we expand $\mathbf{X}$ to include all 128 buses, i.e., each input data sample has a size of $128\times 12$. As a result, each sample matrix has 1536 entries, 380 of which have measured values. For the non-measured buses, we set the corresponding values to be zero. The same measured quantity is run through the standardization process; i.e., subtracting the mean and dividing by the standard deviation.
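The input preparation described above might look like the following sketch. The text does not spell out over which entries the mean and standard deviation are taken, so the example assumes one set of statistics per measured quantity (channel), computed over measured entries of the training samples only; unmeasured entries remain zero after standardization. The function names and the toy dimensions (3 buses, 2 channels instead of 128 buses, 12 channels) are illustrative assumptions.

```python
from statistics import fmean, pstdev

def expand_sample(measured, n_buses, n_channels):
    """measured maps bus index -> {channel index: value}; all other entries stay 0."""
    X = [[0.0] * n_channels for _ in range(n_buses)]
    mask = [[False] * n_channels for _ in range(n_buses)]
    for bus, channels in measured.items():
        for c, value in channels.items():
            X[bus][c] = value
            mask[bus][c] = True
    return X, mask

def fit_channel_stats(samples, masks, n_channels):
    """Mean and standard deviation per channel over measured entries (assumed scheme)."""
    stats = []
    for c in range(n_channels):
        vals = [X[b][c] for X, m in zip(samples, masks)
                for b in range(len(X)) if m[b][c]]
        if not vals:
            stats.append((0.0, 1.0))  # channel never measured
            continue
        sd = pstdev(vals) or 1.0      # guard against a constant channel
        stats.append((fmean(vals), sd))
    return stats

def standardize(X, mask, stats):
    """Standardize measured entries; keep unmeasured entries at zero."""
    return [[(X[b][c] - stats[c][0]) / stats[c][1] if mask[b][c] else 0.0
             for c in range(len(stats))] for b in range(len(X))]

# Toy data: bus 0 measures both channels, bus 2 measures channel 0 only, bus 1 is unmeasured.
raw = [{0: {0: 1.0, 1: 10.0}, 2: {0: 3.0}},
       {0: {0: 5.0, 1: 14.0}, 2: {0: 7.0}}]
expanded = [expand_sample(r, n_buses=3, n_channels=2) for r in raw]
samples = [X for X, _ in expanded]
masks = [m for _, m in expanded]
stats = fit_channel_stats(samples, masks, n_channels=2)
Z = [standardize(X, m, stats) for X, m in zip(samples, masks)]
print(Z[0])
```

Keeping a mask alongside each expanded sample makes it possible to distinguish a measured value that happens to be zero from an entry that was never measured.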

III Results and Discussion

In this section, we report the performance of GCN for fault location tested in the IEEE 123 bus benchmark system. Comparisons with baseline models are provided in detail. We also visualize the hidden features of samples in the test dataset to demonstrate that the proposed GCN model is able to learn more robust representations from data.

III-A Implementation Details and Baseline Models

The hyper-parameters of the GCN model implemented in this paper are determined using 10% of the training dataset as the validation dataset. Specifically, the model has 3 graph convolution layers (all with 256 filters) followed by 2 fully-connected layers (with 512 and 256 hidden nodes). $K_{n}$ is set to 20 and the values of $K$ for the graph convolution layers are 3, 4, and 5, respectively. The two fully-connected layers have a dropout rate of 0.5. The Adam optimizer with an initial learning rate of 0.0002 is used to train the model for 400 epochs (i.e., each data sample is used 400 times for training) with a mini-batch size of 32. We use TensorFlow in Python to implement the GCN model. On a Titan Xp GPU, the GCN model takes less than 2 hours to train, and the time used to test each sample in the test dataset is less than 0.5 ms.

We first visualize $\mathbf{\Delta}^{m}$ with different values of $m$ to illustrate the locality of the spectral filters, the results of which are shown in Fig. 4 and Fig. 5. In Fig. 4, we illustrate the support of a filter when $m$ ranges from 1 to 4 (when $m=5$, the support of filters becomes the whole graph). In Fig. 5, the absolute values of the entries in $\mathbf{\Delta}^{m}$ are visualized. Although the size of filters grows fast with the increase of $m$, we can observe in Fig. 5 that relatively large absolute values in $\mathbf{\Delta}^{m}$ are mainly limited to entries corresponding to bus pairs that are close to each other. Since the filters can be represented as polynomials of $\mathbf{\Delta}$, we conclude that the locality of filters is ensured when the value of $K_{n}$ is chosen properly. Note that higher-order terms in the polynomials allow the filters to reach more nodes in the graph.

Three baseline models are also implemented for comparison:

1. SVM: The dimensionality of the measurements is reduced to 200 by principal component analysis (PCA). The radial basis function (RBF) kernel is used for the SVM with $\gamma=0.002$ and $C=1.5\times 10^{6}$. LibSVM [34] in Python is used for the implementation in this paper.

2. Random forest (RF): The dimensionality of the measurements is also reduced to 200 by PCA. The number of trees is set to 300, the minimal number of samples per leaf is 1, while the minimal number of samples required for a split is set to 3.

3. Fully-connected neural network (FCNN): A three-layer FCNN is implemented as a vanilla baseline of neural networks. The numbers of hidden neurons for the three layers are 256, 128, and 64, respectively. Scaled exponential linear unit (SELU) is used as the activation function.

The hyper-parameters for SVM and RF are determined by 5-fold cross-validation. For the FCNN model, 10% of the training data is used to validate the hyper-parameters.

In order to verify the effectiveness of our proposed approach under real-world conditions including noise, measurement errors, and communication errors, we add noise and errors to the measurements and compare the performance of different models. More specifically, three types of modifications are applied to the measurements:

1. Gaussian noise: We add Gaussian noise to the data so that the signal-to-noise ratio (SNR) is 45 dB, as introduced in [35]. The noise has zero mean, and its standard deviation, $\sigma_{\rm{noise}}$, is calculated as $\sigma_{\rm{noise}}=10^{-\frac{\text{SNR}}{20}}$.

2. Data loss of buses: We randomly drop the data of $N_{\rm{drop}}$ buses (i.e., set the measured values to 0) per data sample in the test dataset.

3. Random data loss for measured data: Each measurement at all buses is replaced by 0 with a probability $P_{\rm{loss}}$.

More specifically, we set $\sigma_{\rm{noise}}=10^{-\frac{45}{20}}$ , $N_{\rm{drop}}=1$ and $P_{\rm{loss}}=0.01$ throughout the experiments unless otherwise specified. The detailed performance comparisons are given in the ensuing subsections.
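The three modifications can be sketched as follows. This is an illustrative sketch rather than the authors' code: it assumes each data sample is stored as a list of per-bus feature lists with normalized values (so that $\sigma_{\rm{noise}}=10^{-\text{SNR}/20}$ applies directly), and all function names are hypothetical.

```python
import random

def noise_sigma(snr_db):
    """Standard deviation of the additive noise for a given SNR in dB."""
    return 10 ** (-snr_db / 20)

def add_gaussian_noise(sample, snr_db, rng):
    """Add zero-mean Gaussian noise to every measurement."""
    sigma = noise_sigma(snr_db)
    return [[x + rng.gauss(0.0, sigma) for x in bus] for bus in sample]

def drop_buses(sample, n_drop, rng):
    """Set all measurements of n_drop randomly chosen buses to 0."""
    dropped = set(rng.sample(range(len(sample)), n_drop))
    return [[0.0] * len(bus) if i in dropped else list(bus)
            for i, bus in enumerate(sample)]

def random_loss(sample, p_loss, rng):
    """Replace each measurement by 0 independently with probability p_loss."""
    return [[0.0 if rng.random() < p_loss else x for x in bus] for bus in sample]

rng = random.Random(0)
# Hypothetical sample: 10 buses with 3 normalized features each.
sample = [[1.0, 1.0, 1.0] for _ in range(10)]
noisy = add_gaussian_noise(sample, 45, rng)                   # SNR = 45 dB
modified = random_loss(drop_buses(noisy, 1, rng), 0.01, rng)  # N_drop = 1, P_loss = 0.01
```

Passing a seeded `random.Random` instance keeps the modifications reproducible across runs, which is convenient when comparing models on the same corrupted test set.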

III-B Fault Location Performance of the Models

The fault location accuracies of various approaches are presented in Table I. In addition to the traditionally defined classification accuracy, we also use one-hop accuracy as a metric to measure the performance of the models. Specifically, under the one-hop metric, a sample is considered correctly classified if the predicted faulty bus is directly connected to the actual faulty bus. For the GCN model, we repeat the trials three times and report the mean of the accuracy values.
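The two metrics can be written down compactly. The sketch below uses a hypothetical four-bus chain and assumes that an exact match also counts as a hit under the one-hop metric, which is consistent with the one-hop accuracies never being lower than the classification accuracies but is not stated explicitly in the text.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted bus equals the actual faulty bus."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def one_hop_accuracy(y_true, y_pred, neighbors):
    """Fraction of samples predicted at the faulty bus or at a directly connected bus.

    Assumption: exact matches are counted as hits as well.
    """
    hits = sum(p == t or p in neighbors[t] for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

# Hypothetical chain of buses a - b - c - d.
neighbors = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
y_true = ["a", "b", "c", "d"]
y_pred = ["a", "c", "a", "d"]

print(accuracy(y_true, y_pred))                     # 0.5
print(one_hop_accuracy(y_true, y_pred, neighbors))  # 0.75
```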

In Table I, it is shown that the GCN model has the highest classification accuracy. SVM and RF (both with PCA) also have good performance, especially for one-hop accuracy. The accuracy obtained by FCNN is relatively low, but its one-hop accuracy is still satisfactory.

The performance of the models with measurement modifications on the test dataset is shown in Table II. Results corresponding to the individual and combined modifications are reported therein. A major observation is that the two data loss errors greatly lower the classification accuracy of the models. Nevertheless, the GCN model is quite robust to the various modifications and significantly outperforms the other schemes. In addition, the FCNN model has higher accuracy than SVM and RF when data loss errors are involved, even though its classification accuracy is roughly 10% lower than that of those two models.

A more realistic setting is to add Gaussian noise to the data samples in the training dataset and observe the performance of the models. Table III gives the results of fault location accuracy corresponding to such a setup. The results for SVM and FCNN are in general consistent with the accuracy values in Table II. For RF, however, the accuracies for modifications including data loss errors all increase by more than 10%. Mild improvements are also observed for GCN. In summary, the GCN model has superior performance when measurement modifications are added to the data. Note that the modifications with data loss errors are not taken into account in the training phase. This indicates that the robustness of the GCN model may be generalizable to other types of errors in the data. In all subsequent experiments in the paper, Gaussian noise at an SNR of 45 dB is added to the samples in both the training and test datasets unless otherwise stated.

In the next subsection, we visualize the data upon the transformation by the FCNN model and the GCN model. Such visualizations facilitate our understanding of the performance differences induced by various schemes.

III-C Visualization of Data After Transformations

Visualizing transformed data in two-dimensional spaces enables assessment of the ability of the models to extract useful information from the input data. In this paper, we use t-distributed stochastic neighbor embedding (t-SNE) with two components to visualize high-dimensional data [36]. Specifically, t-SNE is used to investigate the local structure of the input data (i.e., normalized raw measurements), the data transformed by FCNN, and the data transformed by GCN. In particular, we are interested in studying how closely the samples corresponding to the same faulty bus are distributed.

In Fig. 6, we visualize the data samples in the test dataset with t-SNE after the dimensionality of the data is reduced to 200 by PCA (which is also used to speed up the calculation process of t-SNE). In order to highlight the distribution of data belonging to the same class (faulty bus), 6 groups of data samples, for buses 1, 21, 66, 85, 111, and 250, are marked with colors. Data samples of other buses are plotted as gray dots. It can be seen in the figure that the dots of different colors are scattered such that it is hard to separate the data samples of different classes.

We then visualize the data samples in the test dataset after they are transformed by the FCNN and the GCN models, as shown in Fig. 7. Both models are trained with added Gaussian noise, and Gaussian noise is also added to the test data. We extract the data from the outputs of the fully-connected layer right before the final output layer. For the FCNN model, each data sample is 64-dimensional, while the dimensionality of data samples is 256 for the GCN model. In Fig. 7 (a), the data samples of the same class hardly cluster together, except for the dark green dots in the upper-right corner. In Fig. 7 (b), however, most samples of the same color appear close together, except that a small fraction of the blue dots is separated from the main cluster. Note that the visualization in Fig. 7 corresponds to the “Noise” column of Table III. That is, the improved feature extraction capability of the GCN model gives a performance boost in classification accuracy of more than 10%.

As shown in Table II and Table III, the two types of data loss errors have a significant impact on the classification performance of all models. Thus, in Fig. 8 we proceed to visualize the data samples to which Gaussian noise and the two types of data loss errors are added. Many small sample clusters of the six colored faulty buses can be seen at multiple locations in Fig. 8 (a), which indicates that the FCNN model has difficulty in generalizing its feature extraction capability to the data modified with the two types of data loss errors. On the contrary, the GCN model still preserves the structure of the data to a large extent. The proportion of data samples that are separated from the main clusters is relatively small. Such a capability of preserving data structure gives rise to a performance gain of more than 30% for the proposed GCN, as shown in the last column of Table III.

III-D Increasing Model’s Robustness by Data Augmentation

We have shown that the GCN model is quite robust to mild noise and data loss errors (i.e., the SNR is 45 dB, $N_{\rm{drop}}=1$, and $P_{\rm{loss}}=0.01$). The data collected from the field, however, may have lower SNR and higher data loss rates. Thus, it is desirable for the model to generalize to different levels of noise and data loss errors. In light of this, we implement data augmentation during training of the model by adding various levels of noise and data loss errors to the input data. Specifically, for the $i$th input sample in a mini-batch, we first add Gaussian noise with $\sigma_{\rm{noise}}=\tilde{\sigma}_{i}$ to the measurements and randomly set measurements to 0 with $N_{\rm{drop}}=\tilde{n}_{i}$ and $P_{\rm{loss}}=\tilde{p}_{i}$. We randomly choose $\tilde{\sigma}_{i}$, $\tilde{n}_{i}$, and $\tilde{p}_{i}$ from [0, $10^{-\frac{45}{20}}$, $10^{-\frac{40}{20}}$, $10^{-\frac{35}{20}}$, $10^{-\frac{30}{20}}$, $10^{-\frac{25}{20}}$], [0, 1, 2, 3, 4, 5], and [0, 0.01, 0.02, 0.03, 0.04, 0.05], respectively, with equal probability. Note that the data augmentation is applied to each mini-batch, thus a new data sample is generated for the $j$th data sample in each epoch unless $\tilde{\sigma}_{j}=\tilde{n}_{j}=\tilde{p}_{j}=0$, in which case the data sample is unchanged.
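The per-sample augmentation described above can be sketched as follows. This is a hedged illustration, not the authors' training code: the data layout (a list of per-bus feature lists) and the names `sample_augmentation`, `augment`, and `augment_batch` are assumptions, while the three level lists are taken from the text.

```python
import random

# Candidate levels from the text; each is chosen with equal probability.
SIGMA_LEVELS = [0.0] + [10 ** (-snr / 20) for snr in (45, 40, 35, 30, 25)]
N_DROP_LEVELS = [0, 1, 2, 3, 4, 5]
P_LOSS_LEVELS = [0.0, 0.01, 0.02, 0.03, 0.04, 0.05]

def sample_augmentation(rng):
    """Draw (sigma, n_drop, p_loss) independently for one input sample."""
    return (rng.choice(SIGMA_LEVELS),
            rng.choice(N_DROP_LEVELS),
            rng.choice(P_LOSS_LEVELS))

def augment(sample, rng):
    """Apply noise, bus-level loss, and entry-level loss to one sample."""
    sigma, n_drop, p_loss = sample_augmentation(rng)
    dropped = set(rng.sample(range(len(sample)), n_drop))
    out = []
    for i, bus in enumerate(sample):
        if i in dropped:
            out.append([0.0] * len(bus))  # all measurements of this bus are lost
            continue
        row = []
        for x in bus:
            if sigma > 0:
                x = x + rng.gauss(0.0, sigma)
            row.append(0.0 if rng.random() < p_loss else x)
        out.append(row)
    return out

def augment_batch(batch, rng):
    """Augment every sample of a mini-batch with its own random levels."""
    return [augment(sample, rng) for sample in batch]
```

Because the levels are redrawn every time a mini-batch is formed, the same underlying sample is seen with a different corruption in each epoch, except when all three drawn levels are zero.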

We report the performance of the GCN model under various noise and data loss levels with and without data augmentation in Fig. 9. Specifically, case 1 has the lowest level of noise and data loss errors while case 5 has the highest level. It is shown in the figure that the proposed data augmentation procedure greatly improves the fault location accuracies, and the one-hop accuracies for cases 1, 2, and 3 are higher than 95%. The one-hop accuracy when data augmentation is applied is higher than 84% even for case 5, for which the SNR is 25 dB, measurements from 5 buses are lost, and each measurement may also be lost with a probability of 0.05.

Although the accuracy with data augmentation for case 1 is quite high, some samples are still assigned to wrong buses. In order to examine the characteristics of misclassified samples, we collect the samples with predicted faulty buses more than two hops away from the correct faulty buses. Note that random noise and data losses are added to the test samples, so the results are different for each trial. Specifically, the collected bus pairs are (25r, 250), (30, 25r), (33, 25r), (53, 56), (61s, 62), (61s, 68), (89, 79), (92, 95), (99, 105), (108, 102), (151, 49), (250, 28), and (250, 25r) (the first number is the correct bus). In the majority of the bus pairs, the two buses are three hops apart. In addition, 4 buses, namely, 29, 34, 76, and 108, are used to illustrate the characteristics of misclassified samples for case 5. The buses are located near the four corners of the network illustrated in Fig. 2. While bus 34 has no misclassified samples, the lists of wrong predictions more than two hops away for buses 29, 76, and 108 are [18 (5), 21 (4), 22 (5), 23 (3), 26 (3), 27 (4), 31 (4)], [60 (3), 61s (4), 66 (8), 74 (3), 75 (4), 79 (3), 81 (4), 83 (6), 97 (3), 99 (5)], and [61s (6), 97 (3), 98 (4), 100 (6), 111 (3), 450 (7)], respectively (the number of hops between the correct and predicted buses is given in parentheses). While hop numbers up to 8 exist in the results, most of the hop numbers are lower than 5. As the two-hop accuracy for case 5 is 92.49%, it can be concluded that the GCN model is able to locate a fault within the vicinity of its exact location under severe data loss errors in most cases.
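Hop counts such as those quoted above can be obtained by a breadth-first search on the network graph. The sketch below is illustrative only: the five-bus feeder is a made-up example rather than part of the IEEE 123 bus system, and `hop_distances` and `k_hop_accuracy` are hypothetical helper names.

```python
from collections import deque

def hop_distances(adjacency, source):
    """Breadth-first search: number of hops from source to every reachable bus."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def k_hop_accuracy(y_true, y_pred, adjacency, k):
    """Fraction of samples whose predicted bus is at most k hops from the faulty bus."""
    hits = 0
    for t, p in zip(y_true, y_pred):
        if hop_distances(adjacency, t).get(p, float("inf")) <= k:
            hits += 1
    return hits / len(y_true)

# Made-up feeder: main line 1 - 2 - 3 - 4 with a lateral 2 - 5.
adjacency = {1: [2], 2: [1, 3, 5], 3: [2, 4], 4: [3], 5: [2]}
y_true = [1, 1, 4, 5]
y_pred = [1, 3, 5, 4]

for k in (0, 2, 3):
    print(k, k_hop_accuracy(y_true, y_pred, adjacency, k))
# 0 0.25
# 2 0.5
# 3 1.0
```

With `k = 0` the function reduces to the ordinary classification accuracy, and filtering on a hop distance greater than two reproduces the selection rule used for the misclassified samples in this subsection.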

In previous experiments, the values for the hyper-parameters $K_{n}$ and $K$ are 20 and [3, 4, 5] (the $K$ for 3 graph convolution layers), respectively. It is expected that when data losses occur in some of the measurements close to a fault, information from other measurements may help the model locate the fault. In order to justify the choice of $K_{n}$ and $K$ , we report the performance of the GCN model with relatively small values of $K_{n}$ and $K$ in Fig. 10. Specifically, for one experiment we change $K_{n}$ to 5, and for the other experiment we set $K$ to [1, 1, 1]. It can be observed that the GCN model with the original hyper-parameters has higher accuracies. In addition, setting $K$ to [1, 1, 1] hurts the performance of the model as it severely limits the range of measurements a node in the GCN can obtain information from.

III-E Performance Under Distribution Network Reconfiguration

The configuration of a distribution network may change in order to reduce loss or balance the loads [37]. The reconfiguration of a network can be achieved by opening some of the normally closed switches and closing some of the normally open switches. The IEEE 123 bus system has a three-phase normally open switch between node 151 and node 300 (see Fig. 2), which can be used to guarantee electricity supply of the system if some of the normally closed switches are open.

In this work, we consider two cases of network reconfiguration:

1. Open the switch connecting node 18 and node 135, and close the switch connecting node 151 and node 300.

2. Open the switch connecting node 97 and node 197, and close the switch connecting node 151 and node 300.

In order to evaluate the performance of the proposed model under network reconfiguration, we generate 5 samples for each fault type at each bus for the two cases and directly use the GCN model trained with data augmentation to identify the faulty buses. As a result, the fault location accuracies and one-hop accuracies for the two cases are 88.37% (98.28%), and 91.89% (98.61%), respectively. As the reconfiguration scenarios are not considered during the training phase of the model, the results indicate that the GCN model has high stability against unseen network reconfiguration scenarios.

III-F Performance Under Multiple Connection Scenarios of Branches

In this subsection, the performance of the models under multiple connection scenarios of several branches is examined. Specifically, the connected phase of a branch in a distribution network may change from time to time and it is expected that the model can deal with such changes. Three branches in the IEEE 123 bus system are considered:

1. The branch connecting bus 36, 38, and 39.

2. The branch connecting bus 67 to 71.

3. The branch connecting bus 108 to 114.

Phases 1 and 2 of bus 36, and all three phases of bus 67 and bus 108, are connected to the distribution system. The buses on the branches use only one of the phases for connection.

Unlike network reconfiguration achieved by opening and closing of switches, changing the connected phase of a single-phase branch requires additional data to train the models. We implement a simple data generation process in order to add data with changed phases of aforementioned branches into the training and test datasets. Specifically, we change the phase of only one of the branches to another available phase and generate 5 data samples for each fault type at each bus. Thus, both the new training and test datasets contain 30420 data samples.

Fault location accuracies of the models with additional phase-changed data are presented in Table IV. The results for faults at all buses and at modified buses with changed phases are included. For all schemes, the accuracies for faults at modified buses are lower than the counterparts of faults at all buses. The GCN model has the highest accuracies for all scenarios while the one-hop accuracies are more than 99%. Comparing the column of “All Buses” of Table IV with the “Noise” column of Table III, we can see that the additional data has almost no impact on GCN, while the accuracies for other models decrease by 1-5%. Thus, we conclude that the GCN model is robust to the change of connected phases of single branches if the training dataset covers samples of the additional connection scenarios.

III-G Performance on High Impedance Faults

The detection of high impedance faults in distribution networks is a challenging task as the current magnitude is generally close to the level of load current [38]. In this subsection, we evaluate the performance of the GCN model on locating high impedance faults. Specifically, we add single-phase-to-ground faults with high fault resistance to the training dataset and report the fault location results on various ranges of fault resistance. For the construction of the training dataset, in addition to the data samples with small fault resistance values, we generate 40 samples for each phase at each bus and the fault resistance is uniformly sampled between 100 $\Omega$ and 5000 $\Omega$ . Five fault resistance ranges, namely, 100 $\Omega$ to 1000 $\Omega$ , 1000 $\Omega$ to 2000 $\Omega$ , 2000 $\Omega$ to 3000 $\Omega$ , 3000 $\Omega$ to 4000 $\Omega$ , and 4000 $\Omega$ to 5000 $\Omega$ are used to generate test samples. For each range, 5 test samples are generated for each type of fault at each bus. In order to test the generalizability of the GCN model, we further split the fault resistance ranges into two sets of intervals (the length of each interval is 10 $\Omega$ ), namely, $\{R_{1}|20k<R_{1}<20k+10,k\in\mathbb{Z}:5\leq k\leq 249\}$ , and $\{R_{2}|20k+10<R_{2}<20(k+1),k\in\mathbb{Z}:5\leq k\leq 249\}$ , where $R_{1}$ is used for samples in the training dataset and $R_{2}$ is used to generate samples in the test datasets.
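The interleaved training and test resistance sets can be made concrete with a short sketch. The membership tests follow the two set definitions above; drawing values by rejection sampling from a uniform distribution is an assumption, since the text does not state how the samples were generated, and the function names are hypothetical.

```python
import random

K_MIN, K_MAX = 5, 249  # range of k in the definitions of R_1 and R_2

def in_training_set(r):
    """True if 20k < r < 20k + 10 for some integer k with 5 <= k <= 249."""
    k = int(r // 20)
    return K_MIN <= k <= K_MAX and 0 < r - 20 * k < 10

def in_test_set(r):
    """True if 20k + 10 < r < 20(k + 1) for some integer k with 5 <= k <= 249."""
    k = int(r // 20)
    return K_MIN <= k <= K_MAX and 10 < r - 20 * k < 20

def sample_resistance(rng, low, high, split):
    """Draw a resistance in [low, high] that lies in the requested set.

    Assumption: uniform proposal with rejection; not described in the paper.
    """
    member = in_training_set if split == "train" else in_test_set
    while True:
        r = rng.uniform(low, high)
        if member(r):
            return r

rng = random.Random(0)
train_r = [sample_resistance(rng, 100, 5000, "train") for _ in range(5)]
test_r = [sample_resistance(rng, 100, 1000, "test") for _ in range(5)]
```

Since the two sets never overlap, every test resistance is at least slightly different from all training resistances, which is what makes the comparison a test of generalization across fault resistance.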

The accuracies for high impedance faults with different ranges of fault resistance are shown in Fig. 11. As the zero-hop and one-hop accuracies are relatively low, we also report the two-hop and three-hop accuracies. It is seen in the figure that with the increase of fault resistance, the fault location accuracy drops. Although the increased fault resistance makes it hard to find the exact fault location, the accuracies increase rapidly with the increase of the number of hops, which indicates that the model is still able to capture a part of fault characteristics for high impedance faults.

III-H Discussion on the Types of Measurements Used for the Model

As the proposed GCN model uses amplitudes and phase angles of both voltage and current measurements as inputs, it is necessary to examine the contributions of the different measurements to the performance of the model. Specifically, the first concern is that the measured currents are the injected currents at the loads, which provide less information about the faults compared with currents flowing in the branches connecting the buses. The second concern is the contribution of phase angle to the fault location accuracy as measuring the phase angle requires additional installation of phasor measurement devices.

In light of these concerns, we compare the performance of the GCN model under different measurement scenarios and present the results in Table V. The results for the last row are the same as the row of GCN in Table III. It is shown in the table that the results with voltage phasors are quite similar to the results with both voltage and current phasors. The results with current phasors, however, are much lower than the results with voltage phasors. Further, when only voltage amplitudes are used, the accuracies with data loss errors are dramatically lower than in other scenarios. Two conclusions can be drawn from the results:

1. For the design of the GCN model in this paper, the performance of the model mainly relies on the voltage phasors. Other types of current measurements such as currents flowing in the branches may be added in order to improve the fault location accuracy.

2. It is important to include phase angles in the inputs for the GCN model, especially when data loss errors are considered.

III-I Implementation of the GCN Model on Another Distribution Network

As the previous experiments are all carried out in the IEEE 123 bus system, we apply the GCN model to the IEEE 37 bus system to verify that the model can perform well in a new distribution network. The topology of the IEEE 37 bus system is shown in Fig. 12. Similar to the implementation for the IEEE 123 bus system, we measure the voltage and current phasors at the phases connected to loads. The generation scheme for training and test datasets described in Section II-D is used. A series of hyper-parameters are used to see if the performance of the model is sensitive to the choice of hyper-parameters.

We first evaluate the performance of the GCN model with different values of $K_{n}$ and $K$ (the other hyper-parameters remain unchanged), and the accuracies are shown in Table VI. For each column in the table, the lowest two values are highlighted with underlines while the highest two values are marked in bold. It is observed that the performance of the GCN model is quite stable under different values of $K_{n}$ and $K$. In Section III-D, we have shown that properly choosing the values of $K_{n}$ and $K$ can increase the model’s robustness against data loss errors. When data loss errors are not considered, however, the negative effect of setting $K_{n}$ and $K$ to small values is insignificant. This indicates that laborious tuning of hyper-parameters is not needed when applying the GCN model to a new distribution network.

Another important hyper-parameter is the number of graph convolution layers. Although the GCN model does not require a large number of layers, it is expected that the model may not have enough learning capacity when the number of layers is too small. We report the performance of the GCN model with different numbers of graph convolution layers in Table VII. It is clearly observed in the table that increasing the number of layers improves the fault location accuracy, but the gain of adding another layer decreases when the third layer is added.

Finally, we use the IEEE 37 bus system to discuss some practicability issues. The first concern is the number of measured buses required for the GCN model. In the original scenario, the phases connected to loads at 25 buses are monitored. We gradually reduce the number of monitored buses and compare the performance of the model under these reduction cases. Specifically, the lists of removed buses are [714, 733, 737, 738, 744], [720, 727, 734], [724, 728, 735], [701, 712, 713, 730, 741], and [718, 722, 729, 731, 736]. Although this is only one possibility of reduction, we try to remove the buses evenly across the network to avoid large areas of unmonitored buses. When the 4th list of buses is removed, only 9 buses at the end of branches are left. The final reduction case leaves the model with only 4 measured buses, namely, 725, 732, 740, and 742. The accuracies of the different reduction cases are illustrated in Fig. 13. As expected, reducing the number of measured buses has a negative effect on the fault location accuracies. The two-hop accuracy, however, is not very sensitive to the reduction of measured buses until the last reduction. For case 5, specifically, the two-hop accuracy is 94.12% with measurements from only 9 buses at branch ends. The results indicate that it is harder for the GCN model to find the exact fault locations when a large proportion of the measured buses are excluded, but the ability to find the vicinity of the faulty bus only requires a small proportion of buses to be measured.
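The number of monitored buses left after each reduction follows directly from the lists above, as the short check below shows. The case numbering in the comments is an inference (counting the unreduced network as case 1), not something stated explicitly in the text.

```python
# Lists of removed buses, copied from the text, applied cumulatively.
removed_lists = [
    [714, 733, 737, 738, 744],
    [720, 727, 734],
    [724, 728, 735],
    [701, 712, 713, 730, 741],
    [718, 722, 729, 731, 736],
]

remaining = [25]  # monitored buses in the original scenario
for buses in removed_lists:
    remaining.append(remaining[-1] - len(buses))

print(remaining)  # [25, 20, 17, 14, 9, 4]
# If the original scenario is counted as case 1 (an assumption), the 9-bus
# configuration is case 5 and the 4-bus configuration is case 6.
```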

The second concern is the amount of training data needed to train the GCN model. The above-mentioned dataset for the IEEE 37 bus system contains 20 samples for each fault type at each bus. With this as the size of 100%, we reduce the number of generated samples for each fault type at each bus to 10, 5, 2, and 1, and compare the accuracies of the scenarios in Table VIII. The size of the test dataset remains the same. It is seen in the table that the performance of the GCN model degrades as the size of the training dataset is reduced. With only 1 sample for each fault type at each bus, the two-hop accuracy is a little above 50%. As it is hard to collect field data with varied fault types, fault resistances and load levels, a more practical solution is to combine field data with synthetic data simulated according to the need of the model. With the help of transfer learning [39], the model can transfer the knowledge learned from simulated data to locate faults using actual measurements from the system. Such a problem formulation is beyond the scope of this paper, but the results in our work provide an upper bound for the performance of the GCN model as we use simulated data only.

IV Conclusion and Future Work

In this paper, we develop a GCN model for the task of fault location in distribution systems. Simulation results on the IEEE 123-bus and 37-bus systems show that the proposed GCN model is highly effective in processing fault-related data. The proposed model is more robust to measurement errors than other machine learning approaches including SVM, RF, and FCNN. Visualization of the activations of the last fully-connected layer shows that the GCN model extracts features that are robust to missing entries in the measurements. Further experiments show that the model can adapt to topology changes and perform well with a limited number of measured buses. In a nutshell, the present paper proposes a flexible and widely-applicable energy data analytics framework for improving situational awareness in power distribution systems.

The proposed framework and approach open up a few interesting research directions. First, the effectiveness of the GCN model in more realistic settings needs further investigation (e.g., use field data to fine-tune the model trained with synthetic data, or train the model with both field data and synthetic data by transfer learning). Second, it is valuable to develop new schemes for transferring a learned model to other distribution systems with different topologies. Third, a new challenge comes from the integration of distributed generation, which introduces high-level uncertainties into the grids, and may alter the characteristics of the measurements during faults.

Acknowledgement

The authors are grateful for the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this project. We would also like to thank Ron Levie at TU Berlin, and Federico Monti at University of Lugano, who helped us with the implementation of the GCN model.

Bibliography


[1] Y. Liao, “Generalized fault-location methods for overhead electric distribution systems,” IEEE Transactions on Power Delivery, vol. 26, no. 1, pp. 53–64, Jan. 2011.
[2] R. Krishnathevar and E. E. Ngu, “Generalized impedance-based fault location for distribution systems,” IEEE Transactions on Power Delivery, vol. 27, no. 1, pp. 449–451, Jan. 2012.
[3] S. Das, N. Karnik, and S. Santoso, “Distribution fault-locating algorithms using current only,” IEEE Transactions on Power Delivery, vol. 27, no. 3, pp. 1144–1153, July 2012.
[4] R. A. F. Pereira, L. G. W. da Silva, M. Kezunovic, and J. R. S. Mantovani, “Improved fault location on distribution feeders based on matching during-fault voltage sags,” IEEE Transactions on Power Delivery, vol. 24, no. 2, pp. 852–862, Apr. 2009.
[5] S. Lotfifard, M. Kezunovic, and M. J. Mousavi, “Voltage sag data utilization for distribution fault location,” IEEE Transactions on Power Delivery, vol. 26, no. 2, pp. 1239–1246, Apr. 2011.
[6] F. C. Trindade, W. Freitas, and J. C. Vieira, “Fault location in distribution systems based on smart feeder meters,” IEEE Transactions on Power Delivery, vol. 29, no. 1, pp. 251–260, Feb. 2014.
[7] J.-H. Teng, W.-H. Huang, and S.-W. Luan, “Automatic and fast faulted line-section location method for distribution systems based on fault indicators,” IEEE Transactions on Power Systems, vol. 29, no. 4, pp. 1653–1662, July 2014.
[8] Y. Jiang, C.-C. Liu, M. Diedesch, E. Lee, and A. K. Srivastava, “Outage management of distribution systems incorporating information from smart meters,” IEEE Transactions on Power Systems, vol. 31, no. 5, pp. 4144–4154, Sept. 2016.