Supporting Future Electrical Utilities: Using Deep Learning Methods in EMS and DMS Algorithms
Ognjen Kundacina, Gorana Gojic, Mile Mitrovic, Dragisa Miskovic, Dejan Vukobratovic

TL;DR
This paper reviews recent deep learning techniques for power system monitoring and optimization, addressing the challenges of increasing system complexity and renewable integration by enabling near real-time algorithms with lower computational demands.
Contribution
It provides a comprehensive review of deep learning applications in EMS and DMS, highlighting potential improvements for future electrical utility operations.
Findings
Deep learning enhances real-time power system monitoring.
Deep learning improves optimization in energy management.
Potential for re-implementing traditional algorithms with deep learning.
Abstract
Electrical power systems are increasing in size, complexity, and dynamics due to the growing integration of renewable energy resources, whose power generation is intermittent. This necessitates the development of near real-time power system algorithms with lower computational complexity with respect to power system size. Considering the growing collection of historical measurement data and recent advances in the rapidly developing field of deep learning, the main goal of this paper is to review recent deep learning-based power system monitoring and optimization algorithms. Electrical utilities can benefit from this review by re-implementing or enhancing the algorithms traditionally used in energy management systems (EMS) and distribution management systems (DMS).
Table I: Comparison of neural network layer types by input data structure and relational inductive bias

| Neural network layer type | Input data structure | Relational inductive bias | Invariance property |
|---|---|---|---|
| Fully connected | Arbitrary | Input elements weakly related | None |
| Convolutional | Grids, images | Local relation | Spatial translation invariance |
| Recurrent | Sequences | Sequential relation | Time translation invariance |
| Graph neural network (GNN) | Graphs | Arbitrary relation | Permutation invariance |
This work was supported by the Faculty of Technical Sciences in Novi Sad, Department of Power, Electronic and Telecommunication Engineering, within the implementation of the project entitled “Research aimed at improving the teaching process and development of scientific and professional areas of the Department of Power, Electronic and Telecommunication Engineering.” Additionally, this work was supported by the European Union’s Horizon 2020 research and innovation programme under Grant Agreement number 856967.
Ognjen Kundacina
The Institute for Artificial Intelligence Research and Development of Serbia
Novi Sad, Serbia
Gorana Gojic
The Institute for Artificial Intelligence Research and Development of Serbia
Novi Sad, Serbia
Mile Mitrovic
Skolkovo Institute of Science and Technology
Moscow, Russia
Dragisa Miskovic
The Institute for Artificial Intelligence Research and Development of Serbia
Novi Sad, Serbia
Dejan Vukobratovic
Faculty of Technical Sciences, University of Novi Sad
Novi Sad, Serbia
Index Terms:
Power Systems, Deep Learning, Energy Management System, Distribution Management System
I Introduction
Power systems are undergoing a transition driven by the increasing integration of renewable energy resources, and as a result they face new operational challenges. These include the unpredictable nature of renewable energy sources, maintaining power system stability, managing the impacts of distributed generation, and handling reverse power flows [1]. Consequently, the mathematical formulations of the traditional algorithms that address these problems have become more complex, more nonlinear, and higher-dimensional, which makes their practical implementation and real-time operation more difficult. These algorithms are usually implemented within specialized software, such as energy management systems (EMS) for transmission networks and distribution management systems (DMS) for distribution networks, which are installed in power system control centres and used by power system operators on a daily basis. Typical EMS and DMS functionalities include state estimation, fault detection and localization, demand and generation forecasting, voltage and transient stability assessment, voltage control, optimal power flow, and economic dispatch. The increasing amount of data generated by power systems [2] and collected by EMS and DMS enables the development of new deep learning-based algorithms that can overcome limitations of the traditional ones.
Deep learning is a subfield of artificial intelligence that involves training neural network models to find patterns and make predictions based on the available set of data samples [3]. Some of the advantages of employing deep learning methods in the field of power systems include:
- Speed: Once trained, a deep learning model usually performs inference quickly, even when processing large amounts of data [4]. This is crucial for applications that require fast decision-making, as is the case in many power system operation problems.
- Accuracy: The universal approximation theorem [5] states that a feedforward neural network with a sufficient number of trainable parameters can approximate any continuous function on a compact domain to any desired degree of accuracy. Although the theorem only guarantees that such a network exists, not that training will find it, in practice this implies that neural networks can be employed to tackle a wide range of problems, including those in power systems, and that the network architecture and size can be adapted to the complexity of the problem.
- Adaptability: Deep learning models can be retrained when the underlying data-generating process changes [6], which makes them suitable for dynamic environments, such as power systems whose operating conditions change over time.
- Robustness: Traditional model-based algorithms can run into problems when power system parameters are uncertain or unreliable [7]. As model-free alternatives, deep learning methods alleviate these issues because they do not rely on power system parameters.
- Automation: Given enough training data, deep learning algorithms can learn the responses of human experts in various situations, so they can reduce the need for human intervention in certain power system tasks. For instance, in applications such as predictive maintenance [8], an integral part of asset management systems, deep learning can be applied within an automated real-time monitoring system.
In the remainder of this paper, we briefly introduce basic deep learning terminology, describe the most common deep learning approaches, and review their recent applications to the monitoring and optimization of electric power systems.
II Deep Learning Fundamentals
Deep learning is a field of machine learning that involves training neural networks on large datasets [3], with the goal of generating accurate predictions on unseen data samples. Neural networks can therefore be seen as trainable function approximators, composed of interconnected units called neurons, which process and transmit information. In a simple fully connected neural network, information processing is organized in layers: the input $\mathbf{h}^{(l-1)}$ from the previous layer is linearly transformed using a function $f^{(l)}(\mathbf{h}^{(l-1)}) = \mathbf{W}^{(l)}\mathbf{h}^{(l-1)} + \mathbf{b}^{(l)}$, where $l$ denotes the layer index. The linear transformation is defined by a matrix of trainable parameters $\mathbf{W}^{(l)}$, i.e., the weights of the connections between the neurons, shown in Fig. 1. Trainable parameters also include the biases $\mathbf{b}^{(l)}$, which are free terms associated with each neuron and are omitted in the figure. The result is then passed through a nontrainable nonlinear function $\sigma$ to create the outputs of that layer, $\mathbf{h}^{(l)} = \sigma\big(f^{(l)}(\mathbf{h}^{(l-1)})\big)$. Inputs and outputs of the whole neural network are denoted as $x_i$ and $y_j$ in Fig. 1, where $i$ and $j$ denote the indices of input and output neurons.
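The layer-wise computation above can be illustrated with a minimal forward pass. This is a sketch under stated assumptions: it uses NumPy, a ReLU for the nonlinearity $\sigma$, and arbitrary layer sizes, and the helper names `init_params` and `forward` are hypothetical rather than taken from the paper.

```python
import numpy as np


def relu(z: np.ndarray) -> np.ndarray:
    """Nontrainable elementwise nonlinearity sigma (ReLU is an illustrative choice)."""
    return np.maximum(z, 0.0)


def init_params(layer_sizes, rng):
    """Create one (W, b) pair per layer; W^(l) has shape (n_out, n_in)."""
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_out, n_in))  # weights
        b = np.zeros(n_out)                                            # biases
        params.append((W, b))
    return params


def forward(params, x):
    """Compute h^(l) = sigma(W^(l) h^(l-1) + b^(l)) layer by layer.

    The last layer is kept linear so that the outputs y_j can take any real value."""
    h = x
    for l, (W, b) in enumerate(params):
        z = W @ h + b                       # trainable linear transformation f^(l)
        h = z if l == len(params) - 1 else relu(z)
    return h


rng = np.random.default_rng(0)
params = init_params([4, 16, 16, 2], rng)   # 4 inputs x_i, two hidden layers, 2 outputs y_j
x = rng.normal(size=4)
y = forward(params, x)
print(y.shape)  # (2,)
```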
Neural network training consists of adjusting the trainable parameters (i.e., the weights and biases of the neurons) using the knowledge contained in the collected data, so that accurate predictions can be made for new inputs. Training is formulated as an optimization problem that searches the trainable parameter space for values that minimize a loss function measuring the distance between the predicted output and the true output. This problem is usually solved using gradient-based optimization methods such as gradient descent or one of its variants [9].
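To make the training procedure concrete, the sketch below fits a one-hidden-layer network to noisy samples of a sine function using plain full-batch gradient descent with hand-derived gradients of the mean squared error. The data set, network size, learning rate, and number of steps are illustrative assumptions; practical implementations normally rely on automatic differentiation and one of the optimizer variants surveyed in [9].

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy data set: noisy samples of a nonlinear function
X = rng.uniform(-np.pi, np.pi, size=(200, 1))
Y = np.sin(X) + 0.05 * rng.normal(size=X.shape)

# Trainable parameters of a network with one tanh hidden layer (32 neurons)
W1 = rng.normal(0.0, 1.0, size=(1, 32))
b1 = np.zeros(32)
W2 = rng.normal(0.0, 0.1, size=(32, 1))
b2 = np.zeros(1)

lr = 0.02          # learning rate (step size of gradient descent)
losses = []
for step in range(3000):
    # Forward pass (each row of X is one sample)
    H = np.tanh(X @ W1 + b1)
    P = H @ W2 + b2
    losses.append(float(np.mean((P - Y) ** 2)))   # mean squared error loss

    # Backward pass: gradients of the loss with respect to every parameter
    dP = 2.0 * (P - Y) / len(X)
    dW2 = H.T @ dP
    db2 = dP.sum(axis=0)
    dZ1 = (dP @ W2.T) * (1.0 - H ** 2)             # chain rule through tanh
    dW1 = X.T @ dZ1
    db1 = dZ1.sum(axis=0)

    # Gradient descent update: move each parameter against its gradient
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

print(f"initial loss {losses[0]:.3f}, final loss {losses[-1]:.3f}")
```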
In practice, when using deep learning to solve a problem, it is common to train multiple instances of neural network models with different structures. The structure is defined by hyperparameters, such as the number of layers and the number of neurons in each layer, and finding a good set of hyperparameters identifies the network structure that best fits the problem being solved. The hyperparameter search can be done manually or with specialized optimization methods [10]. The collected data is commonly split into three sets: a training set, a validation set, and a test set. The training set is used to fit the trainable parameters of each instance, the validation set is used to evaluate each trained instance and compare hyperparameter configurations, and the test set is used to evaluate the overall performance of the finally selected model on data not used in either step.
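The data split and hyperparameter search can be sketched as follows. As a simplifying assumption, polynomial regression with NumPy's `np.polyfit` stands in for neural network training, and the polynomial degree plays the role of a structural hyperparameter such as the number of layers; the synthetic data and the 70/15/15 split ratio are also assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, 300)
y = np.cos(3.0 * x) + 0.1 * rng.normal(size=x.size)   # hypothetical data

# Shuffle once, then split 70 % / 15 % / 15 % into training, validation and test sets
idx = rng.permutation(x.size)
n_train, n_val = int(0.7 * x.size), int(0.15 * x.size)
train_idx, val_idx, test_idx = np.split(idx, [n_train, n_train + n_val])


def mse(coef, ids):
    """Mean squared error of a fitted polynomial on the samples with indices ids."""
    return float(np.mean((np.polyval(coef, x[ids]) - y[ids]) ** 2))


# Each candidate is fitted on the training set only and compared on the validation set
candidates = [1, 2, 4, 6, 8, 10]
fitted, val_errors = {}, {}
for degree in candidates:
    fitted[degree] = np.polyfit(x[train_idx], y[train_idx], degree)
    val_errors[degree] = mse(fitted[degree], val_idx)

best = min(val_errors, key=val_errors.get)
test_error = mse(fitted[best], test_idx)   # evaluated once, for the selected model only
print(best, round(test_error, 4))
```

The test set is touched only after the hyperparameter has been fixed on the validation set; using it for selection as well would make the reported test error optimistic.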
Adjusting a deep learning model’s architecture to the specific structure of the input data can increase training speed and performance and reduce the amount of training data needed [11]. Exploiting the regularity of the input data space by imposing a corresponding structure on the space of trainable functions is known as relational inductive bias [11]. Table I compares various deep learning models based on their input data structure, the type of neural network layers they use, and the corresponding relational inductive bias. One of the most successful examples of exploiting relational inductive biases is the convolutional neural network (CNN) layer, which has produced algorithms that surpass human experts in many computer vision tasks. CNNs use the same set of trainable parameters (known as the convolutional kernel) to operate over parts of the input grid data independently, achieving locality and spatial translation invariance. Locality exploits the fact that neighbouring grid elements are more strongly related than distant ones, while spatial translation invariance is the ability to map various translations of the input data to the same output (strictly, convolutional layers are translation equivariant, and invariance is usually obtained after a pooling step). Similarly, recurrent neural networks (RNNs) share trainable parameters across the segments of sequential data, resulting in a time translation invariant algorithm. From the inductive bias perspective, the main goal of graph neural networks (GNNs) is to achieve permutation invariance when applied to graph-structured data, so that different matrix representations of the same graph map to the same output. Since ordinary fully connected neural networks have already been widely used to solve power system problems, we focus on applications of more advanced deep learning architectures.
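The inductive biases listed in Table I can be checked numerically. The sketch below, which assumes NumPy and randomly drawn weights, shows that a shared convolutional kernel produces a feature map that shifts together with its input, that a global max-pooling readout turns this into invariance, and that a sum over node features (a typical GNN readout) is unaffected by node reordering while a fully connected layer over the flattened features is not.

```python
import numpy as np

rng = np.random.default_rng(3)

# Convolution: one shared kernel applied to every window of the input
kernel = rng.normal(size=3)


def conv1d_valid(x, k):
    """Slide the same kernel over every window of x (cross-correlation, as in CNN layers)."""
    return np.array([x[i:i + len(k)] @ k for i in range(len(x) - len(k) + 1)])


x = np.zeros(12)
x[3:5] = [1.0, 2.0]                 # a local pattern
x_shift = np.roll(x, 4)             # the same pattern translated by 4 positions

y, y_shift = conv1d_valid(x, kernel), conv1d_valid(x_shift, kernel)
equivariant = np.allclose(y_shift[4:], y[:-4])   # the feature map moves with the input
invariant = np.isclose(y.max(), y_shift.max())   # global max pooling removes the shift

# Graphs: the order in which nodes are stored should not matter
H = rng.normal(size=(5, 2))         # 5 nodes with 2 features each
perm = np.array([4, 2, 0, 3, 1])    # an arbitrary relabelling of the nodes
W_fc = rng.normal(size=(10, 3))     # fully connected layer on the flattened features

sum_readout_invariant = np.allclose(H.sum(axis=0), H[perm].sum(axis=0))
fc_invariant = np.allclose(H.reshape(-1) @ W_fc, H[perm].reshape(-1) @ W_fc)
print(equivariant, invariant, sum_readout_invariant, fc_invariant)  # True True True False
```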
III Convolutional Neural Networks
Convolutional neural networks are a well-studied class of deep learning algorithms, primarily designed for analysing spatial patterns in grid-structured data such as images [3]. They consist of multiple convolutional layers, each of which acts as a trainable convolutional filter that extracts local information from its input, transforms it into a more abstract, grid-shaped representation, and feeds it into the succeeding layer. Stacking multiple convolutional layers enables a CNN to extract useful features from an image, which can then be used for tasks such as classification or regression.
Although power system data is not inherently arranged in the format of an image, CNNs have been used effectively to address power system problems, mostly those involving sequential data. To meet the input requirements of CNNs, power system data is transformed and reshaped in various ways, including:
- One approach to dealing with the time-varying nature of power systems is to apply 1D CNNs to univariate time series data. For example, in the study [12], 1D CNNs were used to predict power system inertia using only frequency measurements: time series of frequency changes, together with their rates of change, are stacked into a one-dimensional array that is then processed by 1D CNNs.
- Another method, which additionally captures interactions between signals, is to group signals into a matrix in which each row represents a single univariate signal. Processing this matrix with a 2D CNN enables multivariate time series analysis, i.e., analysing patterns across multiple time series and how they interact with each other. This approach has been used in recent research, such as the study [13], to detect faults in power systems by analysing series of voltage, current, and frequency measurements.
- Time series data can be subjected to a time-frequency transformation, which reveals the frequency content of the signal while preserving its temporal localization. Since such transformations can be represented as two-dimensional images, they can be analysed with image processing tools, including CNNs. For instance, in [14] a CNN was trained to classify faults in power systems by analysing 2D scalograms generated by applying the continuous wavelet transform to time series of phasor measurements.
- Another approach is to apply a CNN to a matrix of electrical quantities recorded at a single time instance, where each row contains the values of a specific electrical quantity for each power system element. This approach, which does not use time series data, has been shown to be effective in certain applications. The study [15] solves the DC optimal power flow problem in this way, taking node-level active and reactive power injections as inputs, with labels obtained using the traditional DC optimal power flow approach.
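The four input constructions listed above can be sketched with NumPy as follows. All signals are synthetic, the 50 samples/s reporting rate and the 14-bus system size are hypothetical, and the hand-written continuous wavelet transform uses a Ricker wavelet only as one possible mother wavelet; the cited studies [12]–[15] may preprocess their data differently.

```python
import numpy as np

fs = 50.0                                   # hypothetical reporting rate (samples/s)
t = np.arange(100) / fs                     # 2 s of measurements
rng = np.random.default_rng(4)

# (1) 1D CNN input: frequency deviation and its rate of change stacked into one array
freq_dev = -0.05 * (1.0 - np.exp(-3.0 * t)) + 0.001 * rng.normal(size=t.size)
rocof = np.gradient(freq_dev, 1.0 / fs)
x_1d = np.concatenate([freq_dev, rocof])


# (2) 2D CNN input: one univariate signal per row (voltage, current, frequency)
voltage = 1.0 + 0.01 * np.sin(2.0 * np.pi * t)
current = 0.5 + 0.02 * np.sin(2.0 * np.pi * t + 0.3)
x_2d = np.stack([voltage, current, freq_dev])


# (3) Time-frequency image: a basic continuous wavelet transform (scalogram)
def ricker(points, a):
    """Ricker ('Mexican hat') wavelet with width a, sampled at `points` positions."""
    n = np.arange(points) - (points - 1) / 2.0
    amp = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    return amp * (1.0 - (n / a) ** 2) * np.exp(-0.5 * (n / a) ** 2)


widths = np.arange(1, 17)                   # one scale per image row
scalogram = np.abs(np.stack([
    np.convolve(current, ricker(min(10 * w, t.size), w), mode="same") for w in widths
]))

# (4) Single-snapshot matrix: one row per electrical quantity, one column per bus
p_inj = rng.normal(size=14)                 # active power injections (hypothetical)
q_inj = rng.normal(size=14)                 # reactive power injections (hypothetical)
x_snapshot = np.stack([p_inj, q_inj])

print(x_1d.shape, x_2d.shape, scalogram.shape, x_snapshot.shape)  # (200,) (3, 100) (16, 100) (2, 14)
```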
Note that these approaches treat the measurements from all power system elements as one aggregated input, without considering the connectivity between the elements.
IV Recurrent Neural Networks
Recurrent neural networks represent a significant development in deep learning, particularly for processing sequential data such as speech, text, and time series [3]. Each recurrent layer acts as a memory cell that takes in information from previous steps in the sequence, processes it, and generates a hidden state representation that is passed on to the next step. The final hidden state of an RNN encapsulates information about the entire input sequence and can be used for tasks such as natural language processing, speech recognition, and time series prediction. While 1D CNNs followed by fully connected layers are typically limited to fixed-length sequences, meaning that all time series in the training and test samples must have the same number of elements, RNNs can handle varying sequence lengths, which makes them more versatile for analysing sequential data.
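The recurrence described above, and the reason the sequence length does not have to be fixed, can be seen in a minimal Elman-style recurrent layer. This sketch assumes NumPy, a tanh activation, and random weights; because the same parameters are applied at every step, sequences of 5 and 40 samples both produce a hidden state of the same size.

```python
import numpy as np

rng = np.random.default_rng(5)
n_in, n_hidden = 3, 8
W_x = rng.normal(0.0, 0.3, size=(n_hidden, n_in))      # input-to-hidden weights, shared over time
W_h = rng.normal(0.0, 0.3, size=(n_hidden, n_hidden))  # hidden-to-hidden weights, shared over time
b = np.zeros(n_hidden)


def rnn_final_state(sequence):
    """Run a simple recurrent layer over the sequence and return the final hidden state."""
    h = np.zeros(n_hidden)
    for x_t in sequence:                 # one step per element, for any number of steps
        h = np.tanh(W_x @ x_t + W_h @ h + b)
    return h


short_seq = rng.normal(size=(5, n_in))
long_seq = rng.normal(size=(40, n_in))
h_short, h_long = rnn_final_state(short_seq), rnn_final_state(long_seq)
print(h_short.shape, h_long.shape)   # (8,) (8,)
```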
The most widely used building blocks of RNNs are gated memory units, such as gated recurrent units (GRUs) and long short-term memory (LSTM) units [16]. These architectures were designed to capture longer-term dependencies in sequential data. Both GRUs and LSTMs include a gated internal memory, which allows them to selectively retain or discard information from previous steps in the sequence and thus improves their ability to handle long input sequences. LSTMs are more complex and can be more powerful at capturing longer-term dependencies, while GRUs are computationally simpler and faster but may be less effective in certain tasks.
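A single LSTM step can be written in a few lines, which shows how the gates control the internal memory. This sketch follows the standard LSTM formulation without peephole connections and assumes NumPy; the ordering of the gate blocks inside the stacked weight matrices and the random weights are arbitrary choices made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    W: (4n, n_in), U: (4n, n), b: (4n,), with gate blocks ordered as
    input, forget, output, candidate (an arbitrary but fixed convention)."""
    n = h_prev.size
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:n])            # input gate: how much new information to write
    f = sigmoid(z[n:2 * n])        # forget gate: how much of the old cell state to keep
    o = sigmoid(z[2 * n:3 * n])    # output gate: how much of the cell state to expose
    g = np.tanh(z[3 * n:])         # candidate values
    c = f * c_prev + i * g         # internal memory (cell state)
    h = o * np.tanh(c)             # hidden state passed to the next step
    return h, c


rng = np.random.default_rng(6)
n_in, n = 2, 4
W = rng.normal(0.0, 0.5, size=(4 * n, n_in))
U = rng.normal(0.0, 0.5, size=(4 * n, n))
b = np.zeros(4 * n)
h, c = np.zeros(n), np.zeros(n)
for x_t in rng.normal(size=(25, n_in)):
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, bool(np.all(np.abs(h) <= 1.0)))  # (4,) True
```

A GRU step would look similar but merges the cell and hidden states and uses two gates instead of three, which is the source of its lower computational cost.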
Various time series prediction algorithms, including RNNs, have been used for power demand and generation forecasting. One recent study [17] uses LSTM RNNs to predict multistep-ahead solar generation from the recorded measurement history while also addressing missing records in the input time series. RNNs can also be used to predict the flexibility of large consumers’ power demand in response to dynamic market price changes, as demonstrated in [18]. This approach combines two LSTM RNNs, one predicting the market price and the other predicting a consumer’s demand flexibility metric, with a focus on uncommon events such as price spikes. An interesting technical aspect of this method is that the two RNNs share some LSTM-based layers, which results in more efficient and faster training as well as improved prediction capabilities.
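Multistep-ahead forecasting with RNNs requires turning a measurement history into pairs of input windows and future targets. The sketch below shows one simple way to do this in NumPy when some records are missing: inputs are forward-filled and accompanied by a binary mask channel, and windows with missing targets are dropped. The helper `make_windows`, the window lengths, and the PV values are hypothetical, and this is not necessarily how [17] handles missing data.

```python
import numpy as np


def make_windows(series, n_lags, horizon):
    """Build (input, target) pairs for multistep-ahead forecasting.

    Inputs have two channels per time step: the forward-filled value and a
    missing-record flag. Windows whose targets contain missing values are skipped."""
    series = np.asarray(series, dtype=float)
    missing = np.isnan(series)
    filled = series.copy()
    for t in range(1, filled.size):          # forward fill missing records
        if np.isnan(filled[t]):
            filled[t] = filled[t - 1]
    filled = np.nan_to_num(filled)           # leading gaps (nothing to carry forward) -> 0

    X, Y = [], []
    for start in range(series.size - n_lags - horizon + 1):
        stop = start + n_lags
        target = series[stop:stop + horizon]
        if np.isnan(target).any():
            continue
        X.append(np.stack([filled[start:stop], missing[start:stop].astype(float)], axis=-1))
        Y.append(target)
    return np.array(X), np.array(Y)


# Hypothetical PV output (p.u.) with two missing records
pv = np.array([0.0, 0.0, 0.2, 0.5, np.nan, 0.9, 1.0, 0.8, np.nan, 0.3, 0.1, 0.0])
X, Y = make_windows(pv, n_lags=4, horizon=2)
print(X.shape, Y.shape)   # (4, 4, 2) (4, 2)
```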
RNNs can also be applied to EMS and DMS tasks beyond power and energy forecasting. The work [19] proposes using an RNN to classify the voltage stability of a microgrid after a fault from time series of measurement deviations, providing power system operators with the information needed to take corrective actions. The employed RNN architecture is a bidirectional LSTM, which processes the time series in both the forward and backward directions, allowing the RNN to consider both past and future context at each step of the sequence when making predictions. In the study [20], the authors evaluate different deep learning models for detecting misconfigurations in power systems from time series of operational data. They compare GRU RNNs, LSTM RNNs, the transformer architecture [21], which has been successful in natural language processing tasks, and a hybrid RNN-enhanced transformer [22]. They find that the RNN-enhanced transformer is the most effective architecture, highlighting the potential of attention-based architectures for solving time series problems in power systems.
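The bidirectional processing used in [19] can be sketched with two independent recurrent layers, one run forward and one run over the reversed sequence, whose states are concatenated at each time step. For brevity, plain tanh cells replace the LSTM cells of [19], and the NumPy weights and input data are random placeholders (assumptions).

```python
import numpy as np

rng = np.random.default_rng(7)
n_in, n_h, T = 2, 5, 6


def make_rnn():
    """Random parameters (W_x, W_h, b) for one simple tanh recurrent layer."""
    return rng.normal(0.0, 0.3, (n_h, n_in)), rng.normal(0.0, 0.3, (n_h, n_h)), np.zeros(n_h)


def run_rnn(params, seq):
    """Return the hidden state after every step of the sequence, shape (len(seq), n_h)."""
    W_x, W_h, b = params
    h, states = np.zeros(n_h), []
    for x_t in seq:
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.array(states)


fwd_params, bwd_params = make_rnn(), make_rnn()   # separate weights for each direction


def bidirectional(seq):
    """Concatenate forward states with backward states re-aligned to the same time step."""
    forward = run_rnn(fwd_params, seq)
    backward = run_rnn(bwd_params, seq[::-1])[::-1]
    return np.concatenate([forward, backward], axis=1)   # shape (T, 2 * n_h)


deviations = rng.normal(size=(T, n_in))   # hypothetical post-fault measurement deviations
states = bidirectional(deviations)

perturbed = deviations.copy()
perturbed[-1] += 1.0                      # change only the last sample
states_p = bidirectional(perturbed)
same_past = np.allclose(states[0, :n_h], states_p[0, :n_h])     # forward half at t = 0
same_future = np.allclose(states[0, n_h:], states_p[0, n_h:])   # backward half at t = 0
print(states.shape, same_past, same_future)
```

At the first time step the forward half is unchanged, while the backward half already reflects the perturbation of the last sample; this is the future context that the bidirectional architecture adds.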
V Graph Neural Networks
Graph neural networks, particularly spatial GNNs that utilize message passing, are an increasingly popular deep learning technique that excels at handling graph-structured data, which makes them well suited to a wide range of power system problems. Spatial GNNs process graph-structured data by repeatedly applying a process called message passing between connected nodes in the graph [23]. The goal of GNNs is to represent the information from each node and its connections in a higher-dimensional space, creating a vector representation of each node, also known as a node embedding. GNNs are made up of multiple layers, each representing one iteration of message passing. Each message passing iteration applies several functions, typically a trainable message function, an aggregation function, and a trainable update function, where the trainable functions are implemented as neural networks. The message function calculates the messages passed between two node embeddings, the aggregation function combines the incoming messages (e.g., by summation) into an aggregated message, and the update function calculates the update to each node’s embedding. This process is repeated a predefined number of times, and the final node embeddings are passed through additional neural network layers to generate predictions.
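One message-passing iteration with explicit message, aggregation, and update functions can be sketched as below. The single-layer tanh networks used as message and update functions, the sum aggregation, the three layers, and the 4-bus example graph are illustrative assumptions. The final check shows that relabelling the nodes only permutes the resulting node embeddings in the same way (equivariance), so any order-independent readout built on top of them is permutation invariant.

```python
import numpy as np

rng = np.random.default_rng(8)
d = 4                                            # embedding size


def mlp(W, b, z):
    """A single-layer network used as a trainable function."""
    return np.tanh(z @ W + b)


# Trainable parameters, shared by all nodes and edges
W_msg, b_msg = rng.normal(0.0, 0.5, (2 * d, d)), np.zeros(d)
W_upd, b_upd = rng.normal(0.0, 0.5, (2 * d, d)), np.zeros(d)


def message_passing_layer(H, edges):
    """One message-passing iteration over an undirected graph.

    H: (num_nodes, d) node embeddings; edges: list of (i, j) node pairs."""
    agg = np.zeros_like(H)
    for i, j in edges:
        for src, dst in ((i, j), (j, i)):        # messages flow in both directions
            m = mlp(W_msg, b_msg, np.concatenate([H[dst], H[src]]))   # message function
            agg[dst] += m                                             # sum aggregation
    return mlp(W_upd, b_upd, np.concatenate([H, agg], axis=1))        # update function


def gnn(H, edges, num_layers=3):
    for _ in range(num_layers):                  # one layer = one message-passing iteration
        H = message_passing_layer(H, edges)
    return H


# Hypothetical 4-bus radial feeder with branches 0-1, 1-2, 1-3
edges = [(0, 1), (1, 2), (1, 3)]
X = rng.normal(size=(4, d))                      # initial node features (e.g. measurements)
Z = gnn(X, edges)

# Relabel the nodes: new node k is old node perm[k]
perm = np.array([2, 0, 3, 1])
inv = np.argsort(perm)                           # old node i becomes new node inv[i]
edges_perm = [(inv[i], inv[j]) for i, j in edges]
Z_perm = gnn(X[perm], edges_perm)
print(np.allclose(Z_perm, Z[perm]))              # True
```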
GNNs have several advantages over other deep learning methods when used in power systems. One of them is permutation invariance, which means that, by design, they produce the same output for different representations of the same graph. GNNs can also handle dynamic changes in power system topology and operate over graphs with varying numbers of nodes and edges, which makes them well suited to real-world power systems, whose topologies may change. Additionally, GNNs are computationally and memory efficient, requiring fewer trainable parameters and less storage than traditional deep learning methods applied to graph-structured data, which is beneficial in power system problems where near real-time performance is critical. Spatial GNNs can also perform distributed inference using only local measurements, which makes it possible to implement them efficiently using 5G communication infrastructure and edge computing [24]. This enables real-time, low-latency decision-making in large networks, since the computations are performed at the network edge, near the data source, minimizing the amount of data sent over the network.
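Two of these properties, a parameter count that does not depend on graph size and the locality that enables distributed inference, can be checked with a compact GNN. In this sketch (NumPy, random weights, hypothetical radial feeders with 6 and 30 buses), the same two weight matrices are used for both graphs, and changing the features of all buses more than $K$ hops away leaves bus 0's embedding after $K$ layers unchanged, so bus 0 only needs measurements from its $K$-hop neighbourhood.

```python
import numpy as np

rng = np.random.default_rng(9)
d, K = 3, 2                                   # embedding size, number of GNN layers
# The only trainable parameters: two d x d matrices, independent of the number of buses
W_self, W_nbr = rng.normal(0.0, 0.5, (d, d)), rng.normal(0.0, 0.5, (d, d))


def path_adjacency(n):
    """Adjacency matrix of a hypothetical n-bus radial (path-shaped) feeder."""
    A = np.zeros((n, n))
    i = np.arange(n - 1)
    A[i, i + 1] = 1.0
    A[i + 1, i] = 1.0
    return A


def gnn(A, H, num_layers):
    """Each layer combines a node's own embedding with the sum of its neighbours' embeddings."""
    for _ in range(num_layers):
        H = np.tanh(H @ W_self + A @ H @ W_nbr)
    return H


results = []
for n in (6, 30):                             # same parameters, two different graph sizes
    A = path_adjacency(n)
    H = rng.normal(size=(n, d))
    Z = gnn(A, H, K)
    H_far = H.copy()
    H_far[K + 1:] += 10.0                     # perturb every bus more than K hops from bus 0
    Z_far = gnn(A, H_far, K)
    results.append((n, Z.shape, bool(np.allclose(Z[0], Z_far[0]))))
print(results)  # [(6, (6, 3), True), (30, (30, 3), True)]
```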
GNNs have recently been applied to a variety of regression and classification tasks in the field of power systems. The work [25] proposes applying GNNs to the bus-branch model of power distribution systems, with phasor measurement data as inputs, to perform fault location by identifying the node in the graph where the fault occurred. The use of GNNs for power system stability assessment has been explored in [26], where the problem is formulated as a graph-level classification task that distinguishes between rotor angle instability, voltage instability, and stable states, again based on power system topology and measurements. The paper [27] presents a hybrid neural network architecture that combines GNNs and RNNs to address short-term load forecasting. The RNNs process historical load data and provide inputs to the GNNs, which then extract spatial information from users with similar consumption patterns, providing a more comprehensive approach to forecasting power consumption. In [28], the authors propose a GNN approach for predicting power system dynamics, represented as time series of power system states after a disturbance or failure occurs. The GNN is fed with real-time measurements from phasor measurement units distributed across the nodes of the graph. In [29], GNNs are applied over varying power system topologies to detect previously unseen false data injection attacks in smart grids.
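The node-level formulation of [25] and the graph-level formulation of [26] differ mainly in the readout applied to the final node embeddings. The sketch below assumes NumPy, a hypothetical 5-bus bus-branch model, mean-over-neighbourhood message passing, and untrained random weights, and shows both heads: a softmax over buses for fault location and a mean-pooled softmax over three stability classes. Because the weights are random, the printed predictions carry no meaning; only the structure is of interest.

```python
import numpy as np

rng = np.random.default_rng(10)


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


# Hypothetical 5-bus bus-branch model with per-bus features (e.g. PMU magnitude and angle)
branches = [(0, 1), (1, 2), (2, 3), (1, 4)]
n_bus, d_in, d = 5, 2, 8
A = np.zeros((n_bus, n_bus))
for i, j in branches:
    A[i, j] = A[j, i] = 1.0
A_hat = A + np.eye(n_bus)                              # include each bus itself
A_norm = A_hat / A_hat.sum(axis=1, keepdims=True)      # mean over self and neighbours

X = rng.normal(size=(n_bus, d_in))
W1, W2 = rng.normal(0.0, 0.5, (d_in, d)), rng.normal(0.0, 0.5, (d, d))
Z = np.tanh(A_norm @ np.tanh(A_norm @ X @ W1) @ W2)   # two message-passing layers

# Node-level head (fault location): one score per bus, softmax over buses
w_node = rng.normal(size=d)
p_fault_bus = softmax(Z @ w_node)

# Graph-level head (stability assessment): mean-pool node embeddings, then classify
W_graph = rng.normal(size=(d, 3))
classes = ["stable", "rotor angle unstable", "voltage unstable"]
p_class = softmax(Z.mean(axis=0) @ W_graph)
print(classes[int(np.argmax(p_class))], int(np.argmax(p_fault_bus)))
```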
In the previously mentioned studies, GNNs are applied to the traditional bus-branch model of power systems; however, a recent trend in the field is to apply GNNs over other graphs that represent the connectivity in power system data. For example, GNNs have been combined with heterogeneous power system factor graphs to solve the state estimation problem, both linear [30] and nonlinear [31]. In these approaches, measurements are represented by factor nodes, while variable nodes are used to predict state variables and calculate the training loss. These approaches are more flexible with respect to input measurement data than traditional deep learning-based state estimation methods, because various types of measurements on power system buses and branches can easily be integrated or excluded by adding or removing the corresponding nodes in the factor graph. A different approach, which does not apply the GNN over the traditional bus-branch model, is presented in [32]. The proposed method solves the power system event classification problem based on data collected from phasor measurement units. It first uses a GNN encoder to infer the relationships between the measurements and then employs a GNN decoder on the learned interaction graph to classify power system events.
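The flexibility of the factor-graph formulation can be illustrated by how the graph is built from a list of measurements. The sketch below is a simplified, hypothetical construction in plain Python (one variable node per bus, one factor node per measurement, edges from each factor to the buses it involves) and does not reproduce the exact graph structure or node features used in [30] and [31].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    kind: str          # e.g. "V" (bus voltage) or "P_flow" (branch active power flow)
    location: tuple    # (bus,) for bus measurements, (from_bus, to_bus) for branch ones
    value: float       # would become a factor-node input feature


def build_factor_graph(buses, measurements):
    """Build a bipartite graph with one variable node per bus and one factor node per measurement.

    A bus measurement connects to its bus's variable node; a branch measurement
    connects to both end buses. Returns node names and an undirected edge list."""
    variable_nodes = [f"x_{b}" for b in buses]
    factor_nodes, edges = [], []
    for k, m in enumerate(measurements):
        f = f"f{k}_{m.kind}"
        factor_nodes.append(f)
        edges.extend((f, f"x_{b}") for b in m.location)
    return variable_nodes, factor_nodes, edges


buses = [1, 2, 3]
meas = [Measurement("V", (1,), 1.02),
        Measurement("P_flow", (1, 2), 0.35),
        Measurement("P_flow", (2, 3), 0.12)]
vars_, factors, edges = build_factor_graph(buses, meas)

# Integrating a new measurement only appends a factor node and its edges
vars2, factors2, edges2 = build_factor_graph(buses, meas + [Measurement("V", (3,), 0.98)])
print(len(factors), len(edges), len(factors2), len(edges2))   # 3 5 4 6
```

Because the new voltage measurement only adds one factor node and one edge, a GNN operating on this graph can accept it without any change to its architecture.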
VI Deep Reinforcement Learning
So far, we have reviewed deep learning methods that are inherently suited to predicting discrete or continuous variables from a set of inputs. In contrast, deep reinforcement learning (DRL) methods directly aim at the long-term optimization of a sequence of actions, each of which is followed by immediate feedback [33]. DRL methods are therefore powerful tools for multi-objective sequential decision-making, suitable for various EMS and DMS functionalities that involve power system optimization [34]. In the DRL framework, an agent interacts with a stochastic environment in discrete time steps, with the goal of finding the optimal policy that maximizes the long-term reward while receiving feedback about its immediate performance. At each time step, the agent receives state variables from the environment, takes an action, and receives an immediate reward signal together with the state variables for the next time step, as shown in Fig. 2. DRL training involves many episodes of agent-environment interaction, during which the agent learns by trial and error. Using the data collected in these episodes, the agent learns to predict the long-term rewards in various situations using neural networks, and these predictions are then used to derive an optimal decision-making strategy.
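The agent-environment loop of Fig. 2 can be illustrated with Q-learning, the algorithm underlying value-based DRL methods such as deep Q-networks. To keep the sketch short and self-contained, a lookup table replaces the neural network, and the environment is a hypothetical voltage regulation task with five voltage bins, three tap actions, and random load-induced voltage changes; none of these details come from the reviewed studies.

```python
import numpy as np

rng = np.random.default_rng(11)
N_STATES, TARGET = 5, 2                 # hypothetical voltage bins; bin 2 is nominal voltage
ACTIONS = (-1, 0, +1)                   # lower tap, hold, raise tap


def step(state, action_idx):
    """Toy stochastic environment: the tap action shifts the voltage bin, a random
    load change may shift it again, and the reward penalises deviation from nominal."""
    nxt = state + ACTIONS[action_idx]
    if rng.random() < 0.2:
        nxt += rng.choice((-1, 1))
    nxt = int(np.clip(nxt, 0, N_STATES - 1))
    return nxt, -abs(nxt - TARGET)


Q = np.zeros((N_STATES, len(ACTIONS)))  # a table stands in for the neural network here
alpha, gamma, eps = 0.1, 0.9, 0.1       # learning rate, discount factor, exploration rate
for episode in range(3000):
    s = int(rng.integers(N_STATES))
    for t in range(20):
        # Epsilon-greedy action selection: mostly exploit, sometimes explore
        a = int(rng.integers(len(ACTIONS))) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next, r = step(s, a)
        # Move the estimate towards the immediate reward plus the discounted future value
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

policy = [ACTIONS[int(np.argmax(Q[s]))] for s in range(N_STATES)]
print(policy)
```

After training, the greedy policy is expected to raise the tap in the lowest voltage bin and lower it in the highest one.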
There are many studies that apply DRL to power system optimization and control. Examples include distribution network reconfiguration for active power loss reduction [35], Volt-VAR control in electrical distribution systems [36], and frequency control in low-inertia power systems [37]. In these studies, a DRL agent receives various electrical measurements as state information and takes a single multidimensional action per time step, which may include both discrete and continuous set points for controllable devices within the power system.
A recent trend in power system research is the transition from single-agent to multi-agent deep reinforcement learning (MADRL), which coordinates multiple agents operating together in a single environment using the mathematical apparatus developed in the field of game theory [38]. MADRL relies on the concept of centralized training and decentralized execution: a centralized algorithm trains all the agents at once, allowing for coordination and cooperation among them, while during decentralized execution each agent acts independently based on the knowledge acquired during training. Because the agents do not need to exchange information with a central controller at execution time, communication delays are significantly reduced, which results in faster real-life operation. Reducing these communication delays is particularly important in large transmission power systems, where individual agents may be geographically far apart.
For example, a decentralized Volt-VAR control algorithm for power distribution systems based on MADRL is proposed in [39]. In this algorithm, the power system is divided into multiple independent control areas, each controlled by a corresponding DRL agent. These agents observe only the local measurements of electrical quantities within their area, and the action of each agent contains set points for all the reactive power resources in that area. Similarly, in [40], a MADRL algorithm is used to solve the secondary voltage control problem in isolated microgrids in a decentralized fashion by coordinating multiple agents, each corresponding to a distributed generator equipped with a voltage-controlled voltage source inverter. The action of each agent is a single secondary voltage control set point for the corresponding generator. The fundamental difference from [39] is that the agents in [40] use as state information not only local measurements of electrical quantities but also messages from neighbouring agents, which leads to improved performance. The work [41] proposes using a MADRL algorithm to perform economic dispatch, which minimizes the overall cost of generation while satisfying the power demand. Each agent models an individual power plant, and its action is the active power production set point. Another example of using MADRL for an economic problem, in coupled power and transportation networks, is given in [42]. There, a MADRL method models the pricing game and determines the optimal charging pricing strategies of multiple electric vehicle (EV) charging stations, where each individually owned station competes using price signals to maximize its payoff. In all these works, multiple agents are trained in a centralized manner to optimize a globally defined reward function that reflects the nature of the particular problem.
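The decentralized-execution structure shared by these works can be sketched as follows: the network is partitioned into control areas, each agent maps only its area's measurements to set points for the reactive power resources in that area, and a single global reward scores the joint outcome. The 9-bus partition, resource locations, and quadratic voltage-deviation reward are hypothetical, random linear policies stand in for trained neural networks, and the centralized training step itself is omitted.

```python
import numpy as np

rng = np.random.default_rng(12)
# Hypothetical partition of a 9-bus distribution network into three control areas
areas = {"A": [0, 1, 2], "B": [3, 4, 5], "C": [6, 7, 8]}
q_resources = {"A": [1], "B": [3, 5], "C": [8]}      # buses with controllable reactive power


class Agent:
    """Decentralised actor: maps its area's local voltages to reactive power set points.

    A random linear policy stands in for a trained neural network (assumption)."""

    def __init__(self, n_obs, n_act):
        self.W = rng.normal(0.0, 0.1, (n_act, n_obs))

    def act(self, local_obs):
        return np.clip(self.W @ (local_obs - 1.0), -1.0, 1.0)   # bounded set points


agents = {k: Agent(len(buses), len(q_resources[k])) for k, buses in areas.items()}


def global_reward(voltages):
    """Shared objective used during centralised training: penalise deviations from 1 p.u."""
    return -float(np.sum((voltages - 1.0) ** 2))


voltages = 1.0 + 0.03 * rng.normal(size=9)            # one observed system state (p.u.)
# Decentralised execution: each agent sees only the measurements from its own area
actions = {k: agents[k].act(voltages[buses]) for k, buses in areas.items()}
print({k: a.shape for k, a in actions.items()}, global_reward(voltages))
```

Extending the observation of each agent with messages from neighbouring areas, as in [40], would only change what is passed to `act`, not the overall structure.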
VII Conclusions
Deep learning has demonstrated great potential to improve various aspects of both EMS and DMS, including power system monitoring tasks such as stability assessment, state estimation, and fault detection, as well as power system optimization tasks such as Volt-VAR optimization and distribution network reconfiguration. The reviewed studies indicate that these methods achieve high accuracy and improved performance compared to traditionally used techniques. Current trends in the field include the use of graph neural networks and multi-agent deep reinforcement learning. As the field continues to evolve, more research and development is expected in these areas, with a focus on implementing these techniques in real-world power systems to demonstrate their practical potential.
References
- [1] J. R. Aguero, E. Takayesu, D. Novosel, and R. Masiello, “Modernizing the grid: Challenges and opportunities for a sustainable future,” IEEE Power Energy Mag., vol. 15, no. 3, pp. 74–83, 2017.
- [2] S. Rusitschka, K. Eger, and C. Gerdes, “Smart grid data cloud: A model for utilizing cloud computing in the smart grid domain,” in Proc. IEEE SmartGridComm, 2010, pp. 483–488.
- [3] I. J. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016.
- [4] I. H. Sarker, “Deep learning: A comprehensive overview on techniques, taxonomy, applications and research directions,” SN Comput. Sci., vol. 2, 2021.
- [5] K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforward networks are universal approximators,” Neural Netw., vol. 2, no. 5, pp. 359–366, Jul. 1989.
- [6] J. Quiñonero-Candela, M. Sugiyama, A. Schwaighofer, and N. D. Lawrence, Dataset Shift in Machine Learning. MIT Press, 2009.
- [7] G. D’Antona, “Power system static-state estimation with uncertain network parameters as input data,” IEEE Trans. Instrum. Meas., vol. 65, no. 11, pp. 2485–2494, 2016.
- [8] W. Zhang, D. Yang, and H. Wang, “Data-driven methods for predictive maintenance of industrial equipment: A survey,” IEEE Syst. J., vol. 13, no. 3, pp. 2213–2227, 2019.
