Representative Days for Expansion Decisions in Power Systems
Álvaro García-Cerezo, Luis Baringo

TL;DR
This paper introduces a modified K-means clustering method to select representative days for power system expansion planning, effectively capturing extreme load and renewable production values to improve decision accuracy.
Contribution
A novel K-means based approach that emphasizes extreme data points for better modeling of uncertainties in power system expansion planning.
Findings
The proposed method better captures extreme load and renewable production values.
It improves the accuracy of expansion planning decisions.
The method outperforms traditional K-means in case study simulations.
| Conventional generating unit | Node | Capacity [MW] | Operation cost [$/MWh] | Investment cost [$/MW] |
|  |  | 172 | 75 | - |
|  |  | 172 | 77 | - |
|  |  | 240 | 75 | - |
|  |  | 285 | 70 | - |
|  |  | 200 | 72 | - |
|  |  | 215 | 67 | - |
|  |  | 155 | 69 | - |
|  |  | 400 | 71 | - |
|  |  | 400 | 68 | - |
|  |  | 300 | 70 | - |
|  |  | 260 | 65 | - |
|  |  | 250 | 55 | 100,000 |
|  |  | 250 | 53 | 100,000 |
|  |  | 200 | 60 | 100,000 |
|  |  | 200 | 58 | 100,000 |
|  |  | 250 | 54 | 100,000 |
|  |  | 200 | 59 | 100,000 |
|  |  | 250 | 55 | 100,000 |
| Demand | Node | Zone | Peak power consumption [MW] | Load-shedding cost [$/MWh] |
|  |  | West | 270.0 | 30,000 |
|  |  | East | 242.5 | 30,000 |
|  |  | West | 450.0 | 30,000 |
|  |  | West | 185.0 | 30,000 |
|  |  | East | 177.5 | 30,000 |
|  |  | East | 340.0 | 30,000 |
|  |  | East | 312.5 | 30,000 |
|  |  | East | 427.5 | 30,000 |
|  |  | West | 437.5 | 30,000 |
|  |  | East | 487.5 | 30,000 |
|  |  | East | 662.5 | 30,000 |
|  |  | West | 485.0 | 30,000 |
|  |  | West | 792.5 | 30,000 |
|  |  | West | 250.0 | 30,000 |
|  |  | West | 832.5 | 30,000 |
|  |  | West | 452.5 | 30,000 |
|  |  | East | 320.0 | 30,000 |
| Storage unit | Node | Maximum number of units | Maximum energy level [MWh] | Power capacity [MW] | Annualized investment cost [$] |
|  |  | - | 100 | 50 | - |
|  |  | - | 100 | 50 | - |
|  |  | 2 | 250 | 125 | 14,000,000 |
|  |  | 3 | 250 | 125 | 14,000,000 |
|  |  | 2 | 200 | 100 | 11,200,000 |
|  |  | 1 | 300 | 150 | 16,800,000 |
|  |  | 1 | 400 | 200 | 22,400,000 |
| Transmission line | From bus | To bus | [pu] | Capacity [MW] | Investment cost [$] |
|  |  |  | 0.014 | 150 | - |
|  |  |  | 0.211 | 150 | - |
|  |  |  | 0.085 | 150 | - |
|  |  |  | 0.127 | 150 | - |
|  |  |  | 0.192 | 150 | - |
|  |  |  | 0.119 | 150 | - |
|  |  |  | 0.084 | 150 | - |
|  |  |  | 0.104 | 150 | - |
|  |  |  | 0.088 | 150 | - |
|  |  |  | 0.061 | 150 | - |
|  |  |  | 0.061 | 150 | - |
|  |  |  | 0.161 | 150 | - |
|  |  |  | 0.165 | 150 | - |
|  |  |  | 0.084 | 150 | - |
|  |  |  | 0.084 | 150 | - |
|  |  |  | 0.084 | 150 | - |
|  |  |  | 0.084 | 150 | - |
|  |  |  | 0.048 | 150 | - |
|  |  |  | 0.042 | 150 | - |
|  |  |  | 0.048 | 150 | - |
|  |  |  | 0.087 | 150 | - |
|  |  |  | 0.075 | 150 | - |
|  |  |  | 0.059 | 150 | - |
|  |  |  | 0.017 | 150 | - |
|  |  |  | 0.049 | 150 | - |
|  |  |  | 0.049 | 150 | - |
|  |  |  | 0.052 | 150 | - |
|  |  |  | 0.026 | 150 | - |
|  |  |  | 0.023 | 150 | - |
|  |  |  | 0.014 | 150 | - |
|  |  |  | 0.105 | 150 | - |
|  |  |  | 0.026 | 150 | - |
|  |  |  | 0.026 | 150 | - |
|  |  |  | 0.040 | 150 | - |
|  |  |  | 0.040 | 150 | - |
|  |  |  | 0.220 | 150 | - |
|  |  |  | 0.220 | 150 | - |
|  |  |  | 0.068 | 150 | - |
|  |  |  | 0.120 | 175 | 106,670 |
|  |  |  | 0.140 | 175 | 113,330 |
|  |  |  | 0.165 | 175 | 111,000 |
|  |  |  | 0.048 | 500 | 228,940 |
|  |  |  | 0.048 | 500 | 228,940 |
|  |  |  | 0.075 | 500 | 416,250 |
| Wind-power unit | Node | Zone | Capacity [MW] | Investment cost [$/MW] |
|  |  | South | 200 | - |
|  |  | South | 200 | - |
|  |  | South | 300 | 300,000 |
|  |  | South | 400 | 300,000 |
|  |  | North | 300 | 300,000 |
|  |  | North | 300 | 300,000 |
|  | CT [] |  |  |  | Computation time [min] |  |
|  | TKM | MKM | TKM | MKM | TKM | MKM |
| 10 | 4.69 | 4.43 | 48.44 | 41.97 | 1 | 1 |
| 20 | 4.89 | 4.07 | 56.51 | 30.29 | 6 | 4 |
| 30 | 3.94 | 3.30 | 26.18 | 5.80 | 13 | 11 |
| 40 | 3.30 | 3.19 | 5.55 | 2.04 | 14 | 22 |
| 50 | 3.34 | 3.27 | 7.08 | 4.78 | 24 | 29 |
| 60 | 3.21 | 3.14 | 2.70 | 0.66 | 30 | 39 |
| 70 | 3.24 | 3.12 | 3.58 | 0.01 | 49 | 77 |
| 80 | 3.17 | 3.14 | 1.40 | 0.63 | 65 | 88 |
Representative Days for Expansion Decisions in Power Systems
Álvaro García-Cerezo
Luis Baringo
Escuela Técnica Superior de Ingenieros Industriales de Ciudad Real, Department of Electrical Engineering, Universidad de Castilla-La Mancha, Campus Universitario s/n, 13071 Ciudad Real, Spain
Abstract
Short-term uncertainty should be properly modeled when the expansion planning problem in a power system is analyzed. Since the use of all available historical data may lead to intractability, clustering algorithms should be applied in order to reduce the computational workload without sacrificing an accurate representation of the historical data. In this paper, we propose a modified version of the traditional K-means method that seeks to represent the maximum and minimum values of the input data, namely, the electric load and the renewable production in several locations of an electric energy system. Depicting the extreme values of these parameters is crucial because they can have a great impact on the expansion and operation decisions taken. The proposed method is based on the traditional K-means algorithm, which represents the correlation between electric load and wind-power production. The chronology of the historical data, which influences the performance of some technologies, is characterized through representative days, each one composed of 24 operating conditions. A realistic case study based on the generation and transmission expansion planning of the IEEE 24-bus Reliability Test System is analyzed applying representative days and comparing the results obtained using the traditional K-means technique and the proposed method.
keywords:
Clustering, expansion planning, renewable production, storage
Notation
The main notation used in this paper is stated below for quick reference, while other symbols are defined as needed throughout the text. A subscript / in the symbols below denotes their values in the th representative day/th hour.
Indices
Demands.
Conventional generating units.
Hours.
Transmission lines.
Nodes.
Representative days.
Storage facilities.
Wind-power units.
Sets
Receiving-end node of transmission line .
Sending-end node of transmission line .
Demands located at node .
Conventional generating units located at node .
Storage units located at node .
Wind-power units located at node .
Candidate conventional generating units.
Candidate transmission lines.
Candidate storage units.
Candidate wind-power units.
Parameters
Susceptance of transmission line [].
Operation cost of conventional generating unit [$/MWh].
Load-shedding cost of demand [$/MWh].
Energy initially stored in storage facility [MWh].
Maximum level of energy of storage facility [MWh].
Large enough positive constant.
Investment cost of candidate conventional generating unit [$/MW].
Annualized investment cost of candidate conventional generating unit [$/MW].
Investment budget for building candidate conventional generating units [$].
Investment cost of candidate transmission line [$].
Annualized investment cost of candidate transmission line [$].
Investment budget for building candidate transmission lines [$].
Investment cost of candidate storage facility [$].
Annualized investment cost of candidate storage facility [$].
Investment budget for building candidate storage facilities [$].
Investment cost of candidate wind-power unit [$/MW].
Annualized investment cost of candidate wind-power unit [$/MW].
Investment budget for building candidate wind-power units [$].
Maximum number of units that can be built of candidate storage facility .
Peak power consumption of demand [MW].
Capacity of conventional generating unit [MW].
Capacity of transmission line [MW].
Charging and discharging power capacity of storage facility [MW].
Capacity of wind-power unit [MW].
Capacity factor of wind-power unit [pu].
Demand factor of demand [pu].
Duration of time steps [h].
Charging efficiency of storage facility .
Discharging efficiency of storage facility .
Weight of representative day [days].
Optimization Variables
Energy stored in storage facility [MWh].
Number of units to be built of candidate storage facility .
Power produced by conventional generating unit [MW].
Capacity to be built of conventional generating unit [MW].
Power flow through transmission line [MW].
Load shed of demand [MW].
Charging power of storage facility [MW].
Discharging power of storage facility [MW].
Power produced by wind-power unit [MW].
Capacity to be built of wind-power unit [MW].
Binary variable that is equal to 1 if candidate transmission line is built, and 0 otherwise.
Voltage angle at node [rad].
1 Introduction
The Generation and Transmission Expansion Planning (G&TEP) problem is solved to determine the new facilities that should be built in a power system in order to ensure that the electric load can be supplied in the future; the time frame of this problem can span several decades. It is motivated by the growth in peak loads, the penetration of renewable generating units, and the aging of transmission facilities.
In most electricity markets, a central entity is in charge of making the expansion decisions for the transmission network, i.e., deciding which transmission lines should be built. The aim of this system operator is to minimize investment and operation costs while preventing load shedding. Expansion decisions on generating units, in turn, are made by private investors, whose purpose is to maximize their profits while minimizing their financial risk. Nevertheless, an optimal solution is not guaranteed if the G&TEP problem is treated as two independent problems. This is why the perspective of the system operator is generally adopted in the technical literature when dealing with a G&TEP problem: the central entity determines the solution that minimizes the operation and investment costs of both transmission facilities and generating units. Once this is done, the system operator must provide indications about the optimal expansion of generating units, which the government can use to design policy plans that promote investment in certain technologies or locations.
Historical data are generally used to model the performance of power systems, since the more realistic the input data of the G&TEP problem are, the more accurate the solution of the problem will be in comparison with the future situation.
Regarding short-term uncertainty, electric load and renewable production are the historical data whose variability matters most. On the one hand, electric load is characterized by a daily evolution pattern. Since its variability depends on human habits, its progression can be accurately predicted using historical data. On the other hand, the generation of electric energy from renewable sources depends on meteorological conditions. For instance, the output of wind turbines depends on wind speed, while the electric energy produced by solar panels and hydroelectric power stations depends on sunlight and rainfall, respectively.
Note that the short-term uncertainty associated with renewable generating units increases the complexity of G&TEP problems because, unlike the daily evolution of electric load, weather is difficult to predict in advance. Hence, the inclusion of storage units in electric energy systems is required in order to improve the penetration of renewable generating units: energy can be stored when there is excess energy and discharged when it is needed. In addition, electric load and renewable generation are dependent magnitudes; for instance, low electric load generally coincides in time with high wind-power production. The optimization model used to solve G&TEP problems should properly represent this correlation between electric load and renewable production.
Solving a G&TEP problem commonly involves the use of hourly data, especially when technologies that depend on the chronology, such as storage units, are considered. Nevertheless, the optimization problem can become intractable when a large amount of historical data is used as input data. Thus, the amount of historical data used must be reduced in order to achieve a near-optimal solution in a reasonable time. For this purpose, several techniques have been implemented in the technical literature, such as load-duration curves and the K-means method.
The load-duration curve technique depicts the short-term uncertainty of electric load through different levels arranged into blocks; within each block, a cumulative distribution function of the electric load is built. These functions are then divided into several sectors, each associated with a different probability, and the average value inside each sector gives a level of electric load. This technique can be extended to consider both electric load and renewable production, e.g., load- and wind-duration curves when wind turbines are present in the electric energy system under study. In this case, renewable production data are arranged in the same way as explained above for electric load data. Both magnitudes share the same blocks, in which all combinations of the different levels of electric load and renewable production can take place. These combinations, which can be used as input data of optimization problems, are called system operating conditions. The accuracy of the solution obtained using them depends on the number of blocks and levels selected: the larger these numbers, the greater the accuracy. However, a trade-off must be reached between accuracy and computational workload. This criterion also applies to the rest of the methods used in the technical literature. Duration curves have been used in many references, e.g., considering net load duration curves [1, 2] or load- and wind-duration curves [3, 4].
The K-means technique arranges data into groups, whose centroids are used to represent the input data while reducing the computational workload. The weight of each centroid is given by the number of input data inside its group. This method has the advantage that, in contrast to the load- and wind-duration curve technique, it can consider different correlations of electric load and renewable production in several locations of the electric energy system under study. The K-means method is used, for example, in [6, 7].
Duration curves and the traditional K-means method are compared in [8]. The main issue with these two methods is that units with intertemporal constraints, such as storage units, cannot be included in the expansion problem. To deal with this issue, [9] proposes using a representative day of each season, while [10] and [11] consider a modified K-means method. The main drawback of these methods is that they may not accurately represent the extreme values of the input data. When electric load and renewable production are used as input data, maximum and minimum values can have a great effect on the solution of the optimization problem.
Within this context, the contributions of this paper are threefold:
1. To propose a modified version of the traditional K-means method so that the system operating conditions obtained as output data of this technique properly represent the maximum and minimum values of the input data.
2. To use this new method to obtain representative days of electric load and wind-power production, each one composed of 24 operating conditions, in order to characterize the chronology of the historical data and thus allow the inclusion of technologies that depend on the chronology, such as storage units, in the formulation of expansion models.
3. To provide and analyze the results of a realistic case study in order to check whether the proposed method improves the outcomes in comparison with the traditional K-means method.
The remainder of this paper is organized as follows. Section 2 explains the methodology of the traditional K-means method and the proposed modified version of this technique. Section 3 provides the formulation of the G&TEP problem. Section 4 presents the results of a case study, in which the outcomes obtained applying the clustering methods mentioned above are compared. Finally, Section 5 concludes the paper with some relevant remarks.
2 Methodology
The K-means method is a clustering algorithm whose aim is to arrange data into groups called clusters according to similarities. On the one hand, the inputs of this algorithm are historical data of two physical processes, namely, the electric load and the wind-power production in several locations of an electric energy system. On the other hand, the outputs of this technique are the cluster centroids along with the number of observations allocated to each cluster. Note that cluster centroids, defined by the values of the two physical processes involved, represent the system operating conditions, which can be used as input data when solving optimization problems (e.g., a long-term planning problem).
The K-means technique is useful when dealing with a significant amount of data in optimization problems because it reduces the computational workload. To this end, the users of this method can choose the number of operating conditions to be obtained. However, a low number of operating conditions may not represent the input data accurately, whereas a high number of clusters can lead to intractability.
2.1 Input data
It is important to normalize the input data before applying the algorithm when working with electric load and wind-power production data, because the order of magnitude of the former is commonly greater than that of the latter. If the input data are not normalized and the two parameters have different orders of magnitude, one of them can dominate the quadratic distances computed between each original observation and each cluster centroid and thus bias the results of the clustering method.
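The effect of normalization can be illustrated with a minimal sketch. The text does not state which normalization is used, so min-max scaling to [0, 1] is an assumption here, and the function name and sample values are hypothetical.

```python
import numpy as np

def min_max_normalize(series):
    """Scale a 1-D series linearly to [0, 1] (assumed normalization)."""
    series = np.asarray(series, dtype=float)
    lo, hi = series.min(), series.max()
    if hi == lo:
        # A constant series carries no information for the clustering.
        return np.zeros_like(series)
    return (series - lo) / (hi - lo)

# Hypothetical hourly samples: load in MW, wind in percent of installed capacity.
load_mw = np.array([1800.0, 2400.0, 3000.0, 2100.0])
wind_pct = np.array([10.0, 55.0, 90.0, 30.0])

# After scaling, both columns contribute comparably to quadratic distances.
features = np.column_stack([min_max_normalize(load_mw),
                            min_max_normalize(wind_pct)])
```

Without this step, a difference of a few hundred MW in load would outweigh any difference in wind-power production when distances to the centroids are computed.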
At this point, it is necessary to note that individual operating conditions cannot represent the chronology of the historical data. Due to the penetration of renewable generating units in electric energy systems, not modeling the chronology of the input data can cause a mismatch between the results obtained and reality. Therefore, in this paper we use representative days, each one composed of 24 operating conditions, in order to characterize the chronology of the historical data. This means that technologies which depend on the chronology, such as storage units, can be included in the formulation of the expansion model.
We consider the historical data depicted in Figs. 1 and 2, acquired from [16], as input data of the algorithm. Fig. 1 represents the daily evolution of electric load, while Fig. 2 illustrates the daily evolution of wind-power production, both over a year. Note that, in this example, electric load is expressed in MW, while wind-power production is expressed in percent of installed capacity. In Section 4, the units considered for both parameters are MW. A relevant aspect of Fig. 1 is that the electric load follows a similar daily evolution pattern on different days. By contrast, Fig. 2 shows that the daily evolution of wind-power production does not follow any pattern.
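One way to make whole days the unit of clustering is to turn each day into a single observation that concatenates its 24 load values and its 24 wind values. The sketch below assumes this layout; the function name and the synthetic series are hypothetical and only stand in for the historical data.

```python
import numpy as np

HOURS_PER_DAY = 24

def build_daily_observations(load, wind):
    """Return one row per day: 24 hourly load values followed by 24 wind values."""
    load = np.asarray(load, dtype=float)
    wind = np.asarray(wind, dtype=float)
    if load.shape != wind.shape or load.size % HOURS_PER_DAY != 0:
        raise ValueError("series must have equal length and cover whole days")
    n_days = load.size // HOURS_PER_DAY
    return np.hstack([load.reshape(n_days, HOURS_PER_DAY),
                      wind.reshape(n_days, HOURS_PER_DAY)])

# Synthetic, already normalized series covering three days.
hours = np.arange(3 * HOURS_PER_DAY)
load = 0.5 + 0.4 * np.sin(2 * np.pi * hours / HOURS_PER_DAY)
wind = np.linspace(0.0, 1.0, hours.size)

days = build_daily_observations(load, wind)  # shape (3, 48)
```

Clustering the rows of `days` then yields centroids that are themselves 24-hour profiles, so the hourly order within each representative day is preserved.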
2.2 Traditional K-means algorithm
The algorithm of the K-means method that has been used in technical literature, known from now on as traditional K-means method (TKM), is based on the following steps [8]:
Step 1: Select the number of required clusters according to the needs of the problem.
Step 2: Define the initial centroid of each cluster, e.g., randomly assigning a historical observation to each cluster.
Step 3: Compute the quadratic distances between each original observation and each cluster centroid.
Step 4: Allocate each historical observation to the closest cluster according to the distances calculated in Step 3.
Step 5: Recalculate the cluster centroids using the historical observations allocated to each cluster.
Steps 3-5 are repeated iteratively until there are no changes in the cluster compositions between two consecutive iterations. Fig. 3 illustrates the TKM algorithm.
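The five steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name, the seeded random initialization, and the iteration cap are assumptions.

```python
import numpy as np

def traditional_kmeans(data, k, seed=0, max_iter=100):
    """Minimal K-means following Steps 1-5; returns centroids, labels, counts."""
    data = np.asarray(data, dtype=float)
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and the number of observations")
    rng = np.random.default_rng(seed)
    # Step 2: use k distinct observations as the initial centroids.
    centroids = data[rng.choice(len(data), size=k, replace=False)].copy()
    labels = None
    for _ in range(max_iter):
        # Step 3: quadratic distances, array of shape (n_observations, k).
        dist = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        # Step 4: allocate each observation to its closest centroid.
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # cluster compositions did not change: converged
        labels = new_labels
        # Step 5: recompute each centroid as the mean of its members.
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    counts = np.bincount(labels, minlength=k)  # observations per cluster
    return centroids, labels, counts

# Two well-separated groups of two observations each.
data = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centroids, labels, counts = traditional_kmeans(data, k=2)
```

The returned `counts` play the role of the centroid weights mentioned above, i.e., the number of observations allocated to each cluster.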
Although the traditional K-means method has advantages over other techniques (e.g., it can represent temporal and spatial correlations between the uncertain parameters considered, which the duration curve technique cannot), it is not free of drawbacks. The TKM sometimes does not adequately characterize the maximum and minimum values of the parameters analyzed. This may be a problem, especially regarding peak values, when electric load and wind-power production are the input data of the algorithm, because their extreme values can have a great impact on the solution of the optimization problem.
In the case of the generation and transmission expansion planning (G&TEP) problem, peak values of electric load can require the building of new generating units or new transmission lines to deliver the entire load of the electric energy system under study. Moreover, if the load cannot be completely supplied even then, the total costs will increase due to load-shedding costs. In addition, minimum values of electric load can also condition the solution of the optimization problem if the generating units have a minimum power output greater than zero. Extreme values of wind-power production can likewise influence the expansion and operation decisions taken. Overall, maximum and minimum values of electric load and wind-power production can have an impact on the total costs, either through the investment costs associated with the expansion decisions made or through the operation costs related to the power produced by conventional generating units and to load shedding.
2.3 Modified K-means algorithm
To overcome these issues, we propose a new clustering method called modified K-means method (MKM), which tries to properly characterize the extreme values of the parameters considered, whose steps are presented below:
Step 1: Arrange the historical data into a number of clusters following the TKM.
Step 2: Apply the same clustering technique individually to the observations allocated to each cluster obtained in Step 1, arranging them into a number of clusters.
In other words, in Step 1 the MKM applies a first clustering to the historical data, as is customary in the technical literature, and then in Step 2 a second clustering is applied, but this time the input data are the observations of each cluster obtained in Step 1. Therefore, Step 2 is repeated once for each cluster obtained in Step 1. The MKM algorithm is depicted in Fig. 4.
The number of operating conditions obtained as the output of this algorithm is equal to the product of the number of clusters used in Step 1 and the number of clusters used in Step 2. For instance, a first clustering is applied organizing the input data into five clusters. Then, the observations allocated to each cluster are considered as input data of a second clustering arranging them into two clusters. Thus, the number of operating conditions obtained at the end of the algorithm is 10.
Note that the MKM can only be applied if the number of observations allocated to each cluster after Step 1 is greater than or equal to the number of clusters requested in Step 2. In addition, the number of clusters requested in Step 1 must be less than or equal to the number of input data considered in that step. This last condition also applies to the number of clusters in the TKM.
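The two-step procedure can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: `k1` and `k2` are hypothetical names for the numbers of clusters in Steps 1 and 2, and the compact `kmeans` helper uses a seeded random initialization.

```python
import numpy as np

def kmeans(data, k, seed=0, max_iter=100):
    """Compact K-means helper; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), size=k, replace=False)].copy()
    labels = np.full(len(data), -1)
    for _ in range(max_iter):
        dist = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = data[labels == j].mean(axis=0)
    return centroids, labels

def modified_kmeans(data, k1, k2, seed=0):
    """Step 1: cluster into k1 groups. Step 2: split each group into k2."""
    data = np.asarray(data, dtype=float)
    _, first_labels = kmeans(data, k1, seed)
    centroids, weights = [], []
    for j in range(k1):
        members = data[first_labels == j]
        # Applicability condition: each first-step cluster needs >= k2 members.
        if len(members) < k2:
            raise ValueError(f"cluster {j} has fewer than {k2} observations")
        sub_centroids, sub_labels = kmeans(members, k2, seed)
        centroids.append(sub_centroids)
        weights.append(np.bincount(sub_labels, minlength=k2))
    return np.vstack(centroids), np.concatenate(weights)

# Two distant groups of four observations; k1 * k2 = 4 centroids result.
data = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                 [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
centroids, weights = modified_kmeans(data, k1=2, k2=2)
```

Because the second clustering operates inside each first-step cluster, the final centroids can spread toward the edges of those clusters, which is how the extreme observations gain representation.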
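Equation (1) defines the relation that must hold between the number of clusters of the traditional K-means method and the numbers of clusters used in Steps 1 and 2 of the modified K-means method to make the results of both methods comparable: the former must equal the product of the latter two.
(number of clusters in the TKM) = (number of clusters in Step 1 of the MKM) × (number of clusters in Step 2 of the MKM)    (1)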
2.4 Output data
Since we use representative days in the case study described in Section 4, we consider that the numbers of clusters of the TKM and of Steps 1 and 2 of the MKM refer to numbers of representative days in their respective K-means methods, instead of numbers of operating conditions as defined earlier in this paper.
The representative days of electric load and wind-power production obtained applying the traditional K-means method are illustrated in Figs. 5 and 6, respectively. Furthermore, Figs. 7 and 8 display the representative days obtained applying the modified K-means method. It is worth noting that Fig. 7 shows more representative days of electric load in the areas of maximum and minimum values than Fig. 5.
3 Formulation
The purpose of the G&TEP problem is to minimize the operation costs along with the costs incurred in building new facilities (generating units, storage units, and transmission lines). In this section, we provide the formulation of the G&TEP problem considering a deterministic approach using the following mixed-integer nonlinear programming (MINLP) model:
[TABLE]
where the optimization variables of problem (2) are those listed under Optimization Variables in the Notation section.
The objective function (2a) represents the aim of the G&TEP problem, which is minimizing the expansion (generation, storage, and transmission facilities) and operation (power produced by conventional generating units and load-shedding) costs. The terms associated with operation costs are multiplied by the weight of the corresponding representative day to make them comparable with expansion costs. Note that the sum of the weights of all the representative days is equal to 365, i.e., the total number of days in a year.
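The role of the weights can be illustrated with a short sketch: each representative day is weighted by the number of historical days in its cluster, and the annual operation cost is the weighted sum of the daily costs. The function names, cluster sizes, and daily costs below are hypothetical.

```python
import numpy as np

def representative_day_weights(labels, n_clusters):
    """Weight of each representative day = number of days in its cluster."""
    return np.bincount(labels, minlength=n_clusters)

def annual_operation_cost(weights, daily_costs):
    """Weighted sum of the operation costs of the representative days."""
    return float(np.dot(weights, daily_costs))

# Hypothetical allocation of 365 days to three representative days.
labels = np.repeat([0, 1, 2], [200, 100, 65])
weights = representative_day_weights(labels, n_clusters=3)

# Hypothetical operation cost of each representative day [$/day].
daily_costs = np.array([1.0e6, 1.5e6, 3.0e6])
annual = annual_operation_cost(weights, daily_costs)
```

Since every historical day belongs to exactly one cluster, the weights add up to the number of days clustered, which is why the weighted operation cost is on an annual basis like the annualized investment costs.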
Constraints (2b) limit the number of units to be built of each candidate storage facility. Constraints (2c) define the numbers of storage units to be built as integer variables. Constraints (2d)-(2e) impose bounds on the capacity to be built of conventional and wind-power generating units, respectively. Constraints (2f) define the binary variables that indicate whether a candidate transmission line is built (value 1) or not (value 0). Constraints (2g)-(2j) impose investment budgets for building candidate conventional generating units, transmission lines, storage units, and wind-power units, respectively. Constraints (2k)-(2ae) are the operation constraints and comprise equations (2k) that impose the generation-demand balance at each node, where the demand factors are linked to the output of the K-means method described in Section 2; constraints (2l)-(2m) that define the power flows through existing and candidate transmission lines, respectively, which are limited by constraints (2n)-(2o); equations (2p) that define the energy stored in storage units for all representative days and hours excluding the first hour of each day; equations (2q)-(2r) that define the energy stored in existing and candidate storage units, respectively, for the first hour of all representative days; constraints (2s)-(2t) which ensure that existing and candidate storage units, respectively, store a minimum amount of energy at the end of each representative day; constraints (2u)-(2v) that impose bounds on the energy stored in the existing and candidate storage units, respectively; constraints (2w)-(2x) that impose bounds on the power produced by existing and candidate conventional generating units, respectively; constraints (2y) that limit the load shed of demands; constraints (2z)-(2aa) that impose bounds on the charging power of existing and candidate storage units, respectively; constraints (2ab)-(2ac) that impose bounds on the discharging power of existing and candidate storage units, respectively; constraints (2ad)-(2ae) that impose bounds on the power produced by existing and candidate wind-power units, respectively, where the wind-power capacity factors are associated with the output of the K-means method described in Section 2; and constraints (2af) which define the voltage angle at the reference node.
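The storage constraints can be made concrete with a small simulation of the energy level over consecutive hours. Since the equations are not reproduced in this text, the balance below is the standard form and is an assumption: the stored energy increases with the charging power times the charging efficiency and decreases with the discharging power divided by the discharging efficiency. The function name and the hourly schedule are hypothetical; the 100 MWh / 50 MW unit and the 90 % efficiencies match the case study data.

```python
def simulate_storage(charge_mw, discharge_mw, e_init, e_max,
                     eta_c=0.9, eta_d=0.9, dt=1.0):
    """Track stored energy [MWh] hour by hour under an assumed energy balance."""
    levels = []
    energy = e_init
    for p_c, p_d in zip(charge_mw, discharge_mw):
        # Assumed balance: e_h = e_{h-1} + (eta_c * p_c - p_d / eta_d) * dt
        energy = energy + (eta_c * p_c - p_d / eta_d) * dt
        # Energy bounds, with a small tolerance for floating-point rounding.
        if energy < -1e-9 or energy > e_max + 1e-9:
            raise ValueError("energy level outside [0, e_max]")
        levels.append(energy)
    return levels

# Charge at 50 MW for two hours, then discharge 45 MW and 36 MW.
levels = simulate_storage(charge_mw=[50, 50, 0, 0],
                          discharge_mw=[0, 0, 45, 36],
                          e_init=0.0, e_max=100.0)
```

The example shows why chronology matters: the energy available in the third hour depends on what was charged in the first two, so the hours of a representative day cannot be treated as independent operating conditions.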
It is important to mention that the network constraints are modeled in the G&TEP problem using a DC model without losses for the sake of simplicity. In addition, fixed costs are not considered, and the capacities to be installed of conventional and wind-power generating units are considered continuous variables.
The G&TEP problem (2) is a mixed-integer nonlinear programming (MINLP) model. The nonlinear terms appear in constraints (2m) and are products of binary and continuous variables. These nonlinear terms can be replaced by exact equivalent mixed-integer linear expressions as explained, e.g., in [13]. Thus, the G&TEP problem (2) can be finally formulated as a mixed-integer linear programming (MILP) model that can be solved using available branch-and-cut solvers, e.g., CPLEX [14].
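A common way to obtain such an exact reformulation is a big-M (disjunctive) formulation, sketched below for the flow through a candidate line. This is an illustration of the standard technique and not necessarily the exact expressions of [13]. The symbols are assumptions introduced here, since the original notation is not reproduced in this text: $x_\ell$ is the binary building variable, $p_\ell$ the power flow, $b_\ell$ the susceptance, $\theta_{s(\ell)}$ and $\theta_{r(\ell)}$ the voltage angles at the sending and receiving ends, $P_\ell^{\max}$ the line capacity, and $M$ a large enough positive constant.

```latex
% Nonlinear definition of the flow through candidate line \ell (assumed form):
%   p_\ell = x_\ell \, b_\ell \left( \theta_{s(\ell)} - \theta_{r(\ell)} \right)
% Equivalent mixed-integer linear constraints:
\begin{align}
  -(1 - x_\ell)\, M \;\le\; p_\ell - b_\ell \left( \theta_{s(\ell)} - \theta_{r(\ell)} \right)
      \;&\le\; (1 - x_\ell)\, M, \\
  -x_\ell\, P_\ell^{\max} \;\le\; p_\ell \;&\le\; x_\ell\, P_\ell^{\max}.
\end{align}
```

If $x_\ell = 1$, the first pair of inequalities collapses to the DC power flow equation and the second pair enforces the capacity limit. If $x_\ell = 0$, the second pair forces $p_\ell = 0$ and the first pair leaves the angle difference unconstrained, provided $M$ is at least as large as the greatest possible value of $\left| b_\ell \left( \theta_{s(\ell)} - \theta_{r(\ell)} \right) \right|$.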
4 Case Study
4.1 Data
We apply the expansion model described in Section 3 to the modified version of the IEEE Reliability Test System (RTS) [17] that is depicted in Fig. 9. This electric energy system comprises 11 conventional generating units, 17 demands, 24 nodes, two storage units, 38 transmission lines, and two wind-power units. Table 1 provides the conventional generating unit data; Table 2 supplies the demand data; the storage unit data are presented in Table 3; the transmission line data can be consulted in Table 4; and Table 5 provides the wind-power unit data. The annualized investment costs of candidate storage units, which are shown in Table 3, are based on the data collected in [18]. We consider a set of values taking the average value of the costs provided in the two scenarios considered in [18], as displayed in equation (3).
[TABLE]
We consider that wind-power production and electric load can change their values depending on the zone of the electric energy system where wind-power units and demands are located. On the one hand, demands are allocated to the west and east zones of the system, as illustrated in Figs. 10 and 11. On the other hand, wind-power units are distributed between the north and south zones of the system, as depicted in Figs. 12 and 13. As in Section 2, the historical data of electric load and wind-power production have been acquired from [16]. It is remarkable to mention that the peak values of electric load in the west zone are greater than in the east zone. In addition, the maximum values of wind-power production are associated with the north zone. It is expected that the need to supply the high demands in the west zone will condition the investment decision making of the expansion problem.
It is supposed that we work with hourly data, thus the duration of time steps, , is equal to one hour. We consider that the charging and discharging efficiency of storage units is equal to 90 %. The energy initially stored in storage units is assumed to be zero for all the representative days. Node 1 is the reference node of the optimization problem. The parameter receives a value of 500,000. Due to the presence of transformers in the electric energy system considered as it can be noticed in Fig. 9, we select a base power of 100 MW. It is supposed that the values of the parameters and for each representative day and hour are the same for all the wind-power units and demands, respectively. Both parameters are obtained from the K-means methods.
Instead of considering a separate investment budget for each candidate generating/storage unit or transmission line, we consider a total investment budget, which is distributed among the different types of facilities. Thus, constraints (2g)-(2j) of problem (2) are replaced by constraint (4) from now on. Specifically, we consider a total investment budget of $2,000 million. The annualized investment costs are 10 % of the total costs.
[TABLE]
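As a rough illustration of a single shared budget, the sketch below checks a candidate plan against the $2,000 million limit and applies the 10 % annualization factor stated above. It is not the paper's constraint (4). The function names, the `(cost_per_mw, capacity_mw)` data layout and the example plan are hypothetical.

```python
# Hypothetical sketch of a single shared investment budget; this is not the
# paper's constraint (4), only an illustration of the idea behind it.
TOTAL_BUDGET = 2_000e6       # $2,000 million (from the text)
ANNUALIZATION_RATE = 0.10    # annualized cost = 10 % of total cost (from the text)


def total_investment(built):
    """Sum the total (non-annualized) investment cost of all built facilities.

    `built` is a list of (cost_per_mw, capacity_mw) pairs; this layout is an
    assumption made for the example.
    """
    return sum(cost_per_mw * capacity_mw for cost_per_mw, capacity_mw in built)


def within_budget(built, budget=TOTAL_BUDGET):
    """Return True if all facilities together respect the shared budget."""
    return total_investment(built) <= budget


def annualized_cost(total_cost, rate=ANNUALIZATION_RATE):
    """Convert a total investment cost into its annualized equivalent."""
    return rate * total_cost


# Illustrative plan with made-up costs: two facilities of 250 MW and 200 MW.
example_plan = [(1_000_000.0, 250.0), (1_000_000.0, 200.0)]
example_total = total_investment(example_plan)
example_ok = within_budget(example_plan)
```

The point of the design is that only the sum over all facility types is limited, so the plan may spend any share of the budget on generation, storage or transmission.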
4.2 Results
First, we solve the G&TEP problem using all the historical data to find the exact solution, which we then compare with the results obtained using the representative days provided by both K-means methods. However, some changes in the formulation of problem (2) are needed to properly characterize the continuity in time of the historical data. Thus, constraints (2q)-(2r) are replaced by constraints (5), which relate the energy stored in each storage unit during the first hour of every day except the first one to the energy stored in the same storage unit during the last hour of the previous day. In addition, constraints (2s)-(2t) are replaced by constraints (6)-(7), which link the energy stored in each existing and candidate storage unit, respectively, during the first hour of the first day to the energy initially stored in the same storage unit in the first day, .
[TABLE]
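The day-to-day linking described above can be sketched as follows. The energy balance used here is a common textbook form and is an assumption, not a transcription of constraints (5)-(7); the function name and data layout are hypothetical. The 90 % efficiency, the one-hour time step and the zero initial energy come from the case-study data.

```python
def chain_storage_levels(charge, discharge, initial_energy=0.0,
                         efficiency=0.90, dt_hours=1.0):
    """Return the stored energy (MWh) per day and hour over a continuous horizon.

    charge[d][h] and discharge[d][h] are powers in MW for day d and hour h.
    The balance e_t = e_{t-1} + dt * (eta * p_ch - p_dis / eta) is an assumed,
    commonly used form; it is not taken from the paper.
    """
    levels = []
    previous = initial_energy  # energy stored before the first hour of day 1
    for day_charge, day_discharge in zip(charge, discharge):
        day_levels = []
        for p_ch, p_dis in zip(day_charge, day_discharge):
            # `previous` is NOT reset at day boundaries: the first hour of a
            # day continues from the last hour of the previous day.
            previous = previous + dt_hours * (efficiency * p_ch - p_dis / efficiency)
            day_levels.append(previous)
        levels.append(day_levels)
    return levels


# Two illustrative two-hour "days": charge on day 1, discharge on day 2.
example_levels = chain_storage_levels(
    charge=[[10.0, 0.0], [0.0, 0.0]],
    discharge=[[0.0, 0.0], [4.5, 0.0]],
)
```

With representative days, by contrast, each day would start again from the assumed initial energy, because consecutive representative days are not consecutive in time.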
Having made these changes, the G&TEP problem is solved using the 366 days of historical data, since the year considered is a leap year. The total annual cost obtained, , amounts to $3,124 million. The results show that 0.14 % of the total demand is not supplied. The computation time required to obtain the exact solution is 55 h 28 min.
The steps that should be followed in order to make the results obtained using representative days comparable with the exact solution are presented below:
Step 1: Solve the G&TEP problem using the representative days obtained by applying the clustering methods.
Step 2: Fix the values of the decision variables obtained in Step 1 and solve the G&TEP problem using all the historical data.
Step 3: Calculate the percent error of the total annual cost obtained in Step 2 with respect to the total annual cost provided by the exact solution, applying equation (8).
[TABLE]
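The three-step comparison can be sketched as a small harness. The relative-error formula below is the usual definition and is assumed to match equation (8), which is not reproduced here. The two solver callbacks are hypothetical stand-ins for the GAMS/CPLEX runs, and the Step 2 cost of 3,200 is an invented value used only to exercise the code.

```python
def percent_error(cost_fixed_decisions, cost_exact):
    """Percent deviation of the Step 2 cost from the exact total annual cost.

    Assumed form: 100 * |TC - TC_exact| / TC_exact.
    """
    return 100.0 * abs(cost_fixed_decisions - cost_exact) / cost_exact


def evaluate_clustering(solve_reduced, solve_full_fixed, cost_exact):
    """Run Steps 1 to 3 for one clustering method and one number of days.

    solve_reduced():          hypothetical callback, returns the decisions
                              found with representative days (Step 1).
    solve_full_fixed(plan):   hypothetical callback, returns the total annual
                              cost on all historical data with the decisions
                              fixed (Step 2).
    """
    decisions = solve_reduced()                   # Step 1
    cost_fixed = solve_full_fixed(decisions)      # Step 2
    return percent_error(cost_fixed, cost_exact)  # Step 3


# Stub usage: 3124.0 is the exact cost in $ million reported in the text;
# 3200.0 is a made-up Step 2 cost, not a result from the paper.
example_error = evaluate_clustering(
    solve_reduced=lambda: {"build_unit_1": True},
    solve_full_fixed=lambda decisions: 3200.0,
    cost_exact=3124.0,
)
```

Because Step 2 re-solves the full problem with the decisions fixed, every method is scored against the same historical data, which makes the errors comparable across clustering methods.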
These steps are followed in the case study using values of the parameter ranging from 10 to 80, where 366 is the maximum value that could be selected. This means that we work with an amount of data equivalent to between 3 % and 22 % of all the historical data considered.
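The quoted data shares follow directly from dividing the number of representative days by the 366 days of historical data. A minimal check, with a hypothetical helper name:

```python
TOTAL_DAYS = 366  # leap year of historical data (from the text)


def data_share(n_representative_days, total_days=TOTAL_DAYS):
    """Percentage of the historical days covered by the representative days."""
    return 100.0 * n_representative_days / total_days


share_low = round(data_share(10))   # rounds to 3
share_high = round(data_share(80))  # rounds to 22
```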
Fig. 14 depicts the total annual cost obtained using different values of and clustering methods. We observe that the MKM yields values of the total annual cost closer to the exact solution than those obtained using the TKM in all the cases evaluated. Note that the differences between the results obtained using the clustering methods and the exact solution generally decrease as the value of increases. However, this is not always true because, for instance, these differences are greater considering than in the case of using . Due to the high total investment budget taken into account, most of the candidate facilities considered are built and more than 99 % of the demand is supplied.
Fig. 15 illustrates the error of the total annual cost obtained using different values of and clustering methods. It is clear that the MKM provides results with less error than those obtained using the TKM for all the cases analyzed, especially in those where the parameter presents a low value.
Although it is fundamental to determine which clustering method provides the closest results to the exact solution, we should also analyze the computation times obtained in Step 1 of the process described above. Fig. 16 shows that the TKM generally provides shorter computation times, especially in those cases where the parameter takes a high value. However, it should be taken into account that the possible saturation of the server used to solve the G&TEP problem, caused by its concurrent use, may have influenced the computation times obtained. In addition, note that the computation times tend to rise as the value of increases. The results of Figs. 14, 15 and 16 are collected in Table 6.
Taking into account the results discussed above, we consider that the MKM provides better results than the TKM, especially regarding the error of the total annual cost. Although the computation times obtained using the MKM are generally greater than those obtained using the TKM, in several of the cases evaluated the MKM reaches a lower error than the TKM in the same amount of time or less. For instance, the MKM yields an error of 2.04 % using 40 representative days in 22 min, while the TKM needs 30 min to obtain an error of 2.70 % using 60 representative days. Due to this and the possible server saturation mentioned before, we consider that the results associated with the error are more relevant than those linked to the computation times.
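The comparison in the example above amounts to a dominance check on two criteria, time and error. The sketch below encodes the two runs quoted in the text. The `Run` type and the `dominates` helper are hypothetical and are not part of the paper's methodology.

```python
from typing import NamedTuple


class Run(NamedTuple):
    method: str
    days: int
    minutes: float
    error_pct: float


def dominates(a, b):
    """True if run `a` is no slower and no less accurate than `b`,
    and strictly better in at least one of the two criteria."""
    no_worse = a.minutes <= b.minutes and a.error_pct <= b.error_pct
    strictly_better = a.minutes < b.minutes or a.error_pct < b.error_pct
    return no_worse and strictly_better


# The two runs quoted in the text.
mkm_40 = Run("MKM", 40, 22.0, 2.04)
tkm_60 = Run("TKM", 60, 30.0, 2.70)
mkm_wins = dominates(mkm_40, tkm_60)
```

A run that is faster but less accurate, or the reverse, is not dominated, which is why the text compares pairs of runs rather than ranking the methods on time alone.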
4.3 Computation Times
The results of this case study are obtained using CPLEX [14] under GAMS [15] on an Intel Xeon E7-4820 computer with 4 processors at 2 GHz and 128 GB of RAM.
The computation time required to obtain the exact solution is 55 h 28 min. Regarding the resolution of the G&TEP problem using representative days, the corresponding computation times are collected in Table 6.
5 Conclusions
This paper proposes a new clustering method to adequately characterize the maximum and minimum values of the input data. In addition, we arrange the operating conditions obtained using the K-means method into representative days in order to depict the chronology of the historical data. This allows us to include storage units in the expansion model considered to solve the G&TEP problem.
The conclusion of this paper is that, in the case study, the modified K-means method provides a total annual cost closer to the exact solution than the traditional K-means method for the different numbers of representative days considered. In fact, although the computation times may have been influenced by the saturation of the server used, the results show that in some cases the MKM is able to solve the G&TEP problem in less time than the TKM, using fewer representative days and achieving a smaller error.
- [1] M. Caramanis, R. Tabors, K. S. Nochur, and F. Schweppe, “The introduction of non-dispatchable technologies as decision variables in long-term generation expansion models,” IEEE Trans. Power App. Syst., vol. PAS-101, no. 8, pp. 2658–2667, 1982.
- [2] S. Wogrin, “Generation expansion planning in electricity markets with bilevel mathematical programming techniques,” Ph.D. dissertation, Universidad Pontificia Comillas de Madrid, Madrid, Spain, 2013.
- [3] L. Baringo and A. J. Conejo, “Transmission and wind power investment,” IEEE Trans. Power Syst., vol. 27, no. 2, pp. 885–893, May 2012.
- [4] S. Montoya-Bueno, J. I. Muñoz, and J. Contreras, “A stochastic investment model for renewable generation in distribution systems,” IEEE Trans. Sustain. Energy, vol. 6, no. 4, pp. 1466–1474, Oct. 2015.
- [5] A. H. van der Weijde and B. F. Hobbs, “The economics of planning electricity transmission to accommodate renewables: Using two-stage optimisation to evaluate flexibility and the cost of disregarding uncertainty,” Energy Economics, vol. 34, no. 6, pp. 2089–2101, Nov. 2012.
- [6] L. Baringo and A. J. Conejo, “Strategic wind power investment,” IEEE Trans. Power Syst., vol. 29, no. 3, pp. 1250–1260, May 2014.
- [7] R. Domínguez, A. J. Conejo, and M. Carrión, “Toward fully renewable electric energy systems,” IEEE Trans. Power Syst., vol. 30, no. 1, pp. 316–326, Jan. 2015.
- [8] L. Baringo and A. J. Conejo, “Correlated wind-power production and electric load scenarios for investment decisions,” Appl. Energy, vol. 101, pp. 475–482, Jan. 2013.
