A Novel Demand Response Model and Method for Peak Reduction in Smart Grids -- PowerTAC
Sanjay Chandlekar, Arthik Boroju, Shweta Jain, Sujit Gujar

TL;DR
This paper introduces a new demand response model for peak reduction in smart grids, utilizing incentive-based algorithms and real-world simulation to optimize load reduction strategies.
Contribution
It presents a novel probabilistic model for agent response to incentives and develops algorithms for optimal and online learning of discounts in smart grid demand response.
Findings
The RP function accurately models agent load reduction probability.
The MJS--ExpResponse algorithm maximizes expected reduction under budget constraints.
The online MJSUCB--ExpResponse algorithm achieves sublinear regret in learning RRs.
| Group | Customers | Type | %Usage in Tariff Market |
|---|---|---|---|
| | BrooksideHomes & CentervilleHomes | Household | |
| | DowntownOffices & EastsideOffices | Small Offices | |
| | HextraChemical | Mid-level Offices | to |
| | MedicalCenter-1 | High-level Offices | to |
| Method | ||||||
|---|---|---|---|---|---|---|
| | P1 | P2 | P1 | P2 | | |
| No Discount | ||||||
| Baseline | ||||||
| Average Over All Weeks of Training | ||||||
| MJSUCB–ExpResponse-W | ||||||
| MJSUCB–ExpResponse-UW | ||||||
| Average Over Last Weeks of Training | ||||||
| MJSUCB–ExpResponse-W | ||||||
| MJSUCB–ExpResponse-UW | ||||||
Sanjay Chandlekar, International Institute of Information Technology (IIIT), Hyderabad, India
Arthik Boroju, Indian Institute of Technology, Ropar, India
Shweta Jain, Indian Institute of Technology, Ropar, India
Sujit Gujar, International Institute of Information Technology (IIIT), Hyderabad, India
Abstract
One of the widely used peak reduction methods in smart grids is demand response, where one analyzes the shift in customers’ (agents’) usage patterns in response to the signal from the distribution company. Often, these signals are in the form of incentives offered to agents. This work studies the effect of incentives on the probabilities of accepting such offers in a real-world smart grid simulator, PowerTAC. We first show that there exists a function that depicts the probability of an agent reducing its load as a function of the discounts offered to them. We call it reduction probability (RP). RP function is further parametrized by the rate of reduction (RR), which can differ for each agent. We provide an optimal algorithm, MJS–ExpResponse, that outputs the discounts to each agent by maximizing the expected reduction under a budget constraint. When RRs are unknown, we propose a Multi-Armed Bandit (MAB) based online algorithm, namely MJSUCB–ExpResponse, to learn RRs. Experimentally we show that it exhibits sublinear regret. Finally, we showcase the efficacy of the proposed algorithm in mitigating demand peaks in a real-world smart grid system using the PowerTAC simulator as a test bed.
Keywords Smart Grids, Demand Response (DR), PowerTAC, Learning Customer DR Model, Peak Reduction
1 Introduction
Load balancing is one of the most prevalent problems in energy grids. It arises when consumption surges suddenly (i.e., during peak hours) and demand goes beyond the normal working range of supply. The sudden surge in demand leads to multiple issues: (i) peak demands put an added load on electricity generating companies (GenCo), which must supply additional energy through fast ramping generators to fulfill the energy requirement of the customers (agents); (ii) the grid needs to support such dynamics and peak demand. The ramping up of the generators results in higher costs for distribution companies (DC). Typically, daily peak demands are approximately to times higher than the average demand U.S. EIA (2014). As per one estimation, a lowering of demand during peak hours of California electricity crisis in would have resulted in price reduction International Energy Agency (2003). Figure 1 conveys the same idea: a slight reduction in peak demand can significantly bring down the net electricity costs. Thus, it is paramount to perform load balancing in the grid efficiently.
A promising technology for load balancing is a smart grid. It is an electricity network that supplies energy to agents via two-way digital communication. It allows monitoring, analysis, control, and communication between participants to improve efficiency, transparency, and reliability Techopedia.com (2021). The smart grid technology is equipped with smart meters capable of handling the load in the smart grid by advising the agents to minimize energy usage during heavy load scenarios. The smart grid system can effectively balance the load by incentivizing agents to shift their energy usage to non-peak timeslots by signaling them the updated tariffs, commonly known as demand response (DR).
DR involves the DC offering agents voluntary monetary incentives to optimize their electricity load. There are many approaches to achieve DR, such as auction-based mechanisms Zeng et al. (2015); Zhou et al. (2015) and dynamic pricing Goudarzi et al. (2021). The major challenge with these approaches is that different agents may respond differently to the given incentives. Thus, to increase agent participation, it becomes crucial to learn their reaction toward these incentives. Learning agents’ behavior is challenging due to the uncertainty and randomness introduced by exogenous factors like weather Shweta and Sujit (2020); Li et al. (2018). Works like Shweta and Sujit (2020); Li et al. (2018) consider a very simplistic model: when the DC offers an agent an incentive greater than the agent’s valuation, the agent reduces every unit of electricity it consumes with a certain probability that is independent of the incentive. This probability is termed the reduction probability (RP) Jain et al. (2014); Shweta and Sujit (2020). RPs are learned using multi-armed bandit (MAB) solutions. There are three primary issues with these approaches: (i) agents’ valuations need to be elicited Jain et al. (2014); Shweta and Sujit (2020), which adds communication complexity, (ii) the reduction is all-or-nothing, i.e., an agent reduces its entire load with probability RP and otherwise reduces nothing, and (iii) RPs do not change with incentives. In the real world, an increase in incentives should lead to an increase in RP. Our work considers a model where the RP is a function of the incentives offered rather than a constant for an agent, and where reduction is not binary.
To model RP as a function of incentive, we need to carry out experiments with smart grids. However, any DR technique (or such experiments) proposed for a smart grid should also maintain the grid’s stability. The only way to validate that the proposed technique would not disrupt the grid operations while achieving DR is to test it on real-world smart grids, which is practically impossible. Nevertheless, the Power Trading Agent Competition (PowerTAC) Ketter et al. (2013) provides an efficient and close-to-real-world smart grid simulator intended to facilitate smart grid research. We first perform experiments with PowerTAC to observe the behavior of different agents for the offered incentives. Based on rigorous experiments, we propose our model ExpResponse. We observe that the agents respond quickly to the incentives; however, more incentives may not substantially increase reduction guarantees. Different agents may have different rates of reduction (RR), which determine how fast the RP changes w.r.t. incentives. The RR also models the consumer’s valuation for one unit of electricity. A higher RR corresponds to a consumer who values electricity less (for example, a home consumer). In contrast, a lower RR indicates that the consumer values electricity more (for example, an office consumer).
We propose an optimization problem for the DC to maximize the expected peak reduction within the given budget. We then provide an optimal algorithm, namely MJS–ExpResponse, for the case when the reduction rates (RRs) of the agents are known. When RRs are unknown, we employ a standard MAB-based algorithm, MJSUCB–ExpResponse, to learn RRs. Our experiments with synthetic data exhibit sub-linear regret (the difference between the expected reduction with known RRs and the actual reduction with MJSUCB–ExpResponse). With this success, we adapt it to the PowerTAC set-up and experimentally show that it helps in reducing peak demands substantially and outperforms baselines such as distributing the budget equally across all agent segments. In summary, our contributions are the following:
- We propose a novel model (ExpResponse) which mimics smart grid agents’ demand response (DR) behavior by analyzing agents’ behavior in a close-to-real-world smart grid simulator, PowerTAC.
- We design an offline algorithm to optimally allocate the budget to agents to maximize the expected reduction.
- We design an online algorithm based on a linear search method to learn the RR values required to calculate optimal allocation in the offline algorithm. We further show that the proposed algorithm exhibits sub-linear regret experimentally.
- We evaluate the proposed algorithm on the PowerTAC platform, which is close to a real-world smart grid system. Experiments showcase the proposed algorithm’s efficacy in reducing the demand peaks in the PowerTAC environment (14.5% reduction in peak demands for a sufficient budget).
2 Related Work
Many demand response methods are available in the literature. Some of the popular ones include time-of-day tariffs Ramchurn et al. (2011); Jain et al. (2013), direct load control Hsu and Su (1991), and the price elasticity of demand approach (dynamic pricing) Chao (2012). These approaches are quite complex for the agents because the price keeps changing, which can lead to agent confusion due to uncertain supply, volatile prices, and lack of information. Due to the complexity involved in these methods, many recent works have focused on providing incentives to the agents, which make them shift their load from peak hours to non-peak hours Park et al. (2015); Jain et al. (2014).
In the literature, many techniques for providing incentives focus primarily on the setting where, given an offer (incentive), the consumer can either reduce or choose not to reduce the consumption. For example, the DR mechanisms in Jain et al. (2014); Shweta and Sujit (2020) considered a setting where each consumer was associated with two quantities: (i) valuation per unit of electricity, which represents how much a consumer values the unit of electricity, and (ii) acceptance rate, which denotes the probability of accepting the offer if a consumer is given an incentive greater than his/her valuation. The authors then proposed a Multi-Armed Bandit mechanism that elicits the valuation of each consumer and learns the acceptance rate over a period of time. Similar approaches were also considered in Ma et al. (2016, 2017); Methenitis et al. (2019); Li et al. (2018). All the above models, in principle, assume that the acceptance rate is independent of the incentives given to the agents. In practice, this assumption does not hold: the acceptance rate should ideally increase with the incentives. To the best of our knowledge, this paper is the first to consider the dependency of the acceptance rate on the incentives, especially in MAB-based learning settings. In principle, the paper considers the problem of an optimal allocation of the budget to different types of agents to maximize the overall peak reduction.
Two sets of works aim to maximize the peak reduction under a budget constraint: (i) with a mixed integer linear programming (MILP) approach Chen et al. (2020), and (ii) with an efficient algorithm that draws similarities from the min-knapsack problem Singh et al. (2021). Other than that, there are a few tariff strategies for the PowerTAC environment which mitigate the demand peaks by publishing tariffs to incentivize customers to shift their non-priority electricity usage to non-peak timeslots Chandlekar et al. (2022); Demijan et al. (2022); Ghosh et al. (2019). However, none of these techniques discusses DR in detail.
3 Preliminaries and Mathematical Model
In a smart grid system, a distribution company (DC) distributes the electricity from GenCo to agents (household customers, office spaces, electric vehicles, etc.) in the tariff market. The customers are equipped with autonomous agents/bots to interact with the grid. Hence, we refer to customers as agents henceforth. Depending on its type, each agent exhibits a certain usage pattern which, for most agents, is a function of the tariff offered by the DC. We consider agents available to prepare for DR at any given timeslot.
A DR model can further incentivize agents, offering to agent , to shift their usages from peak to non-peak timeslot. However, agents may do so stochastically, based on external random events and the offered incentives. For each agent , this stochasticity can be modeled by associating the probability of reducing demand in the desired timeslot as . We call this probability as reduction probability (RP) . Note that the reduction in electricity consumption at peak slot for agents is not binary. For example, an agent with the usage of KWh and RP () of would reduce its usage by KWh in expectation. The general intuition is that higher incentives lead to a higher probability of accepting the offer, reducing the load in peak hours. Typically the DC has a limited budget to offer discounts. It aims to achieve the maximum possible peak reduction within the budget.
First, we need to model the agent’s RP function . We need a simulator that can efficiently model real-world agents’ usage patterns and the effects of DR on their usage patterns. PowerTAC Ketter et al. (2013) replicates the crucial elements of a smart grid, including state-of-the-art customer models. Below, we explain experimental details and observations from the PowerTAC experiments that helped us to come up with our novel model of the RP function.
3.1 Modelling the Reduction Probability (RP) Function Inspired from PowerTAC
PowerTAC Set-up:
PowerTAC simulates smart-grid environments in the form of games that run for a fixed duration. The standard game duration is around simulation days, which can be modified to play for an even longer duration. The simulation time is discretized into timeslots corresponding to every hour of the day. For each game, the PowerTAC environment randomly selects the weather of a real-world location, and the agents mold their usage pattern based on the selected weather in the game. During the game, the DC aims to develop a subscriber base in the tariff market by offering competitive tariffs, which could be fixed price (FPT), tiered, time-of-use (ToU) or a combination of all. The DC also satisfies the energy requirement of its subscriber base by buying power in the wholesale market by participating in day-ahead auctions. There are different types of customers in PowerTAC. We focus on PowerTAC’s consumption agents, i.e., those who consume electricity, and aim to learn their RP function.
Experimental Set-up:
We perform the following nine sets of experiments to model the RP function. We play different games for simulation days for each experimental set-up and report the statistics averaged over these games. For each experiment, we make DC publish a tariff at the start of the game and keep that tariff active throughout the game. The initial tariff rates depend on the DC electricity purchase cost and may vary from game to game.
FPT-Set-up to identify peak slots: We make DC publish an FPT and record each consumption agent’s true usage pattern without any external signals from DC. Based on the true usage pattern of each agent, we identify the potential peak demand hours in a day. Figure 2 shows the usage pattern of a PowerTAC agent in response to the FPT; in this figure, the hours and have the peak usages during the day. The rate value of the FPT is derived by adding a profit margin to the DC’s electricity purchase cost. Next, we study the agents’ response to different tariffs. To this end, we consider the ToU model, where different prices are proposed at different times. These prices, however, are the same for all agents.
ToU-Set-up: In ToU tariffs, the rate charged for each unit of electricity consumed can vary depending on the time of the day. The ToU tariffs are designed so that the agents get discounts during non-peak hours and no/little discounts during peak hours. The average rate of the tariffs across all timeslots remains the same as in the previous FPT-set-up. Essentially, all the ToU tariffs have the exact same area under the curve (AUC) as the FPT. We perform such an experiment for the remaining sets by offering discounts in each set; we give discount on non-peak timeslots compared to the price in peak timeslots. Here . Figure 2 explains how we move from an FPT (Fig. 2(a)) to a ToU tariff (Fig. 2(c)) by offering a certain discount and keeping the AUC the same for all the tariffs. Based on the discount level, the agents modify their usage patterns (Fig. 2(b,d)), and we collect the usage data of each agent for each of the sets.
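The AUC-preserving construction described above can be illustrated with a small sketch. It assumes, as one reading of the text, that the discount is a fraction taken off the peak price for every non-peak hour; the function name `tou_rates`, the FPT rate of 0.10, the peak hours 8 and 19, and the 20% discount are all hypothetical values chosen for illustration, not figures from the experiments.

```python
def tou_rates(fpt_rate, peak_hours, discount, hours=24):
    """Build hourly ToU rates whose mean equals the flat FPT rate.

    Assumption: non-peak hours are priced at (1 - discount) times the
    peak-hour price, so the area under the rate curve matches the FPT.
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must lie in [0, 1)")
    peak = set(peak_hours)
    n_peak = len(peak)
    n_off = hours - n_peak
    # Solve n_peak * p + n_off * p * (1 - discount) = hours * fpt_rate for p.
    peak_price = hours * fpt_rate / (n_peak + n_off * (1 - discount))
    off_price = peak_price * (1 - discount)
    return [peak_price if h in peak else off_price for h in range(hours)]


# Hypothetical inputs: flat rate 0.10, peaks at hours 8 and 19, 20% discount.
rates = tou_rates(fpt_rate=0.10, peak_hours=[8, 19], discount=0.2)
mean_rate = sum(rates) / len(rates)
```

Under this construction the peak-hour price rises slightly above the flat rate while the non-peak price falls below it, so the average rate stays at the FPT level for any discount.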
To analyze the effects of various discounts on agents’ usage patterns, we pick the top two peak hours in the day for each agent. Then, we calculate the difference between the electricity usage during FPT-Set-up and electricity usage during discounted ToU-Set-up for both peak slots. We do this for all eight sets numbered from to . We can view the discounted tariffs as a DR signal for the agents to shift their non-priority usages from peak to non-peak timeslots. Below, we show the observations of the DR experiments for a few selected agents.
Figure 3(a) and Figure 3(b) show the DR behavior of three PowerTAC agents, BrooksideHomes, CentervilleHomes and EastsideOffices, for their top two peak demand hours (re-scaled to visualize peak reduction as a probability function). The first two agents are household customers, whereas the last agent is an office customer. Analysing the plots gives two crucial insights into the agents’ behavior. First, the agents reduce their usage to a great extent for the initial values of discount , and but cannot reduce their usage further even when offered a much higher discount. Second, different agents follow different rates of reduction.
Based on the PowerTAC experiments, we conclude that the reduction probability function can be modeled by an exponential probability function and is given as:
[TABLE]
Here, is a discount (or incentive) given to agent , and is its reduction rate (RR). The proposed function depends upon the choice of ; the higher value of generates a steeply increasing curve (as shown with ), while the lower value makes the curve increase slowly with each discount (as shown with ) as shown in Figure 3(c). Let and be vector of offered incentives and RRs.
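The displayed equation is not reproduced in this text, so the following is only a sketch under an assumption: it takes the RP to be the saturating exponential `1 - exp(-rr * discount)`, which is one form consistent with the description (zero at no discount, increasing in the discount, steeper for a higher RR). The function name and the RR values 0.5 and 0.1 are hypothetical.

```python
import math


def reduction_probability(discount, rr):
    """Assumed RP form: 1 - exp(-rr * discount).

    This is an illustrative choice consistent with the text, not a
    formula copied from the paper.
    """
    if discount < 0 or rr < 0:
        raise ValueError("discount and rr must be non-negative")
    return 1.0 - math.exp(-rr * discount)


# Hypothetical RR values: a fast-responding and a slow-responding agent.
steep = [reduction_probability(c, 0.5) for c in range(6)]
slow = [reduction_probability(c, 0.1) for c in range(6)]
```

With this form, each additional unit of discount raises the RP by less than the previous unit did, which matches the observation that agents respond strongly to initial discounts and then saturate.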
3.2 ExpResponse: The Optimization Problem
We assume that all the agents have the same electricity consumption in peak slots (agents consuming different amounts can be trivially modeled by duplicating agents). The aim is to maximize the expected reduction under a budget constraint. This leads to the following optimization problem:
[TABLE]
Suppose the RR () values are known. In that case, we present an optimal algorithm, MJS–ExpResponse, to efficiently distribute the budget among the agents to maximize the expected sum of peak reduction (Section 4.1). When RRs are unknown, we provide the MJSUCB–ExpResponse algorithm, which estimates them (Section 4.2). The algorithm is motivated by multi-armed bandit literature Jain et al. (2014); Shweta and Sujit (2020) and uses a linear search over the possible range of values of RR.
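For orientation, the budgeted problem can be written out in one plausible form. This is a sketch under assumptions: the RP of agent $i$ is taken to be $1 - e^{-\lambda_i c_i}$, and the symbols $n$ (number of agents), $c_i$ (discount to agent $i$), $\lambda_i$ (its RR) and $b$ (budget) are notation introduced here for illustration; integer discounts reflect the unit-by-unit allocation described in Section 4.1.

```latex
\begin{aligned}
\max_{c_1,\dots,c_n} \quad & \sum_{i=1}^{n} \left(1 - e^{-\lambda_i c_i}\right) \\
\text{s.t.} \quad & \sum_{i=1}^{n} c_i \le b, \qquad c_i \in \{0, 1, 2, \dots\} \;\; \forall i .
\end{aligned}
```

Because all agents are assumed to consume the same amount in peak slots, maximizing the sum of reduction probabilities is equivalent to maximizing the expected total reduction.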
4 Proposed Algorithms for ExpResponse
This section proposes a novel algorithm to solve ExpResponse. We discuss two settings: (i) perfect information that assumes the knowledge of RR, and (ii) imperfect information where RR values need to be learned over time.
4.1 Perfect Information Setting: Known RR
MJS–ExpResponse (Algorithm 1) distributes one unit of budget to an appropriate agent in each iteration until the entire budget is exhausted. To decide the appropriate agent for the current iteration, we calculate jump () values for all the agents. We define value for each agent as the change in RP for a unit change in discount. For example, if an agent has RP of for discount and RP of for discount , then the jump is the difference between these two probabilities. The algorithm finds an agent having the maximum such jump for the current unit of reduction (denoted by for agent ) and allocates the current unit discount to agent ; we call this Maximum Jump Selection (MJS). Finally, the algorithm returns the allocation, which is the optimal distribution of the initial fixed budget, as shown in the theorem below.
Theorem 1.
MJS–ExpResponse is optimal.
Proof.
For any discount vector , the objective function in Equation 2 can be written as a sum of jumps which denote the additional increase in reduction probability of consumer when offered units of discount compared to . i.e.
[TABLE]
Thus, at the optimal solution, one unit is allocated to the highest jumps and none to the other jumps. It remains to show that, for any agent, an earlier jump is higher than a later one, i.e., . The lemma below proves this for any agent . ∎
Lemma 1.
For each , we have with
Proof.
We have the following:
[TABLE]
From the last equation, we have . ∎
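The monotonicity of jumps can be checked directly under an assumed RP form. The derivation below is a sketch that takes $p_i(c) = 1 - e^{-\lambda_i c}$ with $\lambda_i > 0$; the symbols $p_i$, $\lambda_i$ and $\Delta_i^{j}$ (the $j$-th jump of agent $i$) are notation introduced here for illustration.

```latex
\begin{aligned}
\Delta_i^{j} &= p_i(j) - p_i(j-1)
  = e^{-\lambda_i (j-1)} - e^{-\lambda_i j}
  = e^{-\lambda_i (j-1)} \left(1 - e^{-\lambda_i}\right), \\
\Delta_i^{j+1} &= e^{-\lambda_i j} \left(1 - e^{-\lambda_i}\right)
  = e^{-\lambda_i}\, \Delta_i^{j} \;<\; \Delta_i^{j}
  \qquad \text{since } 0 < e^{-\lambda_i} < 1 .
\end{aligned}
```

Under this form the jumps of each agent shrink geometrically with ratio $e^{-\lambda_i}$, so an agent's later jumps never exceed its earlier ones.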
Note that one can use KKT conditions and derive a set of linear equations to determine an optimal distribution of . Our proposed algorithm is simple, determines an optimal solution in linear time, and has a time complexity of .
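The maximum-jump selection can be sketched in a few lines. This is an illustrative implementation under the assumption that the RP has the form `1 - exp(-rr * discount)`; the function names and the RR values are hypothetical, and the sketch recomputes every agent's jump in each iteration for clarity rather than speed.

```python
import math


def rp(discount, rr):
    """Assumed RP form (illustrative): 1 - exp(-rr * discount)."""
    return 1.0 - math.exp(-rr * discount)


def mjs_allocate(rrs, budget):
    """Give each budget unit to the agent with the largest RP jump."""
    alloc = [0] * len(rrs)
    for _ in range(budget):
        # Jump: gain in RP if the agent receives one more unit of discount.
        jumps = [rp(alloc[i] + 1, rr) - rp(alloc[i], rr)
                 for i, rr in enumerate(rrs)]
        best = max(range(len(rrs)), key=jumps.__getitem__)
        alloc[best] += 1
    return alloc


def expected_reduction(alloc, rrs):
    """Sum of reduction probabilities under a given allocation."""
    return sum(rp(c, rr) for c, rr in zip(alloc, rrs))


rrs = [0.2, 0.5, 1.0]  # hypothetical RR values
allocation = mjs_allocate(rrs, budget=6)
```

As written, the sketch costs time proportional to the budget times the number of agents; keeping the current jumps in a max-heap would avoid rescanning all agents on every unit.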
4.2 Imperfect Information Setting: Unknown RR
As the RRs of the agents are unknown in this setting, we estimate them based on the history of the agents, which consists of the agents’ responses during the past timeslots. For each agent, we store its historical behavior by keeping track of the offered history and the success history; we estimate each RR through the LinearSearch routine described below.
MJSUCB–ExpResponse
We start by initializing and its UCB component and then estimating for each agent and for each using at each timeslot.
LinearSearch():
Estimating RRs with the offered history and success history, we calculate , for each offered to the agent . is then used to calculate candidate values of RR using Equation 1. Based on the candidate RR values, we determine that minimizes the squared error loss between the historical probabilities and the probabilities calculated based on Equation 1, i.e., , for each of the discount values. The RR value that achieves the least squared error loss is returned as the optimal RR after the current timeslot. We follow the same method for each of the agents. Algorithm 2 discusses our proposed MJSUCB–ExpResponse method in more detail, which takes budget , , batch size , and as inputs and returns s. Here, denote estimated and its UCB version, respectively.
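The estimation step can be sketched as follows. This is an illustration under assumptions, not the routine from Algorithm 2: it takes the RP to be `1 - exp(-rr * discount)`, inverts that form at each discount level to obtain candidate RRs, and returns the candidate with the smallest squared error against the empirical success rates. The function name, the dictionary-based history layout, and the counts are hypothetical.

```python
import math


def linear_search_rr(offered, succeeded):
    """Pick the candidate RR that best fits the observed success rates.

    offered[c]   : number of times discount c was offered (hypothetical layout)
    succeeded[c] : number of those offers that led to a reduction
    Assumes RP(c) = 1 - exp(-rr * c); returns None without usable data.
    """
    empirical = {c: succeeded.get(c, 0) / n
                 for c, n in offered.items() if n > 0 and c > 0}
    # Invert the assumed RP form at each discount to get candidate RRs.
    candidates = [-math.log(1.0 - p) / c
                  for c, p in empirical.items() if 0.0 < p < 1.0]
    if not candidates:
        return None

    def squared_error(rr):
        return sum((p - (1.0 - math.exp(-rr * c))) ** 2
                   for c, p in empirical.items())

    return min(candidates, key=squared_error)


# Hypothetical history for one agent whose responses resemble an RR near 0.4.
offered = {1: 1000, 2: 1000, 3: 1000}
succeeded = {1: 330, 2: 551, 3: 699}
estimate = linear_search_rr(offered, succeeded)
```

Discount levels whose empirical rate is exactly 0 or 1 yield no finite candidate under this inversion, so the sketch skips them when forming candidates but still includes them in the error.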
4.3 Experimental Evaluation of MJSUCB–ExpResponse
To check whether the algorithm converges to the true RR, we conduct extensive analysis on a simplified version of a smart grid. Here, we discuss the experimental set-up to observe the regret of the proposed MJSUCB–ExpResponse. Regret is the difference between the total reduction with known RRs and the total reduction with unknown RRs. In both experiments, we repeat the experiment 25 times, each instance having independently chosen true RR values. We report different statistics averaged over the 25 iterations.
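The regret notion used here can be made concrete with a small sketch. It works with expected reductions rather than sampled ones and assumes the RP form `1 - exp(-rr * discount)`; the function names and all RR values are hypothetical.

```python
import math


def rp(discount, rr):
    """Assumed RP form (illustrative): 1 - exp(-rr * discount)."""
    return 1.0 - math.exp(-rr * discount)


def mjs_allocate(rrs, budget):
    """Greedy unit-by-unit allocation to the agent with the largest jump."""
    alloc = [0] * len(rrs)
    for _ in range(budget):
        best = max(range(len(rrs)),
                   key=lambda i: rp(alloc[i] + 1, rrs[i]) - rp(alloc[i], rrs[i]))
        alloc[best] += 1
    return alloc


def round_regret(true_rrs, est_rrs, budget):
    """Expected reduction lost by allocating with estimated RRs."""
    def value(alloc):
        # Outcomes are always governed by the true RRs.
        return sum(rp(c, rr) for c, rr in zip(alloc, true_rrs))

    return value(mjs_allocate(true_rrs, budget)) - value(mjs_allocate(est_rrs, budget))


true_rrs = [0.2, 0.5, 1.0]   # hypothetical ground truth
poor_estimate = [1.0, 0.5, 0.2]  # hypothetical mis-ordered estimate
regret_poor = round_regret(true_rrs, poor_estimate, budget=3)
regret_exact = round_regret(true_rrs, true_rrs, budget=3)
```

Summing `round_regret` over timeslots gives a cumulative regret curve; sub-linear growth means the per-round loss shrinks as the estimates approach the true RRs.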
Exp1– Effect of batch size:
In this experiment, we keep the budget and constant, and vary batch sizes. This experiment shows the change in regret behavior as we change the batch sizes from low to high. For each of the batch sizes, we compare the regret for a different number of agents. Figure 4 compares average regret of MJSUCB–ExpResponse over iterations of varying true RR values of agents, with varying batch size, and keeping budget and constant. The figure shows three subplots with batch sizes of , , and , respectively. Each subplot compares the regret values for , , and agents, respectively, and shows sub-linear regret in the case of and agents for three different batch sizes. With an increased batch size to , even the case with agents converges to sub-linear regret within a few timeslots.
Exp2– Effect of budget and relation w.r.t. :
The second set of experiments is similar to Exp1, except that we vary the budget and the number of agents while keeping the number of iterations and constant. It compares the peak reduction achieved by MJSUCB–ExpResponse with the optimal peak reduction obtained when all the agents’ true RRs are known; the MAB literature refers to this difference as regret. It also shows how the success rates change when we increase the initial budget while keeping the number of agents the same. Additionally, the experiment helps us observe the peak reduction with varying budgets across different numbers of agents. Figure 5 compares the peak reduction achieved when we know the true RRs with the peak reduction achieved by MJSUCB–ExpResponse across varying budgets and varying numbers of agents, over iterations of varying true RR values; here we keep . As shown in the figure, the total peak reduction achieved by MJSUCB–ExpResponse is nearly the same as the reduction we get when we know all agents’ true RRs and allocate the budget optimally. Thus, we conclude empirically that MJSUCB–ExpResponse achieves sub-linear regret and that its peak reduction success rates are approximately the same as the optimal peak reduction success rates. We next show the performance of MJSUCB–ExpResponse in PowerTAC.
5 MJSUCB–ExpResponse in PowerTAC
Modelling the customer groups:
In the algorithm, we assume all the agents are of the same type (meaning they use the same amount of electricity). However, in PowerTAC, agents are of varied types; for example, some belong to the household agents class, some belong to the office agents class, and some are village agents. Even among office agents, some offices use a high amount of electricity compared to others. Thus, we begin by grouping the agents based on their electricity usage, and create four such groups, namely, , , and , as shown in Table 1. We consider groups due to the limitations of PowerTAC, where we cannot publish individual customer-specific tariffs. We can only publish tariffs for customer groups having similar usage ranges. However, our proposed model and algorithms do not rely on any assumption of the existence of such groups and treat each consumer as a separate user (in Sections 3 and 4). We leave out the remaining PowerTAC agents as they do not use a considerable amount of electricity in the tariff market.
Designing the tariffs for each group:
For each group , we publish ToU tariff such that agents from subscribe to tariff , and no other group of agents subscribes to that tariff. To achieve this, we combine ToU tariffs with tier tariffs as follows. In PowerTAC, tier tariffs specify rate values and upper bounds on electricity usage below which the specified rates are applicable. However, if the usage goes beyond that particular bound, the agent has to pay the rate values associated with the next higher bound. As we have segregated the agents based on their usage range, for any targeted group, we offer standard ToU rate values for its particular usage range and high rates for the remaining ranges of electricity usage. Thus, a group of agents naturally prefers the tariff designed for it, as the other tariffs are far costlier for its usage pattern. At any moment in the PowerTAC game, we keep all four tariffs active (one for each group); these tariffs keep getting updated based on the DR signals from DC.
Adapting MJSUCB–ExpResponse in PowerTAC:
While proposing our model, we assume that agents are identical and have the same usage capability. Thus, maximizing the sum of probability would also result in maximizing reduction. However, for general smart grid settings such as PowerTAC, we modify our model by giving weightage to agents based on their usage percentage (market cap). Higher weightage is given to agents that can reduce a larger amount of energy. We modify MJS–ExpResponse to introduce weights proportional to groups’ contribution to electricity usage for each group. to groups , respectively in our experiments. We still use (Line 5, Algorithm 2) to find the group that can fetch the highest increase in the probabilities as shown in Algorithm 1.
While allocating discounts to the groups, instead of allocating a unit of budget to each group, we weigh the unit with the group’s weight. For example, if gets selected for the discount, we assign a unit discount instead of . We call this way of allocation as WeightedMJS–ExpResponse. It may help to assign weights to the groups as assigning weights will allocate discounts proportional to their peak reduction capacity. For instance, reduction in would reduce more peak demand than reduction in .
Creating baseline:
To compare the performance, we consider the baseline of uniformly allocating the budget to all the groups. This leads to publishing group-specific tariffs with equal discounts. We record the peak reduction efficiency and reduction in capacity transactions from the baseline strategy. We then use the recorded information as a benchmark to evaluate MJSUCB–ExpResponse performance. Furthermore, we compare the efficacy against the strategy when we do not provide groups with any DR signals.
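To see why an equal split can leave reduction on the table, the sketch below compares it with a maximum-jump allocation in expectation. It assumes the RP form `1 - exp(-rr * discount)`, four groups with hypothetical RR values, and a budget that divides evenly across groups; none of these numbers come from the PowerTAC experiments.

```python
import math


def rp(discount, rr):
    """Assumed RP form (illustrative): 1 - exp(-rr * discount)."""
    return 1.0 - math.exp(-rr * discount)


def uniform_allocate(n_groups, budget):
    """Baseline: split the budget equally (assumes it divides evenly)."""
    return [budget // n_groups] * n_groups


def mjs_allocate(rrs, budget):
    """Greedy unit-by-unit allocation to the group with the largest jump."""
    alloc = [0] * len(rrs)
    for _ in range(budget):
        best = max(range(len(rrs)),
                   key=lambda i: rp(alloc[i] + 1, rrs[i]) - rp(alloc[i], rrs[i]))
        alloc[best] += 1
    return alloc


def expected_reduction(alloc, rrs):
    return sum(rp(c, rr) for c, rr in zip(alloc, rrs))


group_rrs = [0.1, 0.3, 0.6, 1.2]  # hypothetical RRs for four groups
budget = 8
uniform_value = expected_reduction(uniform_allocate(4, budget), group_rrs)
mjs_value = expected_reduction(mjs_allocate(group_rrs, budget), group_rrs)
```

When the equal split is itself an integer allocation, the greedy allocation can never do worse in expectation under this RP form, and it does strictly better whenever the groups' RRs differ enough to make the marginal gains unequal.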
Evaluation metrics:
Finally, we evaluate MJSUCB–ExpResponse’s performance on two metrics: (i) its peak demand reduction capability, which indicates what percentage of peak demand reduction MJSUCB–ExpResponse achieves compared to the benchmark strategies, and (ii) the reduction in capacity transaction penalties, which indicates how effectively MJSUCB–ExpResponse can mitigate such penalties compared to the benchmark strategies.
Capacity transactions:
In PowerTAC, capacity transactions are the penalties incurred by the DC if the agents subscribed to its portfolio contribute to the peak demand scenarios. These huge penalties are a way to penalize the DC for letting the agents create demand peaks. Thus, as opposed to the previous section, where we show experimentally that MJSUCB–ExpResponse exhibits sub-linear regret, here in the PowerTAC experiments we aim to reduce the capacity transaction penalties of the DC using MJSUCB–ExpResponse.
5.1 Experiments and Discussion
Experimental set-up:
We perform multiple experiments with different initial budgets. We play games in each set with approximately simulation weeks (total weeks). For each set, we start the experiments by randomly initializing RR values for each group and calculate the budget allocation based on WeightedMJS–ExpResponse as well as MJS–ExpResponse (line 5 in MJSUCB–ExpResponse); we refer to these variants as MJSUCB–ExpResponse-W and MJSUCB–ExpResponse-UW, respectively.
As explained in Section 5, for each of the groups, we have four ToU tariffs. We keep the same tariffs active for simulation days and invoke the MJSUCB–ExpResponse at the end of the rd day. Based on the success probabilities, we update the and , and calculate the next set of and values for each group. Using the new , we calculate the next demand allocation and publish the new tariffs as explained earlier. While publishing new tariffs, we revoke the previous ones; thus, only tariffs are active at any time in the game. This days process constitutes a single learning iteration (). To calculate the success probability of each tariff, we played offline games without any discount to any group and noted down the top two peak timeslots. Let and denote per group usage during those peak timeslots. Then, we compute the success probability as and , with and denoting group 1 and 2 usage respectively. is then set as . We perform sets of experiments with and . We define a scalar value that gets multiplied by the discounts to generate fractional discounts.
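One learning iteration of this kind can be sketched in Python as follows. The sketch rests on assumptions that are not stated in the text: it takes the reduction probability to be `1 - exp(-RR * discount)`, inverts that form to turn an observed success probability into an RR sample, and uses a generic UCB1-style confidence bonus. The class and function names are hypothetical, and the paper's exact update rule is not reproduced here.

```python
import math


def rr_from_success(p_success, discount, eps=1e-9):
    """Invert the assumed form p = 1 - exp(-rr * discount) to get rr."""
    if discount <= 0:
        raise ValueError("discount must be positive")
    p = min(max(p_success, 0.0), 1.0 - eps)  # keep log(1 - p) finite
    return -math.log(1.0 - p) / discount


class RREstimate:
    """Running mean of RR samples for one group, with an optimistic bound."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def update(self, rr_sample):
        # Incremental mean update after one learning iteration.
        self.n += 1
        self.mean += (rr_sample - self.mean) / self.n

    def ucb(self, t):
        # Generic UCB1-style bonus (an assumption, not the paper's bound).
        if self.n == 0:
            return float("inf")
        return self.mean + math.sqrt(2.0 * math.log(t) / self.n)


if __name__ == "__main__":
    est = RREstimate()
    # Hypothetical observations: (success probability, discount offered).
    for t, (p, c) in enumerate([(0.4, 1.0), (0.6, 2.0)], start=1):
        est.update(rr_from_success(p, c))
        print(t, est.mean, est.ucb(t))
```

The optimistic values returned by `ucb` would then be passed to the allocation step to compute the discounts for the next iteration.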
Observations and Discussion:
Table 2 shows the cumulative peak usages under MJSUCB–ExpResponse and the benchmark (baseline) method for the top peaks of groups to . As shown in the table, for the overall simulation weeks of training in PowerTAC, the cumulative peak usage of MJSUCB–ExpResponse for peak1 is similar to that of the baseline method for both weighted and unweighted allocations, and slightly worse than the baseline for peak2. The observation is consistent for the budget values and . However, if we focus only on the last weeks of training, MJSUCB–ExpResponse’s peak reduction capabilities become visible. Both weighted and unweighted allocations achieve cumulative peak reduction close to relative to the No Discount peak usages for peak1 and , which is almost times better than the baseline while maintaining similar performance to the baseline for peak2. Similarly, MJSUCB–ExpResponse achieves significant improvement for too for peak1 by reducing the peaks to times better than baseline. Furthermore, as shown in Table 2, the capacity transaction penalties in the last weeks are significantly lower than those of the No Discount and baseline strategies. Due to DR signals, agents sometimes shift some of their demand from peak1 to peak2 or cannot reduce any demand from peak2. However, when the overall system’s performance is judged by the capacity transaction penalties in the PowerTAC experiments, the penalties are significantly lower than the baseline, reinforcing the efficacy of MJSUCB–ExpResponse in the PowerTAC environment.
6 Conclusion
The paper proposed a novel DR model in which the users’ behavior depends on the incentives offered to them. Using experiments on PowerTAC, a real-world smart grid simulator, we first showed that agents’ probability of accepting the offer increases exponentially with the incentives given. Further, each group of agents follows a different rate of reduction (RR). Under the known RR setting, we proposed MJS–ExpResponse, which yields an optimal allocation of a given budget to the agents and thereby maximizes the peak reduction. When RRs are unknown, we proposed MJSUCB–ExpResponse, which achieves sublinear regret on the simulated data. We demonstrated that MJSUCB–ExpResponse achieves a significant reduction in peak demands and capacity transactions within just 200 weeks of simulation on the PowerTAC simulator.
