Safe Reinforcement Learning for Strategic Bidding of Virtual Power Plants in Day-Ahead Markets
Ognjen Stanojev, Lesia Mitridati, Riccardo de Nardis di Prata, Gabriela Hug

TL;DR
This paper develops a safe reinforcement learning algorithm using DDPG for strategic bidding of Virtual Power Plants in day-ahead markets, incorporating safety constraints and physical feasibility to improve market competitiveness.
Contribution
It introduces a novel safety-enhanced DDPG method with a projection-based safety shield and reward penalties, tailored for VPP bidding with complex physical constraints.
Findings
The approach successfully learns safe, competitive bidding policies.
The safety shield effectively enforces physical and operational constraints.
A case study on the IEEE 13-bus network shows improved market performance and safety compliance.
Abstract
This paper presents a novel safe reinforcement learning algorithm for strategic bidding of Virtual Power Plants (VPPs) in day-ahead electricity markets. The proposed algorithm utilizes the Deep Deterministic Policy Gradient (DDPG) method to learn competitive bidding policies without requiring an accurate market model. Furthermore, to account for the complex internal physical constraints of VPPs, we introduce two enhancements to the DDPG method. First, we derive a projection-based safety shield that restricts the agent's actions to the feasible space defined by the non-linear power flow equations and operating constraints of distributed energy resources. Second, we add a penalty for shield activation to the reward function, which incentivizes the agent to learn a safer policy. A case study based on the IEEE 13-bus network demonstrates the effectiveness of the proposed…
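The two enhancements described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feasible set here is a simple box of per-resource limits (an assumption standing in for the non-linear power flow and DER constraints), and the penalty is assumed to be proportional to the distance between the raw and projected actions. The names `project_to_box`, `shielded_step`, and `penalty_weight` are hypothetical.

```python
import numpy as np


def project_to_box(action, lower, upper):
    """Euclidean projection onto a box, which reduces to element-wise clipping.

    Assumption: a box replaces the paper's feasible set, which is defined
    by non-linear power flow equations and would need an optimization solver.
    """
    return np.clip(action, lower, upper)


def shielded_step(raw_action, lower, upper, market_reward, penalty_weight=1.0):
    """Apply the shield, then penalize the reward when the shield intervened.

    Assumption: the penalty scales with the size of the correction; the
    abstract only states that shield activation is penalized.
    """
    safe_action = project_to_box(raw_action, lower, upper)
    correction = float(np.linalg.norm(safe_action - raw_action))
    reward = market_reward - penalty_weight * correction
    return safe_action, reward, correction


# Hypothetical example: two resources, each limited to [0, 1] per unit.
raw = np.array([1.5, -0.2])
safe, reward, correction = shielded_step(
    raw, lower=0.0, upper=1.0, market_reward=10.0, penalty_weight=2.0
)
print(safe, reward, correction)
```

When the raw action is already feasible, the correction is zero and the reward is unchanged, so the penalty only shapes learning in regions where the shield has to intervene.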
Taxonomy
Methods: Batch Normalization · Experience Replay · Weight Decay · Adam · Dense Connections · Convolution · Deep Deterministic Policy Gradient
