Learning Reachability of Energy Storage Arbitrage
Tomás Tapia, Agustin Castellano, Enrique Mallada, Yury Dvorkin

TL;DR
This paper develops an online learning framework for energy storage arbitrage that improves reliability and profit by aligning storage control with system needs through a novel reward and penalty design.
Contribution
It introduces a stopping-time reward and a state-of-charge (SoC) range penalty within an end-to-end learning framework to improve storage reachability and profitability under price uncertainty.
Findings
Improved reachability of target SoC ranges.
Enhanced profit under volatile market conditions.
Reduced profit standard deviation, indicating more stable performance.
Abstract
Power systems face increasing weather-driven variability and, therefore, increasingly rely on flexible but energy-limited storage resources. Energy storage can buffer this variability, but its value depends on intertemporal decisions under uncertain prices. Without accounting for the future reliability value of stored energy, batteries may act myopically, discharging too early or failing to preserve reserves during critical hours. This paper introduces a stopping-time reward that, together with a state-of-charge (SoC) range target penalty, aligns arbitrage incentives with system reliability by rewarding storage that maintains sufficient SoC before critical hours. We formulate the problem as an online optimization with a chance-constrained terminal SoC and embed it in an end-to-end (E2E) learning framework, jointly training the price predictor and control policy. The proposed design…
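The ideas of an SoC range target and a chance-constrained terminal SoC can be illustrated with a small sketch. It is not the authors' formulation. The function names, the round-trip efficiency `eta`, the linear (hinge) penalty shape, and the sample numbers are all assumptions made here for illustration. The stopping-time reward is not sketched, since its form is not given in the text above.

```python
import numpy as np

def simulate_soc(soc0, charge, discharge, eta=0.9, capacity=1.0):
    """Roll the state of charge forward one step at a time.

    charge/discharge are per-step energy amounts in the same units as
    capacity. eta is an assumed one-way efficiency (hypothetical value).
    """
    soc = soc0
    path = []
    for c, d in zip(charge, discharge):
        soc = soc + eta * c - d / eta          # losses on both charge and discharge
        soc = min(max(soc, 0.0), capacity)     # keep SoC within physical limits
        path.append(soc)
    return np.array(path)

def soc_range_penalty(soc_terminal, lo, hi, weight=1.0):
    """Assumed hinge penalty: zero inside [lo, hi], linear in the distance outside."""
    return weight * (max(lo - soc_terminal, 0.0) + max(soc_terminal - hi, 0.0))

def empirical_reachability(terminal_socs, lo, hi):
    """Fraction of sampled scenarios whose terminal SoC lands in [lo, hi]."""
    t = np.asarray(terminal_socs, dtype=float)
    return float(np.mean((t >= lo) & (t <= hi)))
```

Under these assumptions, a chance constraint of the form "terminal SoC lies in the target range with probability at least 1 - epsilon" could be checked empirically by comparing `empirical_reachability(...)` against `1 - epsilon` over sampled price scenarios.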
