# A Reinforcement Learning-Based Framework for Tariff-Aware Load Shifting in Energy-Intensive Manufacturing

**Authors:** Jersson X. Leon-Medina, Mario Eduardo González Niño, Claudia Patricia Siachoque Celys, Bernardo Umbarila Suarez, Francesc Pozo

PMC · DOI: 10.3390/s26061858 · Sensors (Basel, Switzerland) · 2026-03-15

## TL;DR

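This paper introduces a reinforcement learning framework that shifts load in energy-intensive manufacturing away from high-tariff periods. Over 30 working days at a quicklime production plant, it reports a median total cost reduction on the order of 10%, alongside energy-balance deviations and occasional minimum-production shortfalls.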

## Contribution

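The contribution is a tariff-aware load-shifting framework in which a PPO agent, trained in a custom Gymnasium environment, scales consumption relative to a baseline profile using only aggregated active power and energy measurements from the plant's main distribution board.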

## Key findings

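- PPO achieved a median total cost reduction on the order of 10% (narrow IQR) across N=30 working days, driven by reduced peak-window energy and demand peaks.
- The binary compliance indicators (viol_energy, viol_prod_min) show deviations from the energy-balance criterion and occasional minimum-production shortfalls, pointing to the need for stricter constraint handling before industrial deployment.
- Benchmarked against DP, DQN, and a greedy heuristic (GREEDY) on cost, operational performance, and (where applicable) computational efficiency, PPO emerges as a competitive alternative among the considered methods.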
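
The findings above rest on per-day cost comparisons and binary compliance flags. The following is a minimal sketch of how such metrics could be computed, not the authors' script: the function names, the relative energy tolerance `energy_tol`, and all numeric inputs are hypothetical, and the paper's actual tolerance definitions are not given in this summary.

```python
from statistics import median, quantiles


def compliance_flags(baseline_kwh, agent_kwh, produced, prod_min, energy_tol=0.05):
    """Return binary flags in the spirit of viol_energy / viol_prod_min.

    Assumption: the energy-balance criterion is a relative tolerance on
    daily energy versus the baseline; the paper's exact rule may differ.
    """
    rel_dev = abs(agent_kwh - baseline_kwh) / baseline_kwh
    return {
        "viol_energy": int(rel_dev > energy_tol),
        "viol_prod_min": int(produced < prod_min),
    }


def summarize_reductions(baseline_costs, agent_costs):
    """Median and IQR of per-day percentage cost reductions."""
    pct = [100.0 * (b - a) / b for b, a in zip(baseline_costs, agent_costs)]
    q1, _, q3 = quantiles(pct, n=4)  # quartile cut points
    return median(pct), q3 - q1


# Hypothetical inputs for illustration only (not results from the paper).
flags = compliance_flags(baseline_kwh=800.0, agent_kwh=870.0,
                         produced=95.0, prod_min=100.0)
med, iqr = summarize_reductions([100.0, 100.0, 100.0], [92.0, 90.0, 88.0])
print(flags, med, iqr)
```

Reporting the median with the IQR, rather than a mean, keeps a single atypical day from dominating the 30-day summary.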

## Abstract

Optimizing energy-intensive manufacturing under time-varying electricity tariffs requires scheduling strategies that reduce cost without compromising operational feasibility. This study is grounded in readily available industrial sensing: we exclusively use time-series measurements of aggregated active power and energy at the main distribution board of a quicklime production plant. We propose a tariff-aware load-shifting framework in which a Proximal Policy Optimization (PPO) reinforcement learning agent is trained in a custom Gymnasium environment to apply discrete consumption scaling actions constrained to 80–125% of a baseline profile during the operating shift (08:00–16:00), explicitly accounting for demand-charge exposure in the TOU peak window (13:00–15:00). The reward design combines instantaneous electricity cost with cumulative energy-tracking penalties and terms associated with operational constraints. Multi-day validation over N=30 working days shows consistent economic benefits, with a median total cost reduction on the order of 10% (narrow IQR) driven by reduced peak-window energy and demand peaks. However, the script-based binary compliance indicators (viol_energy, viol_prod_min) reveal deviations from the energy-balance criterion and occasional minimum-production shortfalls under the tolerances used, highlighting the cost–production trade-off and the need for stricter constraint handling for industrial deployment. In addition, we benchmark against dynamic programming (DP), an alternative RL policy (DQN), and a greedy heuristic (GREEDY), comparing cost; operational performance; and, when applicable, computational efficiency, which positions PPO as a competitive alternative among the considered methods. Overall, this work demonstrates how learning-based decision making can be coupled with real-world industrial sensing infrastructures, providing a data-driven tariff-aware scheduling layer for industrial energy management under practical constraints.
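
To make the environment described in the abstract concrete, here is a minimal sketch of an hourly load-scaling environment with a Gymnasium-style `reset`/`step` interface. It is written in plain Python so it runs without dependencies; it is not the authors' environment. The action grid `SCALES`, the prices, the demand charge, the tracking weight, and the class and helper names are all assumptions chosen for illustration. Only the 80–125% bounds, the 08:00–16:00 shift, and the 13:00–15:00 peak window come from the abstract.

```python
# Illustrative sketch only: prices, demand charge, penalty weight, and the
# action grid are assumed values, not taken from the paper.
SCALES = (0.80, 0.90, 1.00, 1.10, 1.25)  # assumed discrete grid inside 80-125%
SHIFT_START = 8                           # operating shift 08:00-16:00
PEAK_HOURS = (13, 14)                     # hourly steps covering 13:00-15:00


class LoadShiftEnv:
    """Hourly load-scaling environment with a Gymnasium-like reset/step API."""

    def __init__(self, baseline_kwh, price_offpeak=0.10, price_peak=0.25,
                 demand_charge=0.5, track_weight=0.05):
        self.baseline = list(baseline_kwh)
        self.price_offpeak = price_offpeak
        self.price_peak = price_peak
        self.demand_charge = demand_charge
        self.track_weight = track_weight
        self.reset()

    def _obs(self):
        return (SHIFT_START + self.t, self.cum_dev, self.peak_kw)

    def reset(self):
        self.t = 0          # index of the current hourly step
        self.cum_dev = 0.0  # cumulative energy deviation from baseline (kWh)
        self.peak_kw = 0.0  # highest demand seen so far in the peak window
        return self._obs(), {}

    def step(self, action):
        hour = SHIFT_START + self.t
        energy = self.baseline[self.t] * SCALES[action]
        in_peak = hour in PEAK_HOURS
        cost = energy * (self.price_peak if in_peak else self.price_offpeak)
        if in_peak:
            # With hourly steps, kWh in the step equals average kW demand.
            new_peak = max(self.peak_kw, energy)
            cost += self.demand_charge * (new_peak - self.peak_kw)
            self.peak_kw = new_peak
        self.cum_dev += energy - self.baseline[self.t]
        # Reward: negative cost minus a cumulative energy-tracking penalty.
        reward = -(cost + self.track_weight * abs(self.cum_dev))
        self.t += 1
        terminated = self.t == len(self.baseline)
        return self._obs(), reward, terminated, False, {"cost": cost, "energy": energy}


def run_episode(env, actions):
    """Apply a fixed action sequence; return total cost and total energy."""
    env.reset()
    total_cost = total_energy = 0.0
    for a in actions:
        _, _, _, _, info = env.step(a)
        total_cost += info["cost"]
        total_energy += info["energy"]
    return total_cost, total_energy


env = LoadShiftEnv([100.0] * 8)  # hypothetical flat 100 kWh/h baseline
flat_cost, flat_energy = run_episode(env, [2] * 8)                 # no scaling
shift_cost, shift_energy = run_episode(env, [3, 3, 3, 3, 2, 0, 0, 2])
print(flat_cost, shift_cost)
```

In this toy setting, scaling up before the peak window and down inside it keeps total energy unchanged while lowering both peak-window energy cost and the demand-charge term, which is the mechanism the abstract credits for the savings. A real setup would wrap the same logic in a `gymnasium.Env` subclass with declared action and observation spaces before training a PPO agent on it.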

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13030402/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13030402/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/PMC13030402/full.md

---
Source: https://tomesphere.com/paper/PMC13030402