Counter-Dyna: Data-Efficient RL-Based HVAC Control using Counterfactual Building Models

Jan Marco Ruiz de Vargas; Fabian Raisch; Zoltan Nagy; Pierre Pinson; Christoph Goebel

arXiv:2605.04555 · cs.LG · May 7, 2026

TL;DR

Counter-Dyna improves the data efficiency of reinforcement learning-based HVAC control by using counterfactual surrogate models. It cuts the building interaction data needed for training from 6-12 months to 5 weeks and achieves cost savings of 5.3% to 17.0% in simulations.

Contribution

The paper introduces counterfactual surrogate models (CSMs) that improve data efficiency in model-based RL for HVAC control, so a control policy can be trained with less interaction data from the building.

Findings

01

Requires only 5 weeks of building interaction data, compared to 6-12 months for previous methods.

02

Achieves cost savings of 5.3% to 17.0% in simulations.

03

Demonstrates practical viability of RL in HVAC control.

Abstract

Model-based reinforcement learning (MBRL) offers a promising approach for data-efficient energy management in buildings, combining the strengths of predictive modeling and reinforcement learning. While previous MBRL methods applied to HVAC control have reduced training data requirements, they still require several months of interaction with the building to learn a satisfactory control policy. A key reason is that existing surrogate models attempt to predict the entire state-space, including weather and electricity prices that are unaffected by control actions, or completely ignore these variables. Addressing these issues, we propose Counter-Dyna, a method that enhances the data-efficiency of Dyna, an MBRL method. We create data-efficient counterfactual surrogate models (CSM) by leveraging invariances in the state-space. Using a CSM in Dyna speeds up RL training measured in environment…
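The invariance described in the abstract (weather and electricity prices do not depend on control actions) can be illustrated with a small sketch. This is not the authors' implementation: the linear thermal model, the synthetic data, and the names `fit_linear_surrogate` and `counterfactual_rollout` are all assumptions made for illustration. The surrogate predicts only the action-dependent indoor temperature, while the recorded exogenous series are replayed unchanged under a different action sequence.

```python
import numpy as np

def fit_linear_surrogate(temp, outdoor, action, temp_next):
    """Least-squares fit of temp_next ~= a*temp + b*outdoor + c*action + d.

    Hypothetical surrogate: only the action-dependent state (indoor
    temperature) is modeled; weather and price are never predicted.
    """
    X = np.column_stack([temp, outdoor, action, np.ones_like(temp)])
    coef, *_ = np.linalg.lstsq(X, temp_next, rcond=None)
    return coef

def counterfactual_rollout(coef, temp0, outdoor, price, actions):
    """Replay recorded exogenous series under a different action sequence."""
    temp = temp0
    transitions = []
    for t in range(len(actions)):
        features = np.array([temp, outdoor[t], actions[t], 1.0])
        temp_next = float(coef @ features)
        cost = price[t] * actions[t]  # assumed cost: price times heating power
        transitions.append((temp, outdoor[t], price[t], actions[t], cost, temp_next))
        temp = temp_next
    return transitions

# Synthetic "recorded" data (assumed dynamics, for illustration only).
rng = np.random.default_rng(0)
n = 200
outdoor = 5.0 + 5.0 * rng.standard_normal(n)   # exogenous: weather
price = rng.uniform(0.1, 0.4, n)               # exogenous: electricity price
action = rng.uniform(0.0, 1.0, n)              # logged control actions
temp = np.empty(n + 1)
temp[0] = 20.0
for t in range(n):
    temp[t + 1] = 0.9 * temp[t] + 0.1 * outdoor[t] + 2.0 * action[t]

coef = fit_linear_surrogate(temp[:-1], outdoor, action, temp[1:])

# Counterfactual question: what if the heater had stayed off the whole time?
cf = counterfactual_rollout(coef, temp[0], outdoor, price, np.zeros(n))
```

In a Dyna-style loop, transitions such as those in `cf` would supplement the real interaction data when updating the policy. How Counter-Dyna actually builds and uses its CSM is described in the paper and is not reproduced here.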
