Knowledge Distillation for Efficient Transformer-Based Reinforcement Learning in Hardware-Constrained Energy Management Systems

Pascal Henrich; Jonas Sievers; Maximilian Beichter; Thomas Blank; Ralf Mikut; Veit Hagenmeyer

arXiv:2603.26249 · cs.LG · March 30, 2026

TL;DR

This paper demonstrates that knowledge distillation can significantly compress transformer-based reinforcement learning models for residential energy management, enabling deployment on resource-constrained hardware without sacrificing performance.

Contribution

The paper introduces a method for distilling large Decision Transformer models into smaller, more efficient models suited to embedded energy management hardware.
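As a rough illustration of the general idea, the sketch below distils a fixed "teacher" policy into a smaller "student" by regressing the student's actions onto the teacher's. Everything in it is an assumption made for illustration: the abstract is cut off before it states the distillation objective, so the action-matching squared-error loss, the two-feature state (`pv_surplus`, `soc`), and the hand-written `teacher_policy` (a stand-in for a trained Decision Transformer) are hypothetical and not taken from the paper.

```python
import math
import random


def teacher_policy(state):
    """Hypothetical teacher: a small two-unit network with fixed weights.

    Stand-in for a trained high-capacity policy; returns a battery action in [-1, 1].
    """
    pv_surplus, soc = state
    h1 = math.tanh(2.0 * pv_surplus - 1.0 * soc)
    h2 = math.tanh(-1.0 * pv_surplus + 0.5 * soc + 0.3)
    return math.tanh(1.2 * h1 - 0.7 * h2)


def student_policy(weights, state):
    """Smaller student: a single tanh unit with three trainable parameters."""
    w_pv, w_soc, bias = weights
    pv_surplus, soc = state
    return math.tanh(w_pv * pv_surplus + w_soc * soc + bias)


def mean_squared_gap(weights, states):
    """Average squared difference between student and teacher actions."""
    total = sum((student_policy(weights, s) - teacher_policy(s)) ** 2 for s in states)
    return total / len(states)


def distill(states, epochs=200, lr=0.1):
    """Fit the student to the teacher's actions with plain SGD (assumed objective)."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for state in states:
            target = teacher_policy(state)
            pred = student_policy(weights, state)
            # Gradient of 0.5 * (pred - target)^2 through the tanh output.
            grad_out = (pred - target) * (1.0 - pred ** 2)
            inputs = (state[0], state[1], 1.0)
            for i in range(3):
                weights[i] -= lr * grad_out * inputs[i]
    return weights


# Synthetic states: PV surplus in [-1, 1], state of charge in [0, 1].
rng = random.Random(0)
states = [(rng.uniform(-1.0, 1.0), rng.uniform(0.0, 1.0)) for _ in range(64)]

initial_gap = mean_squared_gap([0.0, 0.0, 0.0], states)
weights = distill(states)
final_gap = mean_squared_gap(weights, states)
print(f"gap before: {initial_gap:.4f}, gap after: {final_gap:.4f}")
```

The student never sees rewards or ground-truth labels here, only the teacher's outputs, which is the defining feature of distillation. A real setup would replace both policies with sequence models and the synthetic states with logged trajectories.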

Findings

01 Distillation preserves control quality with up to 96% parameter reduction.

02 Inference memory is reduced by up to 90%.

03 Control performance improves slightly (up to 1%) after distillation.
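To put the headline percentages in more familiar terms, a reduction of r percent corresponds to a shrink factor of 100 / (100 - r). The helper below is a hypothetical convenience function, not something from the paper, and because the findings are stated as "up to" figures, the resulting factors are best-case values.

```python
def shrink_factor(reduction_pct):
    """Convert a percentage reduction into an 'N times smaller' factor."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in [0, 100)")
    return 100.0 / (100.0 - reduction_pct)


print(shrink_factor(96))  # parameters: 25.0, i.e. up to 25x fewer
print(shrink_factor(90))  # inference memory: 10.0, i.e. up to 10x less
```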

Abstract

Transformer-based reinforcement learning has emerged as a strong candidate for sequential control in residential energy management. In particular, the Decision Transformer can learn effective battery dispatch policies from historical data, thereby increasing photovoltaic self-consumption and reducing electricity costs. However, transformer models are typically too computationally demanding for deployment on resource-constrained residential controllers, where memory and latency constraints are critical. This paper investigates knowledge distillation to transfer the decision-making behaviour of high-capacity Decision Transformer policies to compact models that are more suitable for embedded deployment. Using the Ausgrid dataset, we train teacher models in an offline sequence-based Decision Transformer framework on heterogeneous multi-building data. We then distil smaller student models by…
