Transformer Training Strategies for Forecasting Multiple Load Time Series
Matthias Hertel, Maximilian Beichter, Benedikt Heidrich, Oliver Neumann, Benjamin Schäfer, Ralf Mikut, Veit Hagenmeyer

TL;DR
This paper demonstrates that training Transformer models with a global transfer learning strategy significantly improves load forecasting accuracy across multiple clients in smart grids, outperforming other models and strategies.
Contribution
It introduces a global transfer learning approach for Transformer load forecasting models, showing its superiority over local and multivariate strategies in smart grid applications.
Findings
On average, the global training strategy yields 21.8% and 12.8% lower forecasting errors than the two other strategies.
Transformers outperform linear models, MLPs, and LSTMs in load forecasting.
Global strategy is effective across horizons from one day to one month.
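A percentage such as "21.8% lower forecasting errors" is a relative reduction against a baseline strategy's error. The following minimal sketch shows the arithmetic only; the function name and the error values are hypothetical and are not taken from the paper.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Return the reduction of new_error relative to baseline_error, in percent."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error * 100.0

# Hypothetical error values, for illustration only (not results from the paper).
local_error = 0.50
global_error = 0.40
reduction = relative_error_reduction(local_error, global_error)
print(f"{reduction:.1f}% lower error")
```

With these made-up values the global model's error is about 20% lower than the baseline's.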
Abstract
In the smart grid of the future, accurate load forecasts on the level of individual clients can help to balance supply and demand locally and to prevent grid outages. While the number of monitored clients will increase with the ongoing smart meter rollout, the amount of data per client will always be limited. We evaluate whether a Transformer load forecasting model benefits from a transfer learning strategy, where a global univariate model is trained on the load time series from multiple clients. In experiments with two datasets containing load time series from several hundred clients, we find that the global training strategy is superior to the multivariate and local training strategies used in related work. On average, the global training strategy results in 21.8% and 12.8% lower forecasting errors than the two other strategies, measured across forecasting horizons from one day to one…
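The three training strategies compared in the abstract differ mainly in how the clients' load series are arranged into training samples. The sketch below illustrates one plausible arrangement with NumPy. It is not the authors' code: the function names, the window lengths, and the random data are assumptions made for illustration.

```python
import numpy as np

def make_windows(series, lookback, horizon):
    """Slice a (T, channels) array into input windows and target windows."""
    n = series.shape[0] - lookback - horizon + 1
    x = np.stack([series[i:i + lookback] for i in range(n)])
    y = np.stack([series[i + lookback:i + lookback + horizon] for i in range(n)])
    return x, y

def local_datasets(loads, lookback, horizon):
    """Local strategy: one univariate dataset (and hence one model) per client."""
    return [make_windows(loads[c][:, None], lookback, horizon)
            for c in range(loads.shape[0])]

def multivariate_dataset(loads, lookback, horizon):
    """Multivariate strategy: one model sees all clients as channels of one series."""
    return make_windows(loads.T, lookback, horizon)

def global_dataset(loads, lookback, horizon):
    """Global strategy: one univariate model trained on windows pooled over clients."""
    pairs = local_datasets(loads, lookback, horizon)
    x = np.concatenate([p[0] for p in pairs])
    y = np.concatenate([p[1] for p in pairs])
    return x, y

# Hypothetical data: 3 clients, 48 time steps each (random values, not real load).
rng = np.random.default_rng(0)
loads = rng.random((3, 48))
lookback, horizon = 24, 4  # assumed window lengths

local = local_datasets(loads, lookback, horizon)
x_multi, y_multi = multivariate_dataset(loads, lookback, horizon)
x_global, y_global = global_dataset(loads, lookback, horizon)

print(len(local), local[0][0].shape)  # 3 datasets, each with inputs (21, 24, 1)
print(x_multi.shape)                  # (21, 24, 3)
print(x_global.shape)                 # (63, 24, 1)
```

In this arrangement the global dataset contains as many training windows as all local datasets combined, while each sample remains univariate. This is one way to read the abstract's point that the data per client is limited but the number of clients grows.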
Taxonomy
Topics: Energy Load and Power Forecasting · Image and Signal Denoising Methods · Neural Networks and Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Layer Normalization · Adam · Absolute Position Encodings · Softmax · Residual Connection
