# Extreme-Aware Time-Series Forecasting via Weak-Label-Guided Mixture of Experts

**Authors:** Jialou Wang, Jacob Sanderson, Wai Lok Woo

PMC · DOI: 10.3390/s26051571 · Sensors (Basel, Switzerland) · 2026-03-02

## TL;DR

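This paper introduces WL-MoE, a weak-label-guided mixture-of-experts forecasting model that improves predictions during rare extreme events by routing each input window to lightweight specialist experts, with cluster-derived weak labels guiding the early phase of training.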

## Contribution

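WL-MoE uses a two-stage training curriculum to stabilise expert specialisation under severe class imbalance. In Stage I, cluster-derived weak labels encourage diverse expert utilisation and prevent routing collapse. In Stage II, the guidance is removed and training proceeds with the forecasting objective alone.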
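The abstract describes the weak labels as cluster-derived but this summary does not say which clustering method is used. As a minimal sketch, assuming plain k-means over raw input windows (an illustrative choice, not confirmed by the source), weak labels could be produced as follows. The function name `kmeans_weak_labels` and its parameters are hypothetical.

```python
import numpy as np

def kmeans_weak_labels(windows, n_clusters, n_iters=20, seed=0):
    """Assign each input window a cluster index to use as a weak label.

    Assumption: plain k-means on raw windows; the paper's actual
    clustering procedure is not given in this summary.
    """
    rng = np.random.default_rng(seed)
    windows = np.asarray(windows, dtype=float)
    # Initialise centres from distinct randomly chosen windows (fancy indexing copies).
    centers = windows[rng.choice(len(windows), size=n_clusters, replace=False)]
    for _ in range(n_iters):
        # Squared Euclidean distance from every window to every centre.
        dists = ((windows[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_clusters):
            members = windows[labels == k]
            if len(members) > 0:  # leave an empty cluster's centre unchanged
                centers[k] = members.mean(axis=0)
    return labels
```

Each label would then serve as a target for the router during Stage I only, for example one cluster per expert.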

## Key findings

- Across seven benchmark datasets, WL-MoE reduces average MSE by approximately 7.9% and extreme-case MSE by approximately 23.58% relative to the best baseline.
- In a UK flood forecasting study, WL-MoE reduces all-water MSE by 31.6% and high-water MSE by approximately 35.0%.
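The percentages above are relative reductions in mean squared error. A minimal sketch of how such figures could be computed is given here. It assumes that "extreme" samples are those whose true value lies at or above a chosen quantile of the targets, which is an illustrative definition since this summary does not state the paper's criterion. The function names and the `quantile` parameter are hypothetical.

```python
import numpy as np

def overall_and_extreme_mse(y_true, y_pred, quantile=0.95):
    """Return (MSE over all samples, MSE over samples at or above the quantile).

    Assumption: extremes are defined by a quantile threshold on y_true.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    sq_err = (y_true - y_pred) ** 2
    threshold = np.quantile(y_true, quantile)
    extreme_mask = y_true >= threshold
    return sq_err.mean(), sq_err[extreme_mask].mean()

def relative_reduction(baseline_mse, model_mse):
    """Percentage by which model_mse improves on baseline_mse."""
    return 100.0 * (baseline_mse - model_mse) / baseline_mse
```

Under this convention, a baseline MSE of 1.000 and a model MSE of 0.921 would correspond to a 7.9% reduction.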

## Abstract

Deep time-series forecasting models can achieve strong average accuracy under normal conditions, yet they often struggle with rare, high-impact extremes, where severe class imbalance biases learning toward majority dynamics. Although infrequent, these extremes frequently correspond to critical events such as natural disasters or power outages. We address this challenge with a weak-label-guided mixture of experts (WL-MoE) that routes each input window to lightweight specialists designed to capture distinct temporal regimes. To prevent routing collapse during early optimisation, WL-MoE follows a two-stage training curriculum. In Stage I, cluster-derived weak labels encourage diverse expert utilisation and promote specialisation under imbalance. In Stage II, guidance is removed and training proceeds solely with the forecasting objective, ensuring that inferences remain fully data-driven. The expert-based structure also supports interpretable routing via expert-usage profiling, enabling regime-level auditing of model behaviour in high-stakes settings. Across seven benchmark datasets, WL-MoE reduces the average MSE by approximately 7.9% and the extreme-case MSE by approximately 23.58% relative to the best baseline. In a UK flood forecasting study, it reduces the all-water MSE by 31.6% and the high-water MSE by approximately 35.0%. These results indicate that weak-label guidance can stabilise specialisation and improve reliability under rare extremes while keeping model behaviour auditable for real-world deployment.
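The two-stage curriculum described in the abstract can be pictured as a loss that changes with the training stage. The sketch below is not the authors' implementation: the linear experts, softmax router, cross-entropy guidance term, and the `guide_weight` coefficient are all assumptions made for illustration, since the abstract does not specify the form of the guidance loss.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def moe_forward(x, gate_w, expert_w):
    """Mixture forward pass.

    x: (batch, window), gate_w: (window, n_experts),
    expert_w: (n_experts, window, horizon). Linear experts are an assumption.
    """
    probs = softmax(x @ gate_w)                            # routing weights
    expert_out = np.einsum("bw,ewh->beh", x, expert_w)     # each expert's forecast
    forecast = np.einsum("be,beh->bh", probs, expert_out)  # weighted combination
    return forecast, probs

def wl_moe_loss(x, y, weak_labels, gate_w, expert_w, stage, guide_weight=1.0):
    """Stage 1: forecasting MSE plus weak-label guidance on the router.
    Stage 2: forecasting MSE only, so weak labels are ignored."""
    forecast, probs = moe_forward(x, gate_w, expert_w)
    mse = np.mean((forecast - y) ** 2)
    if stage == 1:
        # Hypothetical guidance term: cross-entropy between routing
        # probabilities and the cluster-derived weak labels.
        picked = probs[np.arange(len(x)), weak_labels]
        return mse + guide_weight * -np.mean(np.log(picked + 1e-12))
    return mse
```

Because the Stage II loss does not depend on `weak_labels`, inference and late training rely only on the learned router, which matches the abstract's statement that guidance is removed after Stage I.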

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12987348/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12987348/full.md

## References

43 references, with the full list in the complete paper: https://tomesphere.com/paper/PMC12987348/full.md

---
Source: https://tomesphere.com/paper/PMC12987348