Solar flare forecasting with foundational transformer models across image, video, and time-series modalities
S. Riggi, P. Romano, A. Pilzer, U. Becciani

TL;DR
This study compares transformer-based models across image, video, and time-series data for solar flare forecasting. It finds that the time-series model (Moirai2) outperforms the image and video models, and it highlights the potential of multimodal transformer architectures for space weather prediction.
Contribution
The paper presents a systematic evaluation of recent foundational transformer models across multiple data modalities for solar flare forecasting, and reports that the time-series model performs best.
Findings
The time-series transformer (Moirai2) achieves a higher true skill statistic (TSS, ~0.74) than the image and video models.
Image and video models (SigLIP2, VideoMAE) reach TSS ~0.60-0.65.
Pretrained transformers show promise for integrated multimodal space weather forecasting.
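The TSS values quoted above are easier to interpret with the metric written out. The sketch below uses the standard definition, TSS = TP/(TP+FN) - FP/(FP+TN), i.e. the true positive rate minus the false positive rate. The function name and the toy labels are illustrative assumptions and do not come from the paper.

```python
def true_skill_statistic(y_true, y_pred):
    """Return TSS for binary labels (1 = flare, 0 = no flare).

    Standard definition: recall on the flare class minus the
    false-alarm rate on the no-flare class. Ranges from -1 to 1.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("TSS is undefined when one class is absent")
    return tp / (tp + fn) - fp / (fp + tn)


# Toy example (made-up labels): 4 flares, 6 non-flares.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(true_skill_statistic(y_true, y_pred))  # 3/4 - 1/6, about 0.583
```

Because each rate is normalized within its own class, TSS does not change with the flare/no-flare ratio, which is why it is commonly used for heavily imbalanced flare datasets.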
Abstract
We present a comparative study of transformer-based architectures for solar flare forecasting using heterogeneous data modalities, including images, video sequences, and time-series observations. Our analysis evaluates three recent foundational models - SigLIP2 for image encoding, VideoMAE for spatio-temporal video representation, and Moirai2 for multivariate time-series forecasting - applied to publicly available datasets of solar magnetograms from the SDO/HMI mission and soft X-ray fluxes acquired by GOES satellites. All models are trained and validated under consistent data splits and evaluation criteria, with the goal of assessing the strengths and limitations of transformer backbones across spatial and temporal representations of solar activity. We investigate multiple loss formulations (weighted BCE, focal, and score-oriented) and training balance strategies to mitigate class…
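Two of the loss formulations named in the abstract, weighted BCE and focal loss, can be sketched for a single prediction as follows. This is a minimal illustration using the standard textbook definitions, not the authors' implementation. The function names and the default values of `pos_weight`, `gamma`, and `alpha` are assumptions, and the score-oriented loss is not shown because the excerpt does not define it.

```python
import math


def weighted_bce(p, y, pos_weight=1.0, eps=1e-12):
    """Binary cross-entropy with the positive (flare) term scaled by pos_weight.

    p is the predicted flare probability, y is the label (0 or 1).
    pos_weight > 1 penalizes missed flares more heavily (assumed default: 1.0).
    """
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Standard focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    gamma down-weights well-classified examples; alpha balances the classes.
    The defaults here are common choices, not values taken from the paper.
    """
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confidently correct flare prediction contributes far less under focal loss
# than a confidently wrong one, so training focuses on the hard examples.
print(focal_loss(0.9, 1), focal_loss(0.1, 1))
print(weighted_bce(0.1, 1, pos_weight=5.0))
```

With `gamma=0` and `alpha=0.5`, the focal loss reduces to half the ordinary cross-entropy, which is a quick sanity check on any implementation.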
