Multimodal Forecasting for Commodity Prices Using Spectrogram-Based and Time Series Representations
Soyeon Park, Doohee Chung, Charmgil Hong

TL;DR
This paper introduces SEMF, a multimodal fusion approach that combines spectral and temporal representations using Transformers to improve commodity price forecasting accuracy.
Contribution
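The paper proposes Spectrogram-Enhanced Multimodal Fusion (SEMF), which encodes Morlet wavelet spectrograms of the target series with a Vision Transformer, encodes exogenous variables with a Transformer, and fuses the two streams through bidirectional cross-attention for multivariate time series forecasting.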
Findings
SEMF outperforms seven baselines across multiple tasks and horizons.
Spectrogram-based encoding captures multi-scale patterns effectively.
Multimodal fusion improves robustness and accuracy in commodity price prediction.
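To make the multi-scale claim concrete, here is a minimal sketch of a wavelet spectrogram for a single price series, assuming a Morlet mother wavelet as the abstract states. The function name `morlet_spectrogram`, the center frequency `w0=6.0`, the scale grid, and the toy sinusoid are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def morlet_spectrogram(x, scales, w0=6.0):
    """Magnitude of a Morlet continuous wavelet transform.

    Returns an array of shape (len(scales), len(x)): one row per scale,
    one column per time step. w0 is an assumed center frequency.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    out = np.empty((len(scales), n))
    for i, s in enumerate(scales):
        # Sample the wavelet over roughly +/- 4 scale units, capped so the
        # kernel is never longer than the series.
        half = min(int(4 * s), (n - 1) // 2)
        t = np.arange(-half, half + 1) / s
        wavelet = np.pi ** -0.25 * np.exp(1j * w0 * t) * np.exp(-0.5 * t ** 2) / np.sqrt(s)
        # Correlation with the wavelet, written as a convolution with the
        # reversed conjugate kernel; mode="same" keeps the time axis aligned.
        coeffs = np.convolve(x, np.conj(wavelet)[::-1], mode="same")
        out[i] = np.abs(coeffs)
    return out

# Toy input (hypothetical): a sinusoid with a 16-step period.
n = 512
series = np.sin(2 * np.pi * np.arange(n) / 16)
scales = np.arange(2, 40)
spec = morlet_spectrogram(series, scales)

# Average over the middle of the series, away from zero-padded edges.
# The strongest response is expected near w0 * period / (2 * pi), about 15 here.
peak_scale = scales[spec[:, 128:384].mean(axis=1).argmax()]
```

Each row of `spec` responds to oscillations of a different period, so short-lived spikes and slow cycles land in different bands of the same image. That two-dimensional layout is what lets an image encoder read the series at several scales at once.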
Abstract
Forecasting multivariate time series remains challenging due to complex cross-variable dependencies and the presence of heterogeneous external influences. This paper presents Spectrogram-Enhanced Multimodal Fusion (SEMF), which combines spectral and temporal representations for more accurate and robust forecasting. The target time series is transformed into Morlet wavelet spectrograms, from which a Vision Transformer encoder extracts localized, frequency-aware features. In parallel, exogenous variables, such as financial indicators and macroeconomic signals, are encoded via a Transformer to capture temporal dependencies and multivariate dynamics. A bidirectional cross-attention module integrates these modalities into a unified representation that preserves distinct signal characteristics while modeling cross-modal correlations. Applied to multiple commodity price forecasting tasks, SEMF…
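The bidirectional cross-attention step described in the abstract can be sketched as follows. This is a simplified, single-head illustration without learned projection weights; the function names, token shapes, residual connections, and mean-pooling are assumptions made for the example and may differ from the SEMF architecture.

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context):
    """Scaled dot-product attention from `queries` to `context`.

    Both inputs have shape (tokens, d). Learned query/key/value
    projections are omitted here for brevity (an assumption).
    """
    d = queries.shape[-1]
    weights = softmax(queries @ context.T / np.sqrt(d))
    return weights @ context, weights

def bidirectional_fusion(spec_tokens, exo_tokens):
    # Direction 1: spectrogram tokens attend to exogenous tokens.
    spec_ctx, _ = cross_attention(spec_tokens, exo_tokens)
    # Direction 2: exogenous tokens attend to spectrogram tokens.
    exo_ctx, _ = cross_attention(exo_tokens, spec_tokens)
    # Residual connections keep each modality's own features.
    spec_out = spec_tokens + spec_ctx
    exo_out = exo_tokens + exo_ctx
    # Pool each stream over its tokens and concatenate into one vector.
    return np.concatenate([spec_out.mean(axis=0), exo_out.mean(axis=0)])

# Hypothetical shapes: 9 spectrogram patch tokens, 30 time-step tokens, d = 16.
rng = np.random.default_rng(0)
spec_tokens = rng.normal(size=(9, 16))
exo_tokens = rng.normal(size=(30, 16))
fused = bidirectional_fusion(spec_tokens, exo_tokens)
_, attn = cross_attention(spec_tokens, exo_tokens)
```

Running attention in both directions lets each modality query the other while the residual paths preserve its own signal, which matches the abstract's description of a unified representation that keeps distinct signal characteristics.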
