Rethinking Multimodal Fusion for Time Series: Auxiliary Modalities Need Constrained Fusion
Seunghan Lee, Jun Seo, Jaehoon Lee, Sungdong Yoo, Minjae Kim, Tae Yoon Lim, Dongwan Kang, Hwanil Choi, SoonYoung Lee, Wonbin Ahn

TL;DR
This paper investigates the limitations of naive multimodal fusion in time series forecasting and introduces the Controlled Fusion Adapter (CFA), a controlled fusion method that selectively integrates relevant auxiliary-modality information and improves forecasting performance.
Contribution
The paper proposes a novel Controlled Fusion Adapter (CFA) that enables effective, constrained integration of auxiliary modalities into time series models without altering the backbone architecture.
Findings
Constrained fusion methods outperform naive fusion strategies.
CFA effectively filters irrelevant information before fusion.
Over 20,000 experiments validate the approach's robustness.
Abstract
Recent advances in multimodal learning have motivated the integration of auxiliary modalities such as text or vision into time series (TS) forecasting. However, most existing methods provide limited gains, often improving performance only on specific datasets or relying on architecture-specific designs that limit generalization. In this paper, we show that multimodal models with naive fusion strategies (e.g., simple addition or concatenation) often underperform unimodal TS models, which we attribute to the uncontrolled integration of auxiliary modalities that may introduce irrelevant information. Motivated by this observation, we explore various constrained fusion methods designed to control such integration and find that they consistently outperform naive fusion methods. Furthermore, we propose Controlled Fusion Adapter (CFA), a simple plug-in method that enables controlled…
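To make the contrast in the abstract concrete, the sketch below compares the two naive strategies it names (addition and concatenation) with one generic form of constrained fusion, a learned sigmoid gate. This is an illustration only and is not the paper's CFA, whose design is not described in this summary. The function names, the gate parameters `w_gate` and `b_gate`, and the feature shapes are all assumptions made for the example.

```python
import numpy as np


def naive_add_fusion(ts_feat, aux_feat):
    """Naive fusion by addition: auxiliary features enter unfiltered."""
    return ts_feat + aux_feat


def naive_concat_fusion(ts_feat, aux_feat):
    """Naive fusion by concatenation along the feature axis."""
    return np.concatenate([ts_feat, aux_feat], axis=-1)


def gated_fusion(ts_feat, aux_feat, w_gate, b_gate):
    """Illustrative constrained fusion (hypothetical, not the paper's CFA).

    A sigmoid gate in (0, 1), computed from both modalities, scales the
    auxiliary features before they are added to the time series features.
    A gate near 0 suppresses the auxiliary signal; a gate near 1 reduces
    to naive addition.
    """
    joint = np.concatenate([ts_feat, aux_feat], axis=-1)
    gate = 1.0 / (1.0 + np.exp(-(joint @ w_gate + b_gate)))
    return ts_feat + gate * aux_feat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ts = rng.normal(size=(4, 8))    # assumed shape: (batch, features)
    aux = rng.normal(size=(4, 8))   # e.g. a text embedding projected to 8 dims
    w_gate = rng.normal(scale=0.1, size=(16, 8))
    b_gate = np.zeros(8)
    print(naive_add_fusion(ts, aux).shape)
    print(naive_concat_fusion(ts, aux).shape)
    print(gated_fusion(ts, aux, w_gate, b_gate).shape)
```

In a trained model the gate parameters would be learned, so the model itself decides how much auxiliary information to admit. The residual form `ts_feat + gate * aux_feat` keeps the time series path intact, which is one way to add a fusion step without changing the backbone.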
Taxonomy
Topics: Time Series Analysis and Forecasting · Topic Modeling · Multimodal Machine Learning Applications
