ChatTime: A Unified Multimodal Time Series Foundation Model Bridging Numerical and Textual Data
Chengsen Wang, Qi Qi, Jingyu Wang, Haifeng Sun, Zirui Zhuang, Jinming Wu, Lei Zhang, Jianxin Liao

TL;DR
ChatTime is a novel multimodal foundation model that unifies numerical and textual time series analysis, enabling zero-shot forecasting and bimodal processing, surpassing traditional unimodal methods.
Contribution
The paper introduces a framework that models time series as a foreign language, supporting bimodal (time series and text) input/output and zero-shot forecasting, which it presents as an advance over existing unimodal approaches.
Findings
ChatTime achieves superior performance across multiple tasks.
It demonstrates effective zero-shot forecasting capabilities.
The model supports bimodal input/output for time series and text.
Abstract
Human experts typically integrate numerical and textual multimodal information to analyze time series. However, most traditional deep learning predictors rely solely on unimodal numerical data, using a fixed-length window for training and prediction on a single dataset, and cannot adapt to different scenarios. Powerful pre-trained large language models have introduced new opportunities for time series analysis. Yet existing methods are either inefficient in training, incapable of handling textual information, or lack zero-shot forecasting capability. In this paper, we innovatively model time series as a foreign language and construct ChatTime, a unified framework for time series and text processing. As an out-of-the-box multimodal time series foundation model, ChatTime provides zero-shot forecasting capability and supports bimodal input/output for both time series and text. We design…
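To make the idea of treating a time series as a "foreign language" concrete, here is a minimal sketch of one common way to turn real values into discrete tokens: min-max scaling followed by uniform binning. This is an illustration under stated assumptions, not ChatTime's actual tokenizer. The function names, the `<ts_NNN>` token format, and the `n_bins` parameter are all hypothetical and do not come from the paper.

```python
from typing import List, Sequence, Tuple


def series_to_tokens(values: Sequence[float], n_bins: int = 100) -> Tuple[List[str], Tuple[float, float]]:
    """Scale a series to [0, 1] and map each value to a discrete token.

    Hypothetical illustration: the token format and bin count are assumptions.
    """
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant series
    tokens = []
    for v in values:
        scaled = (v - lo) / span
        # Clamp so the maximum value falls in the last bin.
        idx = min(int(scaled * n_bins), n_bins - 1)
        tokens.append(f"<ts_{idx:03d}>")
    return tokens, (lo, hi)


def tokens_to_series(tokens: Sequence[str], bounds: Tuple[float, float], n_bins: int = 100) -> List[float]:
    """Invert the mapping by decoding each token to its bin center."""
    lo, hi = bounds
    span = (hi - lo) or 1.0
    out = []
    for t in tokens:
        idx = int(t[4:-1])  # strip the "<ts_" prefix and ">" suffix
        out.append(lo + (idx + 0.5) / n_bins * span)
    return out


values = [10.0, 12.5, 15.0, 20.0]
tokens, bounds = series_to_tokens(values)
recovered = tokens_to_series(tokens, bounds)
```

Once a series is expressed as tokens like these, it can in principle share a vocabulary and a sequence model with ordinary text. The round trip is lossy: each recovered value is off by at most half a bin width.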
Taxonomy
Topics: Time Series Analysis and Forecasting · Advanced Text Analysis Techniques
