CC-Time: Cross-Model and Cross-Modality Time Series Forecasting
Peng Chen, Yihang Wang, Yang Shu, Yunyao Cheng, Kai Zhao, Zhongwen Rao, Lujia Pan, Bin Yang, Chenjuan Guo

TL;DR
CC-Time introduces a novel approach combining pre-trained language models with cross-modality and cross-model learning to significantly improve time series forecasting accuracy, especially in limited data scenarios.
Contribution
The paper proposes a framework that integrates PLMs with time series models through cross-modality learning and a Cross-Model Fusion (CMF) block, with the aim of improving forecasting performance.
Findings
Achieves state-of-the-art accuracy on nine datasets.
Effective in both full-data and few-shot learning.
Demonstrates the potential of PLMs in time series analysis.
Abstract
With the success of pre-trained language models (PLMs) in various application fields beyond natural language processing, language models have attracted growing attention in the field of time series forecasting (TSF) and have shown great promise. However, current PLM-based TSF methods still fail to achieve prediction accuracy that matches the strong sequential modeling power of language models. To address this issue, we propose Cross-Model and Cross-Modality Learning with PLMs for time series forecasting (CC-Time). We explore the potential of PLMs for time series forecasting from two aspects: 1) what time series features could be modeled by PLMs, and 2) whether relying solely on PLMs is sufficient for building time series models. In the first aspect, CC-Time incorporates cross-modality learning to model temporal dependency and channel correlations in the language model from both…
Peer Reviews
Decision: Submitted to ICLR 2026
1. The paper attempts a novel cross-modal fusion approach, and the idea of using PLMs to process channel correlations is explored. The design of the Cross-Model Fusion (CMF) Block is architecturally complex, utilizing multiple attention mechanisms to integrate information from the two heterogeneous branches.
2. Given the model's complexity, the authors have conducted numerous ablation studies. The experiments in Figure 3, Figure 9, and Appendix E attempt to demonstrate the necessity of the mode
1. **(Most Serious Issue) Dependency on Text Source and Generalizability**: The paper's primary methodological flaw is its dependency on an external LLM. While using an LLM to auto-generate text (Appendix A) solves the problem of missing text modalities in existing datasets, it builds a part of the model's performance on an uncontrolled, external black-box tool.
   * **Concerns about Text Quality**: We are concerned about the quality of the LLM-generated text.
   * **Concerns about Specific Dat
1. It's reasonable to incorporate the ability of LLMs into traditional TSF models in a multimodal manner.
2. It's interesting to use ChatGPT to describe each channel, making the essence and functionality of each channel more clear, which may help to better model channel-wise correlations and improve interpretability.
3. The CMF block seems to be novel and reasonable.
4. Extensive experiments have been conducted to validate the effectiveness of modules. Particularly, this paper works on full-data
1. A discussion between CC-Time and existing multimodal TSF methods (that also use both time series and textual data) is strongly recommended, which would make the contribution of this work more prominent.
   - What are the differences between the constructed textual input, in terms of both method and motivation.
   - Prompt length.
   - Why such textual input can make your multimodal data fusion unique (compared to existing methods like Time-LLM, TimeCMA).
2. The computational cost of each module (parti
1. Novel integration of PLMs and TS models: The paper presents a thoughtful framework that unites semantic (LLM-based) and numerical (Transformer-based) modeling. The design of the CMF block reflects careful consideration of cross-representational learning in time series forecasting.
2. Innovative cross-modality correlation modeling: Incorporating text-based variable descriptions and a correlation extractor allows the model to capture both global and local dependencies from semantic and numeric
1. Lack of explicit modality alignment between time-series embeddings and PLM semantic space: The paper directly feeds time-series embeddings into the pre-trained PLM without introducing any explicit alignment constraint between the numerical and linguistic modalities. This raises concerns about whether the frozen PLM can effectively interpret unaligned numeric encodings, especially since no contrastive or projection-based objective is applied to bridge the representational gap. As a result, the
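The review excerpt above mentions that no contrastive or projection-based objective bridges the numeric and linguistic representations. As a hedged illustration only (this is not part of CC-Time as described here; the function name `info_nce`, the temperature value, and the embedding shapes are all assumptions), one common form of such an objective is a symmetric InfoNCE loss over paired time-series and text embeddings, sketched in NumPy:

```python
import numpy as np


def info_nce(ts_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss for paired rows of two embedding matrices.

    ts_emb, text_emb: arrays of shape (n, d); row i of each is assumed
    to describe the same channel (hypothetical setup for illustration).
    """
    # L2-normalise so the dot product is a cosine similarity.
    ts = ts_emb / np.linalg.norm(ts_emb, axis=1, keepdims=True)
    tx = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = ts @ tx.T / temperature  # shape (n, n)

    def cross_entropy_diag(l):
        # Row-wise log-softmax, then take the matching (diagonal) pairs.
        l = l - l.max(axis=1, keepdims=True)
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Average the time-series-to-text and text-to-time-series directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))


rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
aligned = info_nce(emb, emb)           # matching pairs on the diagonal
misaligned = info_nce(emb, emb[::-1])  # pairs deliberately mismatched
print(aligned < misaligned)
```

In this toy setup the loss is lower when row i of both matrices refers to the same item, which is the behaviour an explicit alignment term would encourage. Whether such a term would help a frozen PLM in this setting is the open question the reviewer raises.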
Taxonomy
Topics: Time Series Analysis and Forecasting
