VisionTS++: Cross-Modal Time Series Foundation Model with Continual Pre-trained Vision Backbones
Lefei Shen, Mouxiang Chen, Xu Liu, Han Fu, Xiaoxue Ren, Jianling Sun, Zhuo Li, Chenghao Liu

TL;DR
VisionTS++ leverages continual pre-training of vision models on time series data, introducing novel encoding and forecasting techniques to achieve state-of-the-art results across diverse datasets.
Contribution
It presents a new cross-modal time series foundation model that bridges modality, variate, and probabilistic gaps through innovative pre-training and encoding strategies.
Findings
Outperforms existing TSFMs by 6-44% in MSE
Achieves first place in GIFT-Eval benchmark
Effective in both in-distribution and out-of-distribution forecasting
Abstract
Recent studies have indicated that vision models pre-trained on images can serve as time series foundation models (TSFMs) by reformulating time series forecasting (TSF) as image reconstruction. However, effective cross-modal transfer from vision to time series remains challenging due to three discrepancies: (1) the data-modality gap between structured, bounded image data and unbounded, heterogeneous time series; (2) the multivariate-forecasting gap between fixed RGB-three-channel vision models and time series with arbitrary numbers of variates; and (3) the probabilistic-forecasting gap between the deterministic outputs of vision models and the requirement for uncertainty-aware probabilistic predictions. To bridge these gaps, we propose VisionTS++, a TSFM based on continual pre-training of a vision model on large-scale time series. Our approach introduces three key innovations: (1)…
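To make "forecasting as image reconstruction" concrete, here is a minimal sketch of one possible layout. It is not taken from the paper's code. It assumes the context window is folded by a known period into a 2-D grid, with one column per period, and that the forecast horizon is appended as masked columns for a reconstruction model to fill in. The function names (`series_to_image`, `naive_fill`, `image_to_forecast`) are hypothetical, and `naive_fill` is a seasonal-naive stand-in for the vision backbone, not a real model.

```python
import numpy as np

def series_to_image(context, period, horizon):
    """Fold a 1-D context into a (period, n_cols) grid and append masked columns.

    Assumed layout: each column holds one full period; NaN columns on the
    right mark the horizon that a reconstruction model would fill in.
    """
    context = np.asarray(context, dtype=float)
    if len(context) % period or horizon % period:
        raise ValueError("context and horizon must be multiples of period")
    mean, std = context.mean(), context.std() + 1e-8
    normed = (context - mean) / std                 # bring values to a bounded-ish range
    visible = normed.reshape(-1, period).T          # shape: (period, n_context_cols)
    masked = np.full((period, horizon // period), np.nan)
    image = np.concatenate([visible, masked], axis=1)
    return image, np.isnan(image), (mean, std)

def naive_fill(image, mask):
    """Stand-in for the vision model: copy the last visible column into the mask."""
    filled = image.copy()
    masked_cols = mask.any(axis=0)
    last_visible = image[:, ~masked_cols][:, -1:]   # shape: (period, 1)
    filled[:, masked_cols] = last_visible           # broadcasts across masked columns
    return filled

def image_to_forecast(filled, mask, stats):
    """Read the reconstructed columns back out as a 1-D forecast."""
    mean, std = stats
    n_masked = int(mask.any(axis=0).sum())
    cols = filled[:, -n_masked:]
    return cols.T.reshape(-1) * std + mean          # undo folding and normalization

context = np.arange(12.0)                           # three periods of length 4
image, mask, stats = series_to_image(context, period=4, horizon=4)
forecast = image_to_forecast(naive_fill(image, mask), mask, stats)
```

In this toy run the stand-in simply repeats the last observed period. A masked-autoencoder backbone would replace `naive_fill`, and the folding and unfolding steps would stay the same under the stated assumptions.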
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
(1) first paper that systematically closes the data-range, multivariate and probabilistic gaps when turning an off-the-shelf vision backbone into a competitive TSFM. (2) no new attention layers, no patch re-design; only lightweight heads and input/output converters—easy to reproduce. (3) SOTA on 4 widely used benchmarks (31/62 first places on LTSF, best nMAE on Monash, top CRPS on PF, 1st on GIFT-Eval) with both base and large variants. (4) removing filtering (−7 %), colourisation (−12 %) or mul
(1) the core idea (TS ➔ image ➔ MAE) is identical; improvements come from three engineering accessories rather than a new modelling principle. (2) no analysis of why pixel-range filtering or random RGB boundaries should be optimal; no guarantee that vision inductive biases align with temporal dynamics. (3) only forecasting; classification, anomaly detection or irregular sampling not tested. (4) no study on (i) #quantile heads h, (ii) alternative change-point or range-based filters, (iii) image s
1. The paper proposes a new vision-model-based time series foundation model, which achieves good forecasting performance across multiple benchmarks. 2. The paper clearly identifies the key challenges of applying vision models to time series analysis, including the data–modality gap and the multivariate–forecasting gap. 3. The overall writing of the paper is clear and well-organized.
1. The paper presents an incremental improvement over VisionTS, with the proposed modules—vision-model-based filtering, colorized multivariate conversion, and multi-quantile forecasting—being relatively straightforward. The filtering module performs simple threshold-based filtering; the multivariate conversion resembles prior vision-based time series models such as ViTST (NeurIPS 2023); and the multi-quantile forecasting capability has already been incorporated in most recent TSFMs. 2. Regarding
1. The paper clearly identifies major limitations when applying vision models to time-series forecasting and systematically attempts to address them. 2. Incorporating probabilistic forecasting into the framework is novel and refreshing, extending beyond conventional deterministic designs.
1. The study only evaluates MAE-based VisionTS++, without validating other vision backbones (SimMIM, BootMAE, etc.). This limits the generality of the proposed framework. 2. The paper does not discuss the computational cost of continual pre-training, raising concerns about training efficiency and scalability.
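Several of the reviews above mention multi-quantile forecasting and quantile heads. As background, quantile outputs are commonly trained and scored with the pinball (quantile) loss. The excerpts here do not state which objective VisionTS++ uses, so the sketch below is a generic illustration rather than the paper's method. The function name `pinball_loss` and the `(n_quantiles, n_steps)` prediction shape are assumptions.

```python
import numpy as np

def pinball_loss(y_true, y_pred, quantiles):
    """Mean pinball loss over all quantile levels and time steps.

    y_true: shape (n_steps,)
    y_pred: shape (n_quantiles, n_steps), one row per quantile level
    quantiles: levels in (0, 1), e.g. [0.1, 0.5, 0.9]
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    q = np.asarray(quantiles, dtype=float)[:, None]   # shape: (n_quantiles, 1)
    diff = y_true[None, :] - y_pred                    # positive when under-predicting
    # Under-prediction is weighted by q, over-prediction by (1 - q).
    return float(np.mean(np.maximum(q * diff, (q - 1.0) * diff)))

under = pinball_loss([1.0], [[0.0]], [0.9])   # prediction too low at the 0.9 level
over = pinball_loss([1.0], [[2.0]], [0.9])    # prediction too high at the 0.9 level
```

For a high quantile level such as 0.9, under-prediction costs more than over-prediction of the same size, which is what pushes that output toward the upper part of the predictive distribution.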
Taxonomy
Topics: Time Series Analysis and Forecasting · Forecasting Techniques and Applications · Machine Learning in Healthcare
