UniCA: Unified Covariate Adaptation for Time Series Foundation Model
Lu Han, Yu Liu, Lan Li, Qiwen Deng, Jian Jiang, Yinbo Sun, Zhe Yu, Binfeng Wang, Xingyu Lu, Lintao Ma, Han-Jia Ye, De-Chuan Zhan

TL;DR
UniCA introduces a unified framework that enables Time Series Foundation Models to effectively incorporate diverse covariates, including categorical and multimodal data, enhancing their forecasting capabilities across various real-world scenarios.
Contribution
UniCA presents a covariate homogenization and attention-based fusion framework, allowing TSFMs to adapt to heterogeneous covariates without sacrificing generalization.
Findings
Outperforms existing methods on multiple benchmarks.
Effectively incorporates heterogeneous covariates.
Enhances forecasting accuracy in multimodal scenarios.
Abstract
Time Series Foundation Models (TSFMs) have achieved remarkable success through large-scale pretraining. However, their design primarily targets real-valued series, limiting their ability to handle general forecasting tasks involving diverse and often heterogeneous covariates -- such as categorical variables and multimodal data (e.g., images, text) -- which are typically task-specific and difficult to leverage during pretraining. To address this gap, we propose Unified Covariate Adaptation (UniCA), a framework to bridge TSFMs with general covariate-aware forecasting. UniCA first performs covariate homogenization to transform heterogeneous covariates into high-level homogeneous series representations and then fuses them via a unified attention-based fusion mechanism. UniCA is compatible and universal for adaptation with both homogeneous and heterogeneous covariates, incorporating extra…
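The two steps named in the abstract, homogenization followed by attention-based fusion, can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the function names, the random weights (which would be learned in practice), the dimensions, and the choice to treat each covariate series as one attention token are all assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def homogenize_categorical(codes, embedding, proj):
    """Map integer category codes (T,) to k real-valued pseudo-series (T, k)."""
    return embedding[codes] @ proj

def homogenize_dense(features, proj):
    """Map per-step feature vectors (T, F), e.g. text or image embeddings, to (T, k)."""
    return features @ proj

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(target, covariates, w_q, w_k, w_v):
    """Single-head attention: the target series queries the covariate series.

    target: (T,), covariates: (T, C). Each series is embedded as one token.
    Returns the fused vector (d,) and the attention weights over covariates (C,).
    """
    q = target @ w_q                        # (d,)
    k = covariates.T @ w_k                  # (C, d)
    v = covariates.T @ w_v                  # (C, d)
    scores = k @ q / np.sqrt(q.shape[0])    # (C,)
    weights = softmax(scores)               # (C,), sums to 1
    return weights @ v, weights

# Hypothetical sizes: 24 time steps, 2 pseudo-series per modality, width 8.
T, k_out, d = 24, 2, 8
codes = rng.integers(0, 5, size=T)          # categorical covariate, 5 classes
cat_series = homogenize_categorical(
    codes, rng.normal(size=(5, 4)), rng.normal(size=(4, k_out)))
text_series = homogenize_dense(
    rng.normal(size=(T, 16)), rng.normal(size=(16, k_out)))
real_cov = rng.normal(size=(T, 1))          # an ordinary real-valued covariate

# After homogenization every covariate is just another real-valued series.
covariates = np.concatenate([real_cov, cat_series, text_series], axis=1)

target = rng.normal(size=T)
w_q, w_k, w_v = (rng.normal(size=(T, d)) for _ in range(3))
fused, weights = attention_fuse(target, covariates, w_q, w_k, w_v)
```

Because all covariates share one series format after homogenization, a single fusion module handles real-valued, categorical, and multimodal inputs alike, which is the sense in which the abstract describes the mechanism as unified.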
Peer Reviews
Decision: ICLR 2026 Poster
1. Problem formulation and motivation are explained clearly.
2. The authors have conducted a thorough evaluation spanning various prior work, datasets, and covariates.
3. The proposed method has minimal overhead at inference time.
4. One adapter trained per dataset works for any forecast horizon; no separate adapter is required for each horizon.
1. Although TSFM parameters are frozen when training the adapter, training cost is still incurred to perform adaptation. This takes away the ability of TSFMs to forecast zero-shot.
2. The adapter proposed in ChronosX is also based on linear layers. The contribution of this work therefore appears mostly incremental. The authors add additional parameters and attention modules before and after the backbone transformer FM. It is unclear whether the performance gain arises simply from these addition
- The specially designed plug-in fusion modules preserve pretrained generalization while efficiently leveraging covariate information.
- The two-stage design (homogenization + fusion) is intuitive and broadly compatible with multiple TSFM architectures.
- Strong empirical results demonstrating superior performance on 12 unimodal and multimodal datasets with minimal added cost.
- Comprehensive analysis and ablations confirming the framework’s universality, interpretability, and computational effic
1. **Unclear embedding alignment across heterogeneous modalities.** The paper claims that the *Covariate Homogenization* module transforms heterogeneous covariates (e.g., categorical, image, and text) into a unified latent space, yet no explicit *alignment loss* or *architectural constraint* ensures such consistency. Given the complexity of learned representations, the claim that all embeddings could be easily projected into the same latent space by only using one linear layer is not very co
1. The motivation is clear and sufficient. Adapting a pre-trained, channel-independent time series foundation model to practical covariate-aware tasks makes sense.
2. This is the first work to formalize the problem of adapting Time Series Foundation Models (TSFMs) to general covariate-aware forecasting scenarios.
3. The method is technically sound and the experiments are comprehensive.
1. When dealing with multimodal covariates, the tokenization method for each modality is critical and needs more discussion and explanation.
2. According to Fig. 3 (a), the improvements for Chronos-Bolt and TimesFM seem to be marginal.
3. For modeling time series with discrete variables, existing work should be discussed: [1] General Mixed Time Series Analysis via Latent Continuity Recovery and Alignment. NeurIPS 2024.
Taxonomy
Topics: Machine Learning in Healthcare · Forecasting Techniques and Applications · Traffic Prediction and Management Techniques
