TL;DR
SwiftTS is a novel framework that efficiently selects the best pre-trained time series models for new datasets using a learning-guided, lightweight dual-encoder approach with adaptive and transferable components, outperforming existing methods.
Contribution
The paper introduces SwiftTS, a new framework that predicts model performance for time series tasks without exhaustive fine-tuning, using a dual-encoder architecture and adaptive modules for better generalization.
Findings
Achieves state-of-the-art model selection accuracy on 14 datasets.
Reduces computational cost compared to exhaustive fine-tuning.
Demonstrates robustness across diverse datasets and horizons.
Abstract
Pre-trained models exhibit strong generalization to various downstream tasks. However, given the numerous models available in the model hub, identifying the most suitable one by individually fine-tuning is time-consuming. In this paper, we propose SwiftTS, a swift selection framework for time series pre-trained models. To avoid expensive forward propagation through all candidates, SwiftTS adopts a learning-guided approach that leverages historical dataset-model performance pairs across diverse horizons to predict model performance on unseen datasets. It employs a lightweight dual-encoder architecture that embeds time series and candidate models with rich characteristics, computing patchwise compatibility scores between data and model embeddings for efficient selection. To further enhance the generalization across datasets and horizons, we introduce a horizon-adaptive expert…
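The patchwise scoring idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the hand-crafted patch embedding (mean, standard deviation, net change), the dot-product score, the averaging over patches, and the names `make_patches`, `embed_patch`, `compatibility`, and `rank_models` are all assumptions chosen for illustration. SwiftTS learns both encoders from historical dataset-model performance pairs.

```python
from statistics import mean, pstdev


def make_patches(series, patch_len):
    """Split a series into non-overlapping patches, dropping any short tail."""
    return [series[i:i + patch_len]
            for i in range(0, len(series) - patch_len + 1, patch_len)]


def embed_patch(patch):
    """Toy patch embedding (assumption): mean, std, and net change of the patch."""
    return [mean(patch), pstdev(patch), patch[-1] - patch[0]]


def compatibility(series, model_emb, patch_len=4):
    """Average dot product between each patch embedding and one model embedding."""
    patches = make_patches(series, patch_len)
    scores = [sum(p * m for p, m in zip(embed_patch(patch), model_emb))
              for patch in patches]
    return sum(scores) / len(scores)


def rank_models(series, model_embs, patch_len=4):
    """Sort candidate model names from most to least compatible with the series."""
    return sorted(model_embs,
                  key=lambda name: compatibility(series, model_embs[name], patch_len),
                  reverse=True)
```

Ranking only needs the data embedding and the stored model embeddings, so no candidate model is run on the new dataset at selection time. That is the property the abstract relies on to avoid forward propagation through every candidate.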
Peer Reviews
Decision: ICLR 2026 Poster
1. The proposed dual-encoder architecture is well-conceived and technically sound. One encoder incorporates temporal awareness through the use of patching and attention mechanisms, while the other enables knowledge injection by integrating architectural metadata, graph-based topological structures, and functional embeddings derived from model behavior. This design is particularly well-justified for highly diverse model repositories.
2. The manuscript presents extensive experimental results and
Although the paper shows runtime savings over fine-tuning, there is insufficient discussion of practical scaling beyond a fixed model zoo. For example, how does graph2vec embedding scale with hundreds or thousands of models with complex DAGs? Is there a resource bottleneck for functional embedding inference as the number of candidate models grows? The scalability arguments are more empirical than architectural; a more detailed analysis would be valuable.
- Addresses an important and under-explored problem in time series foundation model selection.
- Well-designed method combining meta, topological, and functional model embeddings.
- Extensive experiments with clear, consistent improvements.
- The design choices for the data and model encoders appear somewhat heuristic and lack sufficient justification. For example, why does the model encoder capture domain information while the data encoder does not? The paper could be strengthened by clarifying the design rationale of these encoders.
- The meta-learner is trained on a relatively small pool (14 datasets × 8 models); a data-efficiency analysis or a discussion explaining why this scale suffices to learn reliable dataset–model corre
1. The idea of selecting more suitable pretrained models for different downstream tasks and datasets is both interesting and valuable, addressing a practically important yet underexplored challenge in time-series foundation modeling.
2. The dual-encoder architecture is well-motivated for heterogeneous time series model pools and directly addresses the issue of costly, inconsistent feature extraction in prior work. The use of a patch-wise attention mechanism reflects a careful design choice that
1. The paper emphasizes that SwiftTS avoids costly forward passes through all candidate models. Yet, the functional embedding module still requires each candidate model to be evaluated (albeit offline) on synthetic inputs such as Gaussian noise. This operation remains linearly proportional to the number of candidate models and does not scale well to continuously evolving model pools. The efficiency claim is therefore only partially valid and should be quantified more carefully.
2. Does the sampl
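The linear-cost concern raised in the first point of this review can be made concrete with a small sketch. It assumes, hypothetically, that a functional embedding is the concatenation of a model's outputs on a fixed set of Gaussian-noise probes; the helper names, the probe count, and the toy forecasters used to exercise it are illustrative and not taken from the paper.

```python
import random


def functional_embedding(model, probes):
    """Concatenate one model's outputs over all probe inputs."""
    emb = []
    for x in probes:
        emb.extend(model(x))
    return emb


def build_embeddings(models, n_probes=3, length=8, seed=0):
    """Embed every candidate offline and count the forward passes required."""
    rng = random.Random(seed)
    # Fixed Gaussian-noise probes shared by all candidate models (assumption).
    probes = [[rng.gauss(0.0, 1.0) for _ in range(length)]
              for _ in range(n_probes)]
    forward_calls = 0
    embeddings = {}
    for name, model in models.items():
        embeddings[name] = functional_embedding(model, probes)
        forward_calls += n_probes  # one forward pass per probe, per model
    return embeddings, forward_calls


# Toy stand-ins for pre-trained forecasters (hypothetical), horizon of 2.
toy_models = {
    "last_value": lambda x: [x[-1]] * 2,
    "mean_value": lambda x: [sum(x) / len(x)] * 2,
}
```

Under these assumptions the offline cost is `len(models) * n_probes` forward passes. Each embedding can be cached, so adding a model to the pool costs only `n_probes` additional passes, but the total still grows linearly with pool size, as the reviewer notes.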
