Small Vocabularies, Big Gains: Pretraining and Tokenization in Time Series Models

Alexis Roger; Gwen Legate; Kashif Rasul; Yuriy Nevmyvaka; Irina Rish

arXiv:2511.11622 · cs.LG · November 18, 2025

Open Access

TL;DR

This paper investigates how tokenizer design and pretraining influence time series forecasting models, showing that well-designed tokenizers combined with transfer learning significantly improve performance, especially with small vocabularies.

Contribution

It provides a systematic analysis of tokenizer scaling and quantization effects, demonstrating the importance of alignment between tokenization and pretraining in time series models.

Findings

01. Pretrained models benefit more from well-designed tokenizers.
02. Misaligned tokenization can negate pretraining advantages.
03. Small, efficient vocabularies with pretraining excel in multi-modal forecasting.

Abstract

Tokenization and transfer learning are two critical components in building state-of-the-art time series foundation models for forecasting. In this work, we systematically study the effect of tokenizer design, specifically scaling and quantization strategies, on model performance, alongside the impact of pretraining versus random initialization. We show that tokenizer configuration primarily governs the representational capacity and stability of the model, while transfer learning influences optimization efficiency and alignment. Using a combination of empirical training experiments and theoretical analyses, we demonstrate that pretrained models consistently leverage well-designed tokenizers more effectively, particularly at smaller vocabulary sizes. Conversely, misaligned tokenization can diminish or even invert the benefits of pretraining. These findings highlight the importance of…
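
To make "scaling and quantization" concrete, here is a minimal Python sketch of one common way to turn a real-valued series into tokens: divide by the mean absolute value, clip to a fixed range, and assign each value to one of a fixed number of uniform bins. This is an illustrative assumption, not the tokenizer studied in the paper. The function names tokenize and detokenize, the clipping limit of 3.0, and the choice of uniform bins are all hypothetical.

```python
from typing import List, Tuple


def tokenize(series: List[float], vocab_size: int,
             limit: float = 3.0) -> Tuple[List[int], float]:
    """Scale a non-empty series, then quantize it into vocab_size uniform bins.

    Hypothetical helper for illustration only.
    """
    # Scaling step: mean absolute value; fall back to 1.0 for an all-zero series.
    scale = sum(abs(x) for x in series) / len(series) or 1.0
    # Quantization step: uniform bins covering [-limit, limit].
    width = 2 * limit / vocab_size
    tokens = []
    for x in series:
        z = max(-limit, min(limit, x / scale))  # clip the scaled value
        idx = min(int((z + limit) / width), vocab_size - 1)
        tokens.append(idx)
    return tokens, scale


def detokenize(tokens: List[int], scale: float, vocab_size: int,
               limit: float = 3.0) -> List[float]:
    """Map each token back to its bin center and undo the scaling."""
    width = 2 * limit / vocab_size
    return [(-limit + (t + 0.5) * width) * scale for t in tokens]


if __name__ == "__main__":
    series = [10.0, 12.0, 9.0, 15.0, 11.0]
    for vocab_size in (8, 64, 512):
        tokens, scale = tokenize(series, vocab_size)
        restored = detokenize(tokens, scale, vocab_size)
        worst = max(abs(a - b) for a, b in zip(series, restored))
        print(vocab_size, tokens, round(worst, 4))
```

Under these assumptions, the round-trip error for unclipped values is at most half a bin width times the scale, so it shrinks as the vocabulary grows. That is the representational-capacity side of the trade-off the abstract describes. The sketch says nothing about how pretraining interacts with vocabulary size.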

Taxonomy

Topics: Forecasting Techniques and Applications · Time Series Analysis and Forecasting · Stock Market Forecasting Methods