TimeSqueeze: Dynamic Patching for Efficient Time Series Forecasting
Sravan Kumar Ankireddy, Nikita Seleznev, Nam H. Nguyen, Yulun Wu, Senthil Kumar, Furong Huang, C. Bayan Bruss

TL;DR
TimeSqueeze introduces a dynamic patching method for time series transformers that adaptively segments data based on local complexity, significantly improving efficiency and forecasting accuracy.
Contribution
It proposes a novel content-aware segmentation mechanism that adaptively chooses patch boundaries, enhancing temporal fidelity and computational efficiency in transformer-based models.
Findings
Up to 20x faster convergence in large-scale pretraining
8x higher data efficiency compared to point-token baselines
Consistently outperforms fixed patching methods on forecasting benchmarks
Abstract
Transformer-based time series foundation models face a fundamental trade-off in choice of tokenization: point-wise embeddings preserve temporal fidelity but scale poorly with sequence length, whereas fixed-length patching improves efficiency by imposing uniform boundaries that may disrupt natural transitions and blur informative local dynamics. In order to address these limitations, we introduce TimeSqueeze, a dynamic patching mechanism that adaptively selects patch boundaries within each sequence based on local signal complexity. TimeSqueeze first applies a lightweight state-space encoder to extract full-resolution point-wise features, then performs content-aware segmentation by allocating short patches to information-dense regions and long patches to smooth or redundant segments. This variable-resolution compression preserves critical temporal structure while substantially reducing…
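The abstract does not spell out the segmentation rule. As a rough illustration only, the Python sketch below follows the criterion as one reviewer paraphrases it: a new patch starts when the absolute difference between adjacent samples is large relative to the local average power in a sliding window. The function names, the RMS normalization, and the `window`, `threshold`, and `max_patch` parameters are assumptions made for this sketch, not the authors' implementation, and the state-space encoder stage is omitted.

```python
import numpy as np


def dynamic_patch_boundaries(x, window=8, threshold=0.5, max_patch=16, eps=1e-8):
    """Return the start index of each patch (hypothetical helper).

    A boundary is placed at t when |x[t] - x[t-1]| exceeds `threshold` times
    the RMS of the trailing `window` samples, or when the current patch has
    already reached `max_patch` samples. All parameter values are assumed.
    """
    x = np.asarray(x, dtype=float)
    boundaries = [0]
    for t in range(1, len(x)):
        lo = max(0, t - window)
        # Square root of the local average power over the trailing window.
        local_rms = np.sqrt(np.mean(x[lo:t] ** 2))
        jump = abs(x[t] - x[t - 1])
        too_long = (t - boundaries[-1]) >= max_patch
        if jump > threshold * (local_rms + eps) or too_long:
            boundaries.append(t)
    return boundaries


def patch_lengths(boundaries, n):
    """Convert patch start indices into patch lengths for a series of length n."""
    return np.diff(boundaries + [n]).tolist()


# A flat segment followed by a rapidly alternating one.
series = np.concatenate([np.ones(16), np.tile([1.0, -1.0], 8)])
starts = dynamic_patch_boundaries(series)
lengths = patch_lengths(starts, len(series))
```

On this toy series the flat segment is covered by a single long patch, while the alternating segment is split into single-sample patches, which mirrors the short-patches-for-dense-regions behavior described in the abstract.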
Peer Reviews
Decision: Submitted to ICLR 2026
1. The introduction of TimeSqueeze, which dynamically combines point-level fine-grained encoding with adaptive patch-level compression, is novel and well-motivated. The dynamic patching mechanism based on relative deviation effectively addresses the limitations of fixed-size patching and enables content-aware compression. 2. The paper demonstrates compelling efficiency gains (up to 20× faster training and 10× faster inference) while maintaining competitive forecasting performance with state-o
1. While the paper compares with fixed-patching methods and point-embedding models, it does not include comparisons with other adaptive or learned compression strategies from recent literature (e.g., learned chunking or entropy-based methods), leaving the relative advantage of the proposed patching criterion less fully contextualized. 2. Although the paper shows that longer pre-trained contexts improve performance, the analysis is limited to performance curves without deeper investigation into w
1. The methodology is well-motivated and clearly integrated into the forecasting framework. 2. The paper is clearly written and conceptually intuitive.
1. The experimental validation is limited, as the evaluations are conducted only on the Time-MoE architecture, which restricts the generality of the conclusions. 2. The overall architecture of TimeSqueeze largely builds upon the Time-MoE framework — equations (2–4) are directly inherited from the original Time-MoE paper — and the idea of dynamic patching has already been explored in several prior works. 3. The efficiency comparison with Time-MoE is not entirely fair, since Time-MoE is intentio
S1. This paper presents a hybrid forecasting architecture that incorporates dynamic, content-aware patching for adaptive compression of time series. S2. The experimental findings validate the computational efficiency of the proposed method.
1. Time series data often exhibit periodic and trend patterns. Relying solely on single-step differences between adjacent samples to determine boundaries may be insufficient for capturing periodic boundaries or trend changes. 2. The patching mechanism determines boundaries by comparing the absolute difference between adjacent samples with the local average power within a sliding window. Could the authors clarify how this criterion effectively distinguishes between information-rich and
Taxonomy
Topics: Time Series Analysis and Forecasting · Traffic Prediction and Management Techniques · Machine Learning in Healthcare
