Less Is More: Generating Time Series with LLaMA-Style Autoregression in Simple Factorized Latent Spaces
Siyuan Li, Yifan Sun, Lei Cheng, Lewen Wang, Yang Liu, Weiqing Liu, Jianlong Li, Jiang Bian, Shikai Fang

TL;DR
FAR-TS introduces a fast, flexible, and interpretable autoregressive framework for multivariate time series generation using a disentangled latent space and a Transformer model, outperforming diffusion-based methods in speed.
Contribution
The paper presents FAR-TS, a novel autoregressive approach combining disentangled factorization with a Transformer over a discrete latent space for efficient time series synthesis.
Findings
FAR-TS generates samples orders of magnitude faster than Diffusion-TS.
It preserves cross-channel correlations and interpretability.
The method enables flexible sequence length generation.
Abstract
Generative models for multivariate time series are essential for data augmentation, simulation, and privacy preservation, yet current state-of-the-art diffusion-based approaches are slow and limited to fixed-length windows. We propose FAR-TS, a simple yet effective framework that combines disentangled factorization with an autoregressive Transformer over a discrete, quantized latent space to generate time series. Each time series is decomposed into a data-adaptive basis that captures static cross-channel correlations and temporal coefficients that are vector-quantized into discrete tokens. A LLaMA-style autoregressive Transformer then models these token sequences, enabling fast and controllable generation of sequences with arbitrary length. Owing to its streamlined design, FAR-TS achieves orders-of-magnitude faster generation than Diffusion-TS while preserving cross-channel correlations…
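The two-step latent construction described in the abstract (a static cross-channel basis plus temporal coefficients that are quantized into tokens) can be illustrated with a minimal sketch. It is not the paper's implementation: a truncated SVD stands in for the learned, data-adaptive factorization, the codebook is sampled from the coefficient rows rather than trained, and the autoregressive Transformer over the tokens is omitted. The function names `factorize` and `quantize`, the shapes, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def factorize(x, rank):
    """Truncated SVD stand-in (assumption): x (T, C) ~= v (T, rank) @ u (rank, C)."""
    left, s, right = np.linalg.svd(x, full_matrices=False)
    v = left[:, :rank] * s[:rank]   # temporal coefficients, one row per time step
    u = right[:rank]                # static cross-channel basis
    return u, v

def quantize(v, codebook):
    """Map each coefficient row to the index of its nearest codebook vector."""
    dist = ((v[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dist.argmin(axis=1)

rng = np.random.default_rng(0)
T, C, rank, K = 64, 6, 2, 16        # hypothetical sizes

# Synthetic 6-channel series driven by two latent signals (exactly rank 2).
t = np.linspace(0.0, 4.0 * np.pi, T)
latent = np.stack([np.sin(t), np.cos(2.0 * t)], axis=1)   # (T, 2)
x = latent @ rng.normal(size=(rank, C))                   # (T, C)

u, v = factorize(x, rank)

# Hypothetical codebook: K coefficient rows sampled from v (a real one would be learned).
codebook = v[rng.choice(T, size=K, replace=False)]
tokens = quantize(v, codebook)      # discrete token sequence of length T
x_hat = codebook[tokens] @ u        # decode tokens back to the channel space
```

In this sketch `tokens` is the sequence an autoregressive model would be trained on. Because the basis `u` is fixed and one token is emitted per time step, generating a longer series amounts to sampling more tokens before decoding.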
Peer Reviews
Decision·ICLR 2026 Conference Withdrawn Submission
FAR-TS demonstrates better efficiency and scalability than existing models and supports arbitrary-length inference. Its commitment to simplicity is appreciated, in contrast to the trend toward increasingly complex and over-engineered modules.
The biggest concern I have is with the datasets. Our community really needs to stop using the ETT datasets, which are smooth and regular, so there is no takeaway you can get from evaluating on them. The task diversity is very limited, focusing on clean, regular time series, especially since your length-generalization experiment picks ETTh. The experiments are very weak. The number of baselines is very limited, especially since the later experiments compare against only two baselines.
1. The paper is easy to follow and the modules are clear. 2. Compared with the diffusion model on the time series generation task, FAR-TS shows better generation quality and faster inference speed. 3. The disentangled factorization strategy in FAR-TS improves the interpretability of time series modeling. 4. The experimental results show better generation performance than the popular baselines.
1. FAR-TS lacks novelty in the time series generation task, because Sdformer [1] has already shown the effectiveness of the VQ strategy for this task; the only difference is the disentangled factorization in stage 1. However, the matrix factorization in Eqn. 3 has a last term E that is not included in the VQ modeling, so compounding this approximation with the VQ approximation may lead to low-quality modeling in stage 1. Can the authors add Sdformer for comparison? 2. In Stage2, why the author ch
1. **Addresses Key Limitations:** The paper directly tackles significant drawbacks of state-of-the-art diffusion models for time series: slow iterative sampling and fixed window lengths. The proposed AR approach offers a compelling alternative. 2. **Novel Combination for Time Series:** While VQ and AR Transformers are established techniques, their specific application combined with an explicit *factorized* latent space (spatial basis U + temporal coefficients V) for *multivariate* time series
1. **Limited Novelty Within Components:** The overall architecture is coherent but relies on standard elements (VQ, AR Transformer, low-rank factorization). The paper should clarify domain-specific innovations, for example, whether adaptations like RoPE or RMSNorm required modification for quantized temporal tokens, and whether the factorization encoder introduces unique advantages. 2. **Incomplete Discussion of Quality Trade-offs:** While FAR-TS excels in speed and often in quality, some mar
1. Supports arbitrary-length generation, which is valuable for real applications 2. The method achieves orders-of-magnitude faster inference compared to Diffusion-TS 3. Comprehensive experimental evaluation across multiple datasets (ETTh, ETTm, fMRI, SSP) with diverse metrics 4. Extensive ablation studies provided (in the appendix section) 5. Generally well-written with informative visualizations
1. This paper has limited novelty: it combines existing techniques without significant innovation. For example, matrix factorization for time series is well established, as is vector quantization. 2. The paper compares FAR-TS (4.23M-8.92M parameters) against Diffusion-TS (0.35M-1.22M parameters). This represents a 10-25× larger model, making all performance comparisons fundamentally unfair. 3. The paper provides no analysis to verify that generated samples are authentic rather than memor
Taxonomy
Topics: Time Series Analysis and Forecasting · Machine Learning in Healthcare · Generative Adversarial Networks and Image Synthesis
