TL;DR
This paper introduces CTBench, a comprehensive benchmark for evaluating cryptocurrency time series generation models, addressing crypto-specific challenges and providing multi-dimensional assessments including forecasting, trading, and risk.
Contribution
We present CTBench, the first dedicated cryptocurrency time series generation (TSG) benchmark, with a dual-task evaluation framework and diverse metrics, filling a critical gap in crypto-specific synthetic data generation.
Findings
Benchmarking eight models reveals trade-offs between fidelity and profitability.
CTBench's evaluation framework effectively guides model selection for crypto analytics.
Models vary significantly across different market regimes.
Abstract
Synthetic time series are essential tools for data augmentation, stress testing, and algorithmic prototyping in quantitative finance. However, in cryptocurrency markets, characterized by 24/7 trading, extreme volatility, and rapid regime shifts, existing Time Series Generation (TSG) methods and benchmarks often fall short, jeopardizing practical utility. Most prior work (1) targets non-financial or traditional financial domains, (2) focuses narrowly on classification and forecasting while neglecting crypto-specific complexities, and (3) lacks critical financial evaluations, particularly for trading applications. To address these gaps, we introduce CTBench, the first comprehensive TSG benchmark tailored for the cryptocurrency domain. CTBench curates an open-source dataset from 452 tokens and evaluates TSG models across 13 metrics spanning 5 key dimensions: forecasting…
Peer Reviews
Decision: ICLR 2026 Poster
- Empirically, the authors observe regime-dependent trade-offs between fidelity and tradability, i.e., high fidelity does not directly imply tradability. This is an important insight for generating synthetic financial data.
- Emphasizing rank-based metrics and arbitrage capacity over purely generative scores moves the discussion toward decision-useful validation.
- The authors offer useful guidance on model selection for practical uses.

- Results are tied to a single forecasting model. Without testing a variety of forecasters, it is unclear whether the conclusions generalize.
- Insufficient diagnostic analysis of model performance differentials. The paper does not investigate why the TSG models diverge in outcomes, leaving the observations unexplained.

- This paper offers a valuable open-source cryptocurrency dataset, which bridges the gap between research on time series generation and cryptocurrency.
- The benchmark discussed in this paper is comprehensive, involving most recent deep generative models and evaluation metrics widely used in finance.
- This paper provides some intriguing practical findings that are critical for future research and applications: (i) current generative models achieve diverse performance across tasks,

- The predictive utility task fixes the forecasting model to XGBoost for its robustness and interpretability. However, some conclusions may change with alternative predictors, given that different setups have a large impact on model performance.
- [Minor] The dual-task evaluation module in Figure 2 is visually unclear. Its readability could perhaps be improved by simplifying the contents, as in the other modules, since Figure 3 already shows the details.

1. Captures unique traits of crypto markets, including 24/7 volatility, absence of intrinsic valuation, and irregular liquidity, that are not addressed in traditional benchmarks.
2. Links TSG to real-world use cases through two complementary tasks focusing on forecasting capabilities and tradable signal extraction rather than just statistical similarity.
3. Integrates 13 diverse metrics covering error measurement, rank correlation, trading performance, risk assessment and computational efficiency f

1. Relies solely on Binance’s spot hourly data, lacking the diversity of cross-exchange data, alternative crypto asset types such as futures contracts, and different sampling frequencies. Moreover, it remains unknown whether this generation method can be effectively extended to a broader range of asset classes, which limits the method’s scalability.
2. All TSG models' generated data are used with the same XGBoost model for downstream prediction. Different TSG models may be better suited for differe
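
The reviews above refer to rank-correlation and risk metrics without showing how such quantities are computed. As a rough illustration only, the sketch below implements two generic metrics of this kind in plain Python: a Spearman-style rank correlation between predicted and realized returns, and the maximum drawdown of a return series. The function names (`average_ranks`, `rank_ic`, `max_drawdown`) and the toy inputs are assumptions made for this example; they are not taken from CTBench, and the benchmark's own metric definitions may differ.

```python
import math
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def rank_ic(predicted, realized):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(predicted), average_ranks(realized))


def max_drawdown(returns):
    """Largest peak-to-trough loss of the compounded equity curve, as a fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


# Hypothetical toy inputs, for illustration only.
ic = rank_ic([0.01, 0.03, 0.02, 0.05], [0.00, 0.04, 0.01, 0.02])
mdd = max_drawdown([0.10, -0.50, 0.20])
```

A rank-based score depends only on the ordering of predictions, which is why it can diverge from error-based fidelity scores: a model may rank assets well while mis-estimating return magnitudes, or the reverse.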
