Mitigating Data Scarcity in Time Series Analysis: A Foundation Model with Series-Symbol Data Generation
Wenxuan Wang, Kai Wu, Yujian Betterest Li, Dan Wang, Xiaoyu Zhang, Jing Liu

TL;DR
This paper introduces SymTime, a foundation model for time series analysis that uses a novel series-symbol data generation method to overcome data scarcity, achieving competitive results across multiple tasks.
Contribution
The paper presents a dual-modality data generation mechanism and a pre-trained foundation model, SymTime, for improved time series analysis under data scarcity conditions.
Findings
SymTime performs competitively on five TSA tasks.
The series-symbol data generation enhances data diversity and quality.
Pretraining on the generated data rivals pretraining on real-world datasets.
Abstract
Foundation models for time series analysis (TSA) have attracted significant attention. However, challenges such as data scarcity and data imbalance continue to hinder their development. To address this, we consider modeling complex systems through symbolic expressions that serve as semantic descriptors of time series. Building on this concept, we introduce a series-symbol (S2) dual-modality data generation mechanism, enabling the unrestricted creation of high-quality time series data paired with corresponding symbolic representations. Leveraging the S2 dataset, we develop SymTime, a pre-trained foundation model for TSA. SymTime demonstrates competitive performance across five major TSA tasks when fine-tuned on downstream tasks, rivaling foundation models pre-trained on real-world datasets. This approach underscores the potential of dual-modality data generation and pretraining…
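To make the idea of a series paired with its symbolic description concrete, here is a minimal Python sketch. It is not the authors' S2 generator: the operator sets, the expression depth, the input grid, and the names `random_expression` and `generate_pair` are all assumptions chosen for illustration. It only shows the general pattern of sampling a random symbolic expression and evaluating it to obtain a matching time series.

```python
import numpy as np

# Hypothetical operator vocabularies (assumption: the abstract does not list the real ones).
UNARY = {"sin": np.sin, "cos": np.cos, "tanh": np.tanh}
BINARY = {"+": np.add, "-": np.subtract, "*": np.multiply}


def random_expression(rng, depth=2):
    """Return (symbol_string, callable) for a randomly built expression in x."""
    if depth == 0:
        # Leaf node: the input variable scaled by a random coefficient.
        a = round(float(rng.uniform(0.5, 2.0)), 2)
        return f"{a}*x", lambda x, a=a: a * x
    if rng.random() < 0.5:
        # Unary node: wrap one sub-expression in a randomly chosen function.
        names = list(UNARY)
        name = names[int(rng.integers(len(names)))]
        op = UNARY[name]
        sub_str, sub_fn = random_expression(rng, depth - 1)
        return f"{name}({sub_str})", lambda x: op(sub_fn(x))
    # Binary node: combine two independently sampled sub-expressions.
    names = list(BINARY)
    name = names[int(rng.integers(len(names)))]
    op = BINARY[name]
    left_str, left_fn = random_expression(rng, depth - 1)
    right_str, right_fn = random_expression(rng, depth - 1)
    return f"({left_str} {name} {right_str})", lambda x: op(left_fn(x), right_fn(x))


def generate_pair(seed, length=256):
    """Sample one (series, symbol) pair; the input grid is an illustrative assumption."""
    rng = np.random.default_rng(seed)
    symbol, fn = random_expression(rng)
    x = np.linspace(0.0, 4.0 * np.pi, length)
    series = fn(x)
    return series, symbol


series, symbol = generate_pair(seed=0)
print(symbol, series.shape)
```

Because every pair is produced from a seed, such a generator can in principle emit as many pairs as needed, which is the property the abstract calls unrestricted creation. The quality and diversity controls used for the actual S2 dataset are not described here and are not modeled by this sketch.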
