Spectral-Aware Text-to-Time Series Generation with Billion-Scale Multimodal Meteorological Data
Shijie Zhang

TL;DR
This paper introduces a new large-scale meteorological dataset and a spectral-aware diffusion model for text-guided weather time-series generation, achieving state-of-the-art results and strong semantic control.
Contribution
The work presents MeteoCap-3B, a billion-scale weather dataset with expert-level captions generated by a Multi-agent Collaborative Captioning (MACC) pipeline, and MTransformer, a spectral-aware diffusion model for precise text-to-weather time-series synthesis.
Findings
State-of-the-art generation quality on real-world benchmarks
Accurate cross-modal alignment between text and weather signals
Enhanced semantic controllability and improved forecasting in data-sparse scenarios
Abstract
Text-to-time-series generation is particularly important in meteorology, where natural language offers intuitive control over complex, multi-scale atmospheric dynamics. Existing approaches are constrained by the lack of large-scale, physically grounded multimodal datasets and by architectures that overlook the spectral-temporal structure of weather signals. We address these challenges with a unified framework for text-guided meteorological time-series generation. First, we introduce MeteoCap-3B, a billion-scale weather dataset paired with expert-level captions constructed via a Multi-agent Collaborative Captioning (MACC) pipeline, yielding information-dense and physically consistent annotations. Building on this dataset, we propose MTransformer, a diffusion-based model that enables precise semantic control by mapping textual descriptions into multi-band spectral priors through a…
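To make the phrase "multi-band spectral" concrete, the following is a minimal sketch of splitting a time series into frequency bands with an FFT. It is not the paper's method: the abstract is truncated before the mechanism is described, so the function name `split_into_bands`, the equal-width band layout, and the synthetic signal are all assumptions chosen for illustration only.

```python
import numpy as np


def split_into_bands(series, n_bands=3):
    """Split a 1-D series into n_bands components that sum back to the input.

    Hypothetical helper: equal-width bands over the rFFT bins are an
    illustrative choice, not something taken from the paper.
    """
    series = np.asarray(series, dtype=float)
    spectrum = np.fft.rfft(series)
    # Integer bin edges that partition the spectrum into disjoint bands.
    edges = np.linspace(0, spectrum.size, n_bands + 1).astype(int)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = np.zeros_like(spectrum)
        masked[lo:hi] = spectrum[lo:hi]  # keep only this band's bins
        bands.append(np.fft.irfft(masked, n=series.size))
    return np.stack(bands)


# Synthetic stand-in for a weather variable: a slow cycle plus a fast one.
t = np.arange(256)
signal = np.sin(2 * np.pi * t / 128) + 0.3 * np.sin(2 * np.pi * t / 4)
bands = split_into_bands(signal, n_bands=3)
```

Because the bands partition the spectrum and the inverse transform is linear, `bands.sum(axis=0)` reconstructs `signal` up to floating-point error. Here the slow cycle falls in the first band and the fast cycle in the second.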
