CSRD2025: A Large-Scale Synthetic Radio Dataset for Spectrum Sensing in Wireless Communications
Shuo Chang, Rui Sun, Jiashuo He, Sai Huang, Kan Yu, Zhiyong Feng

TL;DR
This paper introduces CSRD2025, a large-scale synthetic radio dataset generated by the modular CSRD simulation platform, to advance AI-based spectrum sensing in wireless communications by providing diverse, realistic RF data.
Contribution
The paper presents the CSRD framework for generating extensive synthetic RF datasets, together with a benchmark dataset of over 25 million frames, to support AI research in spectrum sensing.
Findings
The CSRD2025 dataset is 10,000 times larger than RML2018.
Includes diverse modulation schemes and realistic channel models.
Facilitates object detection in spectrum analysis.
Abstract
The development of Large AI Models (LAMs) for wireless communications, particularly for complex tasks like spectrum sensing, is critically dependent on the availability of vast, diverse, and realistic datasets. Addressing this need, this paper introduces the ChangShuoRadioData (CSRD) framework, an open-source, modular simulation platform designed for generating large-scale synthetic radio frequency (RF) data. CSRD simulates the end-to-end transmission and reception process, incorporating an extensive range of modulation schemes (100 types, including analog, digital, OFDM, and OTFS), configurable channel models featuring both statistical fading and site-specific ray tracing using OpenStreetMap data, and detailed modeling of realistic RF front-end impairments for various antenna configurations (SISO/MISO/MIMO). Using this framework, we characterize CSRD2025, a substantial dataset…
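The end-to-end transmit, channel, and receive chain described in the abstract can be illustrated with a minimal NumPy sketch. This is not the CSRD code or its API: every function name, parameter, and default value below is hypothetical. It also assumes a single QPSK transmitter, one-tap Rayleigh fading, additive white Gaussian noise, and a simple receiver IQ-imbalance model, which stand in for the framework's much wider range of modulations, channel models, and RF impairments.

```python
import numpy as np


def qpsk_modulate(bits):
    """Map pairs of bits to unit-energy, Gray-coded QPSK symbols."""
    pairs = bits.reshape(-1, 2)
    return ((1 - 2 * pairs[:, 0]) + 1j * (1 - 2 * pairs[:, 1])) / np.sqrt(2)


def rayleigh_flat_fading(x, rng):
    """Scale the whole frame by one complex Gaussian tap (flat block fading)."""
    h = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)
    return h * x


def awgn(x, snr_db, rng):
    """Add complex white Gaussian noise at the requested SNR (in dB)."""
    noise_power = np.mean(np.abs(x) ** 2) / 10 ** (snr_db / 10)
    noise = rng.standard_normal(x.shape) + 1j * rng.standard_normal(x.shape)
    return x + np.sqrt(noise_power / 2) * noise


def iq_imbalance(x, gain_db, phase_deg):
    """Apply a gain and phase mismatch to the Q branch (illustrative model)."""
    g = 10 ** (gain_db / 20)
    phi = np.deg2rad(phase_deg)
    q = g * (x.imag * np.cos(phi) + x.real * np.sin(phi))
    return x.real + 1j * q


def simulate_frame(num_symbols, snr_db, seed=None):
    """Generate one labeled synthetic frame: modulate, fade, add noise, impair."""
    rng = np.random.default_rng(seed)
    bits = rng.integers(0, 2, size=2 * num_symbols)
    tx = qpsk_modulate(bits)
    rx = awgn(rayleigh_flat_fading(tx, rng), snr_db, rng)
    # Hypothetical impairment values, chosen only for illustration.
    rx = iq_imbalance(rx, gain_db=0.5, phase_deg=2.0)
    return {"iq": rx, "label": "QPSK", "snr_db": snr_db}


if __name__ == "__main__":
    frame = simulate_frame(num_symbols=1024, snr_db=10.0, seed=0)
    print(frame["label"], frame["iq"].shape, frame["iq"].dtype)
```

Fixing the seed makes each frame reproducible, and storing the label and SNR alongside the IQ samples is what makes such synthetic data usable for supervised spectrum-sensing tasks.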
