Training Data Augmentation for Deep Learning Radio Frequency Systems
William H. Clark IV, Steven Hauser, William C. Headley, and Alan J. Michaels

TL;DR
This paper investigates the impact of synthetic, captured, and augmented data on deep learning performance in RF applications, emphasizing the importance of real data and the trade-offs involved.
Contribution
It provides a systematic analysis of data types in RFML, quantifies the balance between real and synthetic data, and offers insights for optimizing training data strategies.
Findings
Captured data yields the highest performance gains.
Synthetic data can be effective when real data is scarce.
Augmentation enhances model robustness and performance.
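To make the idea of augmentation concrete, the sketch below perturbs a block of complex baseband (IQ) samples with a random carrier phase, a small random frequency offset, and additive white Gaussian noise at a chosen SNR. These three perturbations are an assumption chosen for illustration and are not necessarily the augmentations used in the paper; the function name `augment_iq` and its parameters are hypothetical.

```python
import cmath
import math
import random


def augment_iq(samples, snr_db, max_freq_offset=0.01, rng=None):
    """Return an augmented copy of a list of complex baseband samples.

    Illustrative only: applies a random phase rotation, a random
    frequency offset (in cycles per sample), and AWGN at `snr_db`.
    """
    rng = rng or random.Random()
    phase = rng.uniform(0.0, 2.0 * math.pi)
    freq = rng.uniform(-max_freq_offset, max_freq_offset)

    # Average signal power sets the noise power for the target SNR.
    power = sum(abs(s) ** 2 for s in samples) / len(samples)
    noise_power = power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power / 2.0)  # split evenly between I and Q

    out = []
    for n, s in enumerate(samples):
        rotation = cmath.exp(1j * (phase + 2.0 * math.pi * freq * n))
        noise = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        out.append(s * rotation + noise)
    return out


# Example: augment 1024 unit-power QPSK symbols (an assumed test signal).
rng = random.Random(0)
qpsk = [
    complex(rng.choice((-1, 1)), rng.choice((-1, 1))) / math.sqrt(2)
    for _ in range(1024)
]
augmented = augment_iq(qpsk, snr_db=10.0, rng=rng)
```

Calling `augment_iq` several times on the same captured or synthetic example yields distinct training examples that share a label, which is the sense in which augmentation stretches a limited dataset.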
Abstract
Applications of machine learning are subject to three major components that contribute to the final performance metrics. Within the category of neural networks, and deep learning specifically, the first two are the architecture for the model being trained and the training approach used. This work focuses on the third component, the data used during training. The primary questions that arise are "what is in the data" and "what within the data matters?" Looking into the Radio Frequency Machine Learning (RFML) field of Automatic Modulation Classification (AMC) as an example of a tool used for situational awareness, the use of synthetic, captured, and augmented data is examined and compared to provide insights about the quantity and quality of the available data necessary to achieve desired performance levels. There are three questions discussed within this work: (1) how useful a…
