Conditional Semi-Supervised Data Augmentation for Spam Message Detection with Low Resource Data
Ulin Nuha, Chih-Hsueh Lin

TL;DR
This paper introduces CSSDA, a semi-supervised data augmentation method that leverages unlabeled data to improve spam message detection, especially in low-resource scenarios, achieving robust performance measured by balanced accuracy.
Contribution
The paper presents a novel conditional semi-supervised data augmentation framework that effectively utilizes unlabeled data for spam detection with limited labeled data.
Findings
CSSDA achieves about 85% balanced accuracy with limited labeled data.
Unlabeled data significantly enhances data augmentation and model robustness.
Ablation studies confirm the effectiveness of the proposed scheme.
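Balanced accuracy, the metric behind the 85% figure above, is commonly defined as the mean of per-class recall, which keeps a model from scoring well simply by predicting the majority class (usually ham). The paper's own evaluation code is not shown here; the minimal sketch below assumes that standard definition, and the function name `balanced_accuracy` is illustrative.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred) -> float:
    """Mean recall over the classes present in y_true (standard definition)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Recall of class c: fraction of true-c samples that were predicted as c.
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

# Imbalanced toy example: 8 ham (0), 2 spam (1); the model always predicts ham.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0] * 10
plain_accuracy = float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
```

Here `plain_accuracy` is 0.8, while `balanced_accuracy(y_true, y_pred)` is 0.5 (ham recall 1.0, spam recall 0.0), exposing that no spam was caught.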
Abstract
Several machine learning schemes have attempted to detect spam messages. However, most of these schemes require a large amount of labeled data, and the existing techniques that address the lack of available data have issues with effectiveness and robustness. Therefore, this paper proposes conditional semi-supervised data augmentation (CSSDA) for spam detection when labeled data are scarce. The main architecture of CSSDA comprises feature extraction and an enhanced generative network. Here, we exploit unlabeled data for data augmentation to extend the training data. The enhanced generative network in our proposed scheme produces latent variables as fake samples from unlabeled data through a conditional scheme. Latent variables from both labeled and unlabeled data then serve as the input for the final classifier in our spam detection model. The experimental results indicate that our…
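The data flow described in the abstract (feature extraction, a conditional generator that turns unlabeled data into fake latent samples, and a final classifier trained on latents from both labeled and unlabeled data) can be illustrated with a small NumPy sketch on synthetic data. This is not the authors' CSSDA implementation: the tanh projection used as a feature extractor, the `conditional_generator` that pulls latents toward class centroids, the use of pseudo-labels as the condition, the nearest-centroid classifier, and every function name are hypothetical stand-ins chosen only to make the flow concrete and runnable.

```python
import numpy as np

def extract_features(x: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Hypothetical feature extractor: fixed linear projection followed by tanh."""
    return np.tanh(x @ W)

def fit_centroids(z: np.ndarray, y: np.ndarray, n_classes: int = 2) -> np.ndarray:
    """One mean latent vector per class (a nearest-centroid 'classifier')."""
    return np.stack([z[y == c].mean(axis=0) for c in range(n_classes)])

def predict(z: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Assign each latent vector to the class with the closest centroid."""
    dists = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def conditional_generator(z_u, cond, centroids, rng, noise=0.1):
    """Hypothetical conditional generator: pulls each unlabeled latent vector
    halfway toward the centroid of its conditioning class and adds noise."""
    return 0.5 * z_u + 0.5 * centroids[cond] + noise * rng.standard_normal(z_u.shape)

def run_pipeline(seed: int = 0):
    rng = np.random.default_rng(seed)
    d_in, d_lat = 20, 8
    # Toy data: ham (0) centered at -1, spam (1) centered at +1 in every input dimension.
    y_l = np.repeat([0, 1], 10)                         # 20 labeled messages
    x_l = rng.normal(loc=2.0 * y_l[:, None] - 1.0, size=(20, d_in))
    y_u_hidden = np.repeat([0, 1], 100)                 # only used to generate x_u
    x_u = rng.normal(loc=2.0 * y_u_hidden[:, None] - 1.0, size=(200, d_in))

    W = rng.standard_normal((d_in, d_lat)) / np.sqrt(d_in)
    z_l, z_u = extract_features(x_l, W), extract_features(x_u, W)

    # Condition each unlabeled sample on a pseudo-label from the labeled centroids.
    centroids_l = fit_centroids(z_l, y_l)
    cond = predict(z_u, centroids_l)
    z_fake = conditional_generator(z_u, cond, centroids_l, rng)

    # The final classifier is fit on labeled latents plus generated (fake) latents.
    z_train = np.vstack([z_l, z_fake])
    y_train = np.concatenate([y_l, cond])
    centroids = fit_centroids(z_train, y_train)
    return z_train, y_train, predict(z_u, centroids)
```

Calling `run_pipeline()` returns the combined training latents (20 labeled plus 200 generated), their labels, and the final classifier's predictions on the unlabeled set. In the paper the generator is a learned enhanced generative network; the fixed interpolation here only marks where such a network would sit in the pipeline.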
Taxonomy
Topics: Network Security and Intrusion Detection · Spam and Phishing Detection · Internet Traffic Analysis and Secure E-voting
