Comparison of semi-supervised deep learning algorithms for audio classification
Léo Cances, Etienne Labbé, Thomas Pellegrini

TL;DR
This study compares five semi-supervised deep learning algorithms for audio classification and finds that the methods built on data augmentation, notably MixMatch and ReMixMatch, reach the lowest error rates on most of the standard benchmarks tested.
Contribution
The paper adapts five recent semi-supervised learning (SSL) methods to audio classification and evaluates them, highlighting the effectiveness of data augmentation strategies such as mixup.
Findings
MixMatch and ReMixMatch outperform other methods on most datasets.
Data augmentation with mixup consistently lowers error rates.
Some SSL methods outperform models trained with all labeled data.
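As a rough illustration of the mixup augmentation mentioned in the findings, the sketch below blends two examples and their one-hot labels with a weight drawn from a Beta distribution. This is the generic mixup recipe, not the authors' implementation: the function name `mixup`, the default `alpha=0.4`, and the use of plain Python lists in place of audio features are all assumptions made for the example.

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.4, rng=random):
    """Blend two examples and their one-hot labels (generic mixup sketch).

    alpha is a hypothetical default, not a value taken from the paper.
    """
    # Mixing coefficient in [0, 1], drawn from Beta(alpha, alpha).
    lam = rng.betavariate(alpha, alpha)
    # Convex combination of the inputs and of the label vectors.
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


# Usage: mix a "class 0" example with a "class 1" example.
mixed_x, mixed_y, lam = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0])
```

Because both inputs and labels are mixed with the same weight, the mixed label remains a valid probability distribution over classes.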
Abstract
In this article, we adapted five recent SSL methods to the task of audio classification. The first two methods, namely Deep Co-Training (DCT) and Mean Teacher (MT), involve two collaborative neural networks. The three other algorithms, called MixMatch (MM), ReMixMatch (RMM), and FixMatch (FM), are single-model methods that rely primarily on data augmentation strategies. Using the Wide-ResNet-28-2 architecture in all our experiments, with 10% of the training data labeled and the remaining 90% used as unlabeled data, we first compare the error rates of the five methods on three standard benchmark audio datasets: Environmental Sound Classification (ESC-10), UrbanSound8K (UBS8K), and Google Speech Commands (GSC). In all but one case, MM, RMM, and FM outperformed MT and DCT significantly, MM and RMM being the best methods in most experiments. On UBS8K and GSC, MM achieved 18.02% and 3.25% error rate…
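The 10% labeled / 90% unlabeled setup described in the abstract can be sketched as a simple index split. This is a minimal illustration under stated assumptions: the helper name `split_labeled_unlabeled`, the fixed seed, and the plain random (non-stratified) shuffle are hypothetical choices, and the paper's actual partitioning procedure may differ.

```python
import random


def split_labeled_unlabeled(num_examples, labeled_fraction=0.10, seed=0):
    """Randomly partition example indices into labeled and unlabeled subsets."""
    indices = list(range(num_examples))
    # Seeded shuffle so the split is reproducible across runs.
    random.Random(seed).shuffle(indices)
    num_labeled = round(num_examples * labeled_fraction)
    return indices[:num_labeled], indices[num_labeled:]


labeled, unlabeled = split_labeled_unlabeled(1000)
print(len(labeled), len(unlabeled))  # 100 900
```

The labels of the examples in `unlabeled` would simply be ignored during training, so that only the 10% subset contributes a supervised loss.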
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Diverse Musicological Studies
Methods: FixMatch · Residual Connection · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Residual Block · Average Pooling · Max Pooling · Kaiming Initialization
