GAFX: A General Audio Feature eXtractor
Zhaoyang Bu, Hanhaodi Zhang, Xiaohu Zhu

TL;DR
This paper introduces GAFX, a versatile deep learning-based audio feature extractor that replaces traditional spectrograms, demonstrating competitive performance in music genre classification.
Contribution
The paper proposes GAFX, a novel framework combining U-Net, ResNet, and Attention modules for learnable audio feature extraction, advancing beyond handcrafted spectrogram features.
Findings
GAFX achieves competitive results on the GTZAN dataset.
Deep learning features can effectively replace spectrograms.
Ablation studies identify optimal configurations for GAFX.
Abstract
Most machine learning models for audio tasks rely on a handcrafted feature, the spectrogram. However, it is still unknown whether the spectrogram can be replaced with deep learning-based features. In this paper, we address this question by comparing different learnable neural feature extractors against a successful spectrogram-based model, and we propose a General Audio Feature eXtractor (GAFX) based on dual U-Net (GAFX-U), ResNet (GAFX-R), and Attention (GAFX-A) modules. We design experiments to evaluate this model on the music genre classification task on the GTZAN dataset and perform a detailed ablation study of different configurations of our framework. Our model GAFX-U, followed by the Audio Spectrogram Transformer (AST) classifier, achieves competitive performance.
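To make the contrast in the abstract concrete, the sketch below computes a handcrafted magnitude spectrogram next to a toy learnable front end that produces a feature map of the same shape. It is only an illustration under stated assumptions: the learnable part is a single randomly initialized linear filter bank with a ReLU, standing in for a trainable extractor, and is not the GAFX-U, GAFX-R, or GAFX-A architecture. The function names, frame length, hop size, and test tone are all hypothetical choices made for this example.

```python
import cmath
import math
import random


def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames (no padding)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def spectrogram(signal, frame_len=64, hop=32):
    """Handcrafted feature: magnitude of a Hann-windowed DFT per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # keep only the non-negative frequencies
    feats = []
    for frame in frame_signal(signal, frame_len, hop):
        row = []
        for k in range(n_bins):
            acc = sum(window[n] * frame[n]
                      * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(abs(acc))
        feats.append(row)
    return feats


def learnable_features(signal, weights, frame_len=64, hop=32):
    """Toy learnable stand-in (assumption, not GAFX): each row of `weights`
    is one filter applied to a frame, followed by a ReLU. In a real model
    these weights would be trained end to end with the classifier."""
    feats = []
    for frame in frame_signal(signal, frame_len, hop):
        feats.append([max(0.0, sum(w * x for w, x in zip(filt, frame)))
                      for filt in weights])
    return feats


# Hypothetical input: one second of a 128 Hz tone sampled at 1024 Hz.
random.seed(0)
sample_rate = 1024
tone = [math.sin(2 * math.pi * 128 * t / sample_rate)
        for t in range(sample_rate)]

spec = spectrogram(tone)

# 33 random filters of length 64, so both feature maps share one shape.
weights = [[random.gauss(0.0, 0.1) for _ in range(64)] for _ in range(33)]
learned = learnable_features(tone, weights)

print(len(spec), len(spec[0]))        # frames x frequency bins
print(len(learned), len(learned[0]))  # frames x learned channels
```

Both outputs are time-by-channel maps, so a downstream classifier that consumes a spectrogram can in principle consume the learned map instead. The difference is that the spectrogram's basis is fixed, while the filter bank's weights are free parameters.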
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Average Pooling · Absolute Position Encodings · Dropout · Adam · Byte Pair Encoding · Concatenated Skip Connection
