CAU_KU team's submission to ADD 2022 Challenge task 1: Low-quality fake audio detection through frequency feature masking

Il-Youp Kwak; Sunmook Choi; Jonghoon Yang; Yerin Lee; Seungsang Oh

arXiv:2202.04328·cs.SD·February 10, 2022


TL;DR

This technical report presents frequency feature masking (FFM), an augmentation technique for detecting low-quality fake audio. Applied together with mixup to spectrogram-based models, it yielded a best submission with 23.8% EER, ranked 3rd on track 1 of the ADD 2022 Challenge.

Contribution

Introduction of a frequency feature masking augmentation method to improve low-quality fake audio detection in spectrogram-based neural networks.

Findings

01

Achieved 23.8% EER on the ADD 2022 Challenge track 1.

02

The best submission ranked 3rd on track 1.

03

Effective augmentation technique for low-quality audio detection.
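
The findings above are reported in terms of equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. As a rough illustration of how such a number can be computed from detector scores, here is a minimal sketch. The function name `compute_eer`, the label convention (1 = genuine, 0 = fake), and the simple threshold sweep are assumptions made for this example, not the challenge's official scoring tool.

```python
import numpy as np

def compute_eer(scores, labels):
    """Approximate the equal error rate by sweeping observed score thresholds.

    Assumed convention (not from the source): labels are 1 for genuine and
    0 for fake, and a higher score means "more likely genuine".
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    genuine = scores[labels == 1]
    fake = scores[labels == 0]

    best_gap, eer = np.inf, None
    for t in np.unique(scores):
        frr = np.mean(genuine < t)   # genuine utterances rejected at threshold t
        far = np.mean(fake >= t)     # fake utterances accepted at threshold t
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest
            best_gap, eer = gap, (far + frr) / 2.0
    return eer
```

With a finite set of scores the two error rates rarely coincide exactly, so this sketch returns their average at the closest threshold. Evaluation toolkits often interpolate instead.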

Abstract

This technical report describes the Chung-Ang University and Korea University (CAU_KU) team's model for the Audio Deep Synthesis Detection (ADD) 2022 Challenge, track 1: Low-quality fake audio detection. For track 1, we propose a frequency feature masking (FFM) augmentation technique to deal with a low-quality audio environment. We applied FFM and mixup augmentation to five spectrogram-based deep neural network architectures that performed well for spoofing detection using mel-spectrogram and constant Q transform (CQT) features. Our best submission achieved an EER of 23.8% and ranked 3rd on track 1.
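
The abstract does not specify how FFM is parameterized, so the following is only a minimal sketch of the general idea. It assumes a generic scheme that zeroes one random contiguous band of frequency bins in a spectrogram, followed by standard mixup between two examples. The function names, the `max_width` and `alpha` parameters, and the zero fill value are hypothetical choices for illustration and may differ from the authors' method.

```python
import numpy as np

def frequency_mask(spec, max_width, rng, fill_value=0.0):
    """Mask one random contiguous band of frequency bins.

    spec: array of shape (n_freq, n_frames), e.g. a mel-spectrogram or CQT.
    max_width: largest band width in bins (hypothetical parameter).
    Returns the masked copy and the (start, width) of the masked band.
    """
    n_freq = spec.shape[0]
    width = min(int(rng.integers(0, max_width + 1)), n_freq)
    start = int(rng.integers(0, n_freq - width + 1))
    masked = spec.copy()                      # leave the input untouched
    masked[start:start + width, :] = fill_value
    return masked, (start, width)

def mixup(x1, y1, x2, y2, alpha, rng):
    """Standard mixup: convex combination of two inputs and their labels."""
    lam = float(rng.beta(alpha, alpha))       # mixing weight in [0, 1]
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y, lam

# Example usage on synthetic data (shapes are assumptions, not from the source)
rng = np.random.default_rng(0)
spec_a = np.ones((80, 100))                   # 80 frequency bins, 100 frames
spec_b = np.ones((80, 100))
masked_a, band = frequency_mask(spec_a, max_width=20, rng=rng)
mixed_x, mixed_y, lam = mixup(masked_a, 1.0, spec_b, 0.0, alpha=0.5, rng=rng)
```

Masking whole frequency bands plausibly imitates the band-limited or degraded recordings found in low-quality audio, which is one way to read the motivation stated in the abstract.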


Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis