Towards Attention-based Contrastive Learning for Audio Spoof Detection

Chirag Goel; Surya Koppisetti; Ben Colman; Ali Shahriyari; Gaurav Bharaj

arXiv:2407.03514·cs.SD·July 8, 2024

TL;DR

This paper introduces SSAST-CL, an attention-based contrastive learning framework built on an audio Vision Transformer (ViT) for audio spoof detection. The framework separates the bonafide and spoof classes in the learned representation and improves equal error rates (EERs) over a vanilla fine-tuned SSAST baseline.

Contribution

The paper applies Vision Transformers to audio spoof detection, a use the authors describe as relatively unexplored, and pairs them with a contrastive learning framework that uses cross-attention to aid representation learning.

Findings

01 Achieved competitive performance on the ASVSpoof 2021 challenge

02 Disentangled the bonafide and spoof classes effectively

03 Improved EERs over the vanilla fine-tuned SSAST baseline

Abstract

Vision transformers (ViT) have made substantial progress for classification tasks in computer vision. Recently, Gong et al. '21 introduced attention-based modeling for several audio tasks. However, the use of a ViT for the audio spoof detection task remains relatively unexplored. We bridge this gap and introduce ViTs for this task. A vanilla baseline built on fine-tuning the SSAST (Gong et al. '22) audio ViT model achieves sub-optimal equal error rates (EERs). To improve performance, we propose a novel attention-based contrastive learning framework (SSAST-CL) that uses cross-attention to aid the representation learning. Experiments show that our framework successfully disentangles the bonafide and spoof classes and helps learn better classifiers for the task. With an appropriate data augmentation policy, a model trained on our framework achieves competitive performance on the ASVSpoof 2021…
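The abstract reports results as equal error rates (EERs). As a minimal sketch of what that metric measures, and not the paper's or the challenge's official scoring code, the following computes an EER from per-utterance scores. The function name `equal_error_rate` and the convention that a higher score means "more likely bonafide" are assumptions made for this illustration.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Illustrative EER computation (hypothetical helper, not from the paper).

    scores: per-utterance detector scores, higher = more likely bonafide (assumed).
    labels: 1 for bonafide, 0 for spoof.
    Returns (eer, threshold) at the operating point where the false acceptance
    rate and false rejection rate are closest.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)

    bonafide = scores[labels == 1]
    spoof = scores[labels == 0]

    # Candidate thresholds: every distinct score, in ascending order.
    thresholds = np.unique(scores)

    # FRR: fraction of bonafide utterances rejected (score below threshold).
    frr = np.array([(bonafide < t).mean() for t in thresholds])
    # FAR: fraction of spoof utterances accepted (score at or above threshold).
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # The EER is read off where the two error curves cross.
    idx = int(np.argmin(np.abs(far - frr)))
    eer = (far[idx] + frr[idx]) / 2.0
    return eer, thresholds[idx]

# Toy example: one bonafide score (0.4) falls below one spoof score (0.6).
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
eer, thr = equal_error_rate(scores, labels)
print(f"EER = {eer:.2%} at threshold {thr}")
```

A lower EER means the bonafide and spoof score distributions overlap less, which is the sense in which better class disentanglement is expected to help the downstream classifier.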

Taxonomy

Methods: Contrastive Learning