TO-Rawnet: Improving RawNet with TCN and Orthogonal Regularization for Fake Audio Detection

Chenglong Wang; Jiangyan Yi; Jianhua Tao; Chuyuan Zhang; Shuai Zhang; Ruibo Fu; Xun Chen

arXiv:2305.13701·cs.SD·May 24, 2023·1 cites

TL;DR

This paper improves RawNet for fake audio detection by adding orthogonal regularization of the Sinc-conv filters and a temporal convolutional network (TCN). On the ASVspoof 2019 logical access scenario, the resulting system reduces EER by 66.09% relative to RawNet.

Contribution

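The paper introduces orthogonal convolution and TCN into RawNet. Orthogonal convolution reduces the correlation between Sinc-conv filters while their parameters are optimized, which improves discriminability, and the TCN captures long-term dependencies in speech signals.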
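To make the idea of decorrelating filters concrete, here is a minimal sketch of a soft orthogonality penalty. It is an assumption for illustration, not the paper's formulation: the listing does not state how the orthogonal convolution is defined, so the code uses the common penalty ||W Wᵀ − I||²_F on a hypothetical filter bank `W` whose rows are flattened filters. The names `gram_matrix` and `orthogonal_penalty` are invented here.

```python
def gram_matrix(filters):
    """Return W W^T for a filter bank given as a list of equal-length rows."""
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in filters]
            for fi in filters]


def orthogonal_penalty(filters):
    """Squared Frobenius norm of (W W^T - I).

    The value is zero when the filters are orthonormal and grows as
    filters become correlated with each other.
    """
    gram = gram_matrix(filters)
    total = 0.0
    for i, row in enumerate(gram):
        for j, value in enumerate(row):
            target = 1.0 if i == j else 0.0
            total += (value - target) ** 2
    return total


# Hypothetical toy filter banks (not taken from the paper).
orthonormal = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
correlated = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]

print(orthogonal_penalty(orthonormal))  # 0.0
print(orthogonal_penalty(correlated))   # 2.0
```

In training, a term like this would typically be added to the classification loss with a small weight, so that gradient updates to the filter parameters are nudged away from redundant filters.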

Findings

01 66.09% relative reduction in EER on the logical access scenario

02 Effective in detecting fake audio attacks

03 Improved discriminability of features
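For readers unfamiliar with the metric, a relative EER reduction is the drop in EER expressed as a percentage of the baseline EER. The sketch below shows the arithmetic only. The input values are hypothetical placeholders, not the paper's EERs, which this listing does not report.

```python
def relative_reduction(baseline_eer, new_eer):
    """Percentage by which new_eer is lower than baseline_eer."""
    if baseline_eer <= 0:
        raise ValueError("baseline_eer must be positive")
    return (baseline_eer - new_eer) / baseline_eer * 100.0


# Hypothetical EER values in percent, chosen only to illustrate the formula.
example = relative_reduction(baseline_eer=10.0, new_eer=5.0)
print(example)  # 50.0
```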

Abstract

Current fake audio detection relies on hand-crafted features, which lose information during extraction. To overcome this, recent studies extract features directly from raw audio signals. For example, RawNet is one of the representative works in end-to-end fake audio detection. However, existing work on RawNet does not optimize the parameters of the Sinc-conv during training, which limits its performance. In this paper, we propose to incorporate orthogonal convolution into RawNet, which reduces the correlation between filters when optimizing the parameters of Sinc-conv, thus improving discriminability. Additionally, we introduce temporal convolutional networks (TCN) to capture long-term dependencies in speech signals. Experiments on ASVspoof 2019 show that our TO-RawNet system achieves a relative EER reduction of 66.09% on the logical access scenario compared with RawNet,…
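The way a TCN reaches long-term context can be illustrated with a dilated causal convolution, the building block such networks are usually made of. This is a generic sketch under stated assumptions, not the paper's architecture: the kernel size, dilation schedule, and the helper names `causal_dilated_conv` and `receptive_field` are all hypothetical, and the listing does not give the TCN configuration used in TO-RawNet.

```python
def causal_dilated_conv(signal, kernel, dilation):
    """1-D causal convolution with dilation.

    output[t] combines signal[t], signal[t - dilation], signal[t - 2*dilation], ...
    Samples before the start of the signal are treated as zero.
    """
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # implicit left zero-padding keeps the layer causal
                acc += weight * signal[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Input samples seen by one output of a stack with one conv per dilation."""
    return 1 + (kernel_size - 1) * sum(dilations)


# Hypothetical toy input and settings (not from the paper).
print(causal_dilated_conv([1, 2, 3, 4], kernel=[1, 1], dilation=2))
# [1.0, 2.0, 4.0, 6.0]
print(receptive_field(kernel_size=2, dilations=[1, 2, 4, 8]))  # 16
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth, which is why a shallow stack can cover long stretches of a speech signal.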

Taxonomy

Topics: Digital Media Forensic Detection · Music and Audio Processing · Speech and Audio Processing