Detection of AI-Synthesized Speech Using Cepstral & Bispectral Statistics

Arun Kumar Singh (1), Priyanka Singh (2) ((1) Indian Institute of Technology Jammu, (2) Dhirubhai Ambani Institute of Information and Communication Technology)

arXiv:2009.01934 · cs.LG · April 13, 2021

TL;DR

This paper introduces a novel method combining bispectral and cepstral statistics with machine learning to effectively distinguish AI-synthesized speech from human speech, enhancing digital audio forensics.

Contribution

It presents a new approach that integrates bispectral and cepstral analysis with machine learning for improved detection of AI-generated speech.
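To make the "integrate with machine learning" step concrete, here is a minimal Python sketch of feature-level fusion. It assumes each utterance has already been reduced to a bispectral feature vector and a cepstral feature vector, concatenates them, and trains a z-scored nearest-centroid classifier. The function `fuse`, the class `NearestCentroid`, the feature sizes (8 and 4) and the randomly generated toy features are all hypothetical stand-ins; this page does not say which classifier or feature layout the authors use.

```python
import numpy as np

def fuse(bispec_feats, cep_feats):
    """Hypothetical layout: rows = utterances; concatenate feature columns."""
    return np.hstack([bispec_feats, cep_feats])

class NearestCentroid:
    """Stand-in classifier: z-score features, assign to the closest class mean."""

    def fit(self, X, y):
        self.mu = X.mean(axis=0)
        self.sd = X.std(axis=0) + 1e-12  # avoid division by zero
        Z = (X - self.mu) / self.sd
        self.classes = np.unique(y)
        self.centroids = np.stack([Z[y == c].mean(axis=0) for c in self.classes])
        return self

    def predict(self, X):
        Z = (X - self.mu) / self.sd
        # Squared Euclidean distance from every sample to every class centroid.
        d = ((Z[:, None, :] - self.centroids[None, :, :]) ** 2).sum(axis=2)
        return self.classes[np.argmin(d, axis=1)]

# Toy, randomly generated features (not real data): 0 = human, 1 = synthesized.
rng = np.random.default_rng(1)
n = 100
y = np.repeat([0, 1], n)
bis = rng.normal(loc=np.where(y[:, None] == 1, 1.0, 0.0), size=(2 * n, 8))
cep = rng.normal(loc=np.where(y[:, None] == 1, -1.0, 0.0), size=(2 * n, 4))
X = fuse(bis, cep)
clf = NearestCentroid().fit(X, y)
acc = (clf.predict(X) == y).mean()
print(f"training accuracy on toy data: {acc:.2f}")
```

Fusing at the feature level keeps the two analyses independent until the classifier, so either feature extractor can be swapped without touching the other.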

Findings

01

Higher-order (bispectral) statistics show less correlation in human speech than in synthesized speech.

02

Cepstral analysis reveals a durable power component in human speech that is missing from synthesized speech.

03

The combined method improves detection accuracy.
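Cepstral analysis takes the inverse Fourier transform of the log-magnitude spectrum, so a component that repeats at a fixed lag in the waveform shows up as a peak at that lag (its quefrency). The Python sketch below computes a real cepstrum and recovers a known 80-sample delay from a toy noise-plus-echo signal. The helper `real_cepstrum`, the toy signal and the assumed 16 kHz rate are illustrative assumptions; what exactly the authors measure as the "durable power component" is not specified on this page.

```python
import numpy as np

def real_cepstrum(x, eps=1e-12):
    """Real cepstrum: inverse FFT of the log-magnitude spectrum."""
    spectrum = np.fft.fft(np.asarray(x, dtype=float))
    return np.fft.ifft(np.log(np.abs(spectrum) + eps)).real

# Toy signal (hypothetical): white noise plus a circular echo delayed by
# D = 80 samples (5 ms at an assumed 16 kHz rate) with gain a = 0.5.
rng = np.random.default_rng(0)
N, D, a = 4096, 80, 0.5
s = rng.standard_normal(N)
x = s + a * np.roll(s, D)

c = real_cepstrum(x)
lo, hi = 20, 400  # quefrency search range, in samples
peak = lo + int(np.argmax(c[lo:hi]))
print(peak)
```

For this echo the cepstrum has a positive peak of about a/2 at quefrency D, well above the noise floor, so the search recovers the delay.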

Abstract

Digital technology has made previously unimaginable applications possible. Having a handful of tools for easy editing and manipulation is exciting, but it also raises alarming concerns, as such tools can be misused to produce speech clones, duplicates, or even deepfakes. Validating the authenticity of a speech recording is one of the primary problems of digital audio forensics. We propose an approach that distinguishes human speech from AI-synthesized speech by exploiting bispectral and cepstral analysis. Higher-order statistics show less correlation for human speech than for synthesized speech. In addition, cepstral analysis reveals a durable power component in human speech that is missing from synthesized speech. We integrate both analyses and propose a machine learning model to detect AI-synthesized speech.
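The bispectral side can be sketched with the standard segment-averaged bicoherence estimator, summarized by the first four moments of its magnitude and phase. This is a minimal Python sketch under assumed settings (256-sample Hann segments, 50% overlap, NumPy only); `bicoherence`, `moments` and `bispectral_features` are hypothetical names, and the authors' exact estimator and feature set may differ. The toy check uses three tones with coupled phases, for which the bicoherence at the coupled bin pair should be close to 1.

```python
import numpy as np

def bicoherence(x, seg_len=256, hop=128):
    """Segment-averaged (direct FFT) bicoherence estimate of a 1-D signal.

    Assumed settings: Hann window, 50% overlap, full (f1, f2) grid without
    removing symmetric redundancy. Magnitudes lie in [0, 1].
    """
    x = np.asarray(x, dtype=float)
    if len(x) < seg_len:
        raise ValueError("signal shorter than one segment")
    win = np.hanning(seg_len)
    half = seg_len // 2
    f1 = np.arange(half)[:, None]  # column of f1 bin indices
    f2 = np.arange(half)[None, :]  # row of f2 bin indices
    f12 = f1 + f2                  # bin of f1 + f2 (at most seg_len - 2)
    num = np.zeros((half, half), dtype=complex)
    den1 = np.zeros((half, half))
    den2 = np.zeros((half, half))
    for start in range(0, len(x) - seg_len + 1, hop):
        seg = x[start:start + seg_len]
        Y = np.fft.fft((seg - seg.mean()) * win)
        prod = Y[f1] * Y[f2]             # Y(f1) Y(f2)
        num += prod * np.conj(Y[f12])    # accumulate Y(f1) Y(f2) Y*(f1 + f2)
        den1 += np.abs(prod) ** 2
        den2 += np.abs(Y[f12]) ** 2
    return num / np.sqrt(den1 * den2 + 1e-12)

def moments(v):
    """Mean, variance, skewness and kurtosis of the flattened input."""
    v = np.ravel(v)
    mu, sd = v.mean(), v.std()
    z = (v - mu) / (sd + 1e-12)
    return np.array([mu, sd ** 2, np.mean(z ** 3), np.mean(z ** 4)])

def bispectral_features(x):
    """8 features: moments of bicoherence magnitude, then of its phase."""
    b = bicoherence(x)
    return np.concatenate([moments(np.abs(b)), moments(np.angle(b))])

# Toy check (hypothetical signal): tones at bins 20, 30 and 50 whose phases
# satisfy phi3 = phi1 + phi2, i.e. quadratic phase coupling.
n = np.arange(16000)
x = (np.cos(2 * np.pi * 20 * n / 256 + 0.3)
     + np.cos(2 * np.pi * 30 * n / 256 + 1.1)
     + np.cos(2 * np.pi * 50 * n / 256 + 1.4))
b = bicoherence(x)
print(float(np.abs(b[20, 30])))
print(bispectral_features(x).shape)
```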

Taxonomy

Topics: Digital Media Forensic Detection · Speech and Audio Processing