Detection of AI-Synthesized Speech Using Cepstral & Bispectral Statistics
Arun Kumar Singh (1), Priyanka Singh (2) ((1) Indian Institute of Technology Jammu, (2) Dhirubhai Ambani Institute of Information and Communication Technology)

TL;DR
This paper introduces a novel method combining bispectral and cepstral statistics with machine learning to effectively distinguish AI-synthesized speech from human speech, enhancing digital audio forensics.
Contribution
It presents a new approach that integrates bispectral and cepstral analysis with machine learning for improved detection of AI-generated speech.
Findings
Higher-order statistics are less correlated in human speech than in synthesized speech.
Cepstral analysis reveals a durable power component in human speech that is missing from synthesized speech.
The combined method improves detection accuracy.
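To make the higher-order-statistics finding more concrete, here is a minimal Python sketch of a frame-averaged bispectrum and bicoherence estimate. It is an illustration under stated assumptions, not the authors' implementation: the function names (`bicoherence`, `summary_features`), the frame length, the hop size, the Hann window, and the choice of mean and variance as summary statistics are all hypothetical. The demo signal is a synthetic set of phase-coupled tones, not speech.

```python
import numpy as np

def bicoherence(signal, frame_len=128, hop=64):
    """Frame-averaged bispectrum and normalized bicoherence (illustrative sketch).

    frame_len and hop are assumed values, not taken from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    half = frame_len // 2
    # Index grids for the frequency pair (f1, f2) and the sum frequency f1 + f2.
    f1, f2 = np.meshgrid(np.arange(half), np.arange(half), indexing="ij")
    fsum = f1 + f2  # at most frame_len - 2, so always a valid FFT bin
    num = np.zeros((half, half), dtype=complex)
    den1 = np.zeros((half, half))
    den2 = np.zeros((half, half))
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        frame = (frame - frame.mean()) * window
        X = np.fft.fft(frame)
        prod = X[f1] * X[f2]
        num += prod * np.conj(X[fsum])   # triple product X(f1) X(f2) X*(f1+f2)
        den1 += np.abs(prod) ** 2
        den2 += np.abs(X[fsum]) ** 2
    bispec = num / n_frames
    # By the Cauchy-Schwarz inequality this ratio lies in [0, 1].
    bic = np.abs(num) / np.sqrt(den1 * den2 + 1e-12)
    return bispec, bic

def summary_features(bic, bispec):
    """Hypothetical feature vector: mean and variance of magnitude and phase."""
    mag = bic.ravel()
    phase = np.angle(bispec).ravel()
    return np.array([mag.mean(), mag.var(), phase.mean(), phase.var()])

# Synthetic demo: three tones whose phases are quadratically coupled, plus noise.
rng = np.random.default_rng(0)
n = np.arange(4096)
demo = (np.cos(2 * np.pi * 10 * n / 128)
        + np.cos(2 * np.pi * 17 * n / 128 + 0.7)
        + np.cos(2 * np.pi * 27 * n / 128 + 0.7)
        + 0.1 * rng.standard_normal(n.size))
demo_bispec, demo_bic = bicoherence(demo)
demo_features = summary_features(demo_bic, demo_bispec)
```

A bicoherence value near 1 at a bin pair indicates that the phases at the two frequencies and at their sum are consistently related across frames. Statistics of this kind could serve as inputs to a classifier, though the exact features used in the paper are not stated on this page.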
Abstract
Digital technology has made previously unimaginable applications possible. Having a handful of tools for easy editing and manipulation is exciting, but it raises alarming concerns, since their output can propagate as speech clones, duplicates, or deepfakes. Validating the authenticity of speech is one of the primary problems of digital audio forensics. We propose an approach to distinguish human speech from AI-synthesized speech by exploiting bispectral and cepstral analysis. Higher-order statistics are less correlated for human speech than for synthesized speech. In addition, cepstral analysis reveals a durable power component in human speech that is missing from synthesized speech. We integrate both analyses and propose a machine learning model to detect AI-synthesized speech.
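The cepstral analysis mentioned in the abstract can be illustrated with a short Python sketch of the real cepstrum, defined as the inverse Fourier transform of the log-magnitude spectrum. This is a generic textbook computation, not the paper's pipeline: the names `real_cepstrum` and `band_energy`, the Hann window, and the quefrency ranges are assumptions made for illustration, and the demo input is synthetic noise with an echo, not speech.

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum: inverse FFT of the log-magnitude spectrum of a windowed frame."""
    frame = np.asarray(frame, dtype=float)
    spectrum = np.fft.fft(frame * np.hanning(len(frame)))
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # small offset avoids log(0)
    return np.fft.ifft(log_mag).real

def band_energy(cepstrum, lo, hi):
    """Hypothetical helper: energy of the cepstrum over quefrency bins [lo, hi)."""
    return float(np.sum(cepstrum[lo:hi] ** 2))

# Synthetic demo: white noise plus a copy of itself delayed by 50 samples.
# The delay shows up as a cepstral peak near quefrency 50.
rng = np.random.default_rng(0)
delay = 50
source = rng.standard_normal(2048 + delay)
echoed = source[delay:] + 0.5 * source[:-delay]
demo_cepstrum = real_cepstrum(echoed)
demo_peak = 20 + int(np.argmax(demo_cepstrum[20:200]))
```

For speech, energy in particular quefrency bands of the cepstrum could be compared between human and synthesized recordings. Which bands carry the component described in the abstract is not specified on this page, so the ranges above are placeholders.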
Taxonomy
Topics: Digital Media Forensic Detection · Speech and Audio Processing
