HCFD: A Benchmark for Audio Deepfake Detection in Healthcare
Mohd Mujtaba Akhtar, Girish, Muskaan Singh

TL;DR
This paper introduces HCFD, a benchmark for detecting audio deepfakes in healthcare. It proposes a new dataset, new models, and a geometry-aware framework that outperform existing methods across clinical conditions.
Contribution
The study presents the first pathology-aware dataset for codec-fake detection, introduces a novel hyperbolic space model, and demonstrates superior performance over existing methods.
Findings
PaSST outperforms existing speech models for HCFD.
PHOENIX-Mamba achieves over 97% accuracy on multiple clinical conditions.
Geometry-aware modeling improves robustness in pathological speech detection.
Abstract
In this study, we present Healthcare Codec-Fake Detection (HCFD), a new task for detecting codec-fakes under pathological speech conditions. We intentionally focus on codec-based synthetic speech in this work, since neural codec decoding forms a core building block in modern speech generation pipelines. First, we release Healthcare CodecFake, the first pathology-aware dataset containing paired real and NAC-synthesized speech across multiple clinical conditions and codec families. Our evaluations show that state-of-the-art codec-fake detectors trained primarily on healthy speech perform poorly on Healthcare CodecFake, highlighting the need for HCFD-specific models. Second, we demonstrate that PaSST outperforms existing speech-based models for HCFD, benefiting from its patch-based spectro-temporal representation. Finally, we propose PHOENIX-Mamba, a geometry-aware framework that models codec-fakes as…
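To make the phrase "patch-based spectro-temporal representation" concrete, the sketch below cuts a spectrogram into small time-frequency patches, the kind of token a patch-based audio transformer consumes. It is an illustrative simplification, not PaSST's actual front end: the function name `patchify`, the non-overlapping layout, and the 16x16 patch size are assumptions made for this example.

```python
import numpy as np

def patchify(spec: np.ndarray, patch_f: int = 16, patch_t: int = 16) -> np.ndarray:
    """Split a (freq, time) spectrogram into flattened, non-overlapping patches.

    Hypothetical helper for illustration only; patch sizes are assumed values.
    Returns an array of shape (num_patches, patch_f * patch_t).
    """
    n_f, n_t = spec.shape
    # Number of whole patches that fit along each axis.
    n_f_p, n_t_p = n_f // patch_f, n_t // patch_t
    # Drop trailing bins/frames that do not fill a whole patch.
    spec = spec[: n_f_p * patch_f, : n_t_p * patch_t]
    # Expose the patch grid: (freq_patch, in_patch_freq, time_patch, in_patch_time).
    grid = spec.reshape(n_f_p, patch_f, n_t_p, patch_t)
    # Bring the two grid axes to the front, then flatten each patch.
    grid = grid.transpose(0, 2, 1, 3)
    return grid.reshape(n_f_p * n_t_p, patch_f * patch_t)

# Toy input: 128 frequency bins by 100 frames, standing in for a mel spectrogram.
spec = np.arange(128 * 100, dtype=np.float32).reshape(128, 100)
patches = patchify(spec)
```

Each row of `patches` covers a local region in both frequency and time, so a model operating on these rows sees joint spectro-temporal structure rather than one frame at a time.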
