VoxAnchor: Grounding Speech Authenticity in Throat Vibration via mmWave Radar

Mingda Han; Huanqi Yang; Chaoqun Li; Wenhao Li; Guoming Zhang; Yanni Yang; Yetong Cao; Weitao Xu; Pengfei Hu

arXiv:2603.27562·cs.HC·March 31, 2026

TL;DR

VoxAnchor leverages millimeter-wave radar to detect speech forgeries by analyzing throat vibrations, providing a physiologically grounded, fine-grained authentication method that outperforms existing techniques.

Contribution

The paper introduces VoxAnchor, a novel system that physically grounds speech authentication in throat vibrations using radar, enabling robust, word-level forgery detection.

Findings

01. Achieves an overall equal error rate (EER) of 0.017 in forgery detection.

02. Detects diverse forgeries, including editing, splicing, replay, and deepfake attacks.

03. Operates with low latency and modest computational cost.
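
For readers unfamiliar with the metric in the first finding, the equal error rate is the operating point at which the false acceptance rate (forgeries accepted as authentic) equals the false rejection rate (authentic speech rejected). The sketch below is a minimal illustration, not the authors' evaluation code. The function name `equal_error_rate`, the score convention (higher means more likely authentic), and the sample scores are all assumptions made for the example.

```python
def equal_error_rate(genuine_scores, forged_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    Assumed convention: a sample is accepted as authentic when its
    score is greater than or equal to the threshold.
    """
    thresholds = sorted(set(genuine_scores) | set(forged_scores))
    best_gap, best_eer = None, None
    for t in thresholds:
        # False acceptance rate: forged samples that pass the threshold.
        far = sum(s >= t for s in forged_scores) / len(forged_scores)
        # False rejection rate: genuine samples that fail the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # Report the midpoint of FAR and FRR where they are closest.
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


# Hypothetical scores, for illustration only.
genuine = [0.9, 0.8, 0.7, 0.6]
forged = [0.1, 0.2, 0.3, 0.65]
print(equal_error_rate(genuine, forged))
```

Under this definition, an EER of 0.017 would mean that at the balanced threshold roughly 1.7% of forgeries are accepted and roughly 1.7% of authentic samples are rejected.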

Abstract

Rapid advances in speech synthesis and audio editing have made realistic forgeries increasingly accessible, yet existing detection methods remain vulnerable to tampering or depend on visual/wearable sensors. In this paper, we present VoxAnchor, a system that physically grounds audio authentication in vocal dynamics by leveraging the inherent coherence between speech acoustics and radar-sensed throat vibrations. VoxAnchor uses contactless millimeter-wave radar to capture fine-grained throat vibrations that are tightly coupled with human speech production, establishing a hard-to-forge anchor rooted in human physiology. The design comprises three main components: (1) a cross-modal framework that uses modality-specific encoders and contrastive learning to detect subtle mismatches at word granularity; (2) a phase-aware pipeline that extracts physically consistent, temporally faithful throat…
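
The abstract's first component, contrastive learning over modality-specific encoders with word-level mismatch detection, can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not state the loss, so an InfoNCE-style objective is swapped in here as a common choice. The names `info_nce` and `flag_mismatched_words`, the temperature, the similarity threshold, and the toy embeddings are all hypothetical. The sketch assumes each word already has one audio embedding and one radar embedding.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(audio_embs, radar_embs, temperature=0.1):
    """Mean audio-to-radar InfoNCE loss over a batch of word embeddings.

    The i-th audio and radar embeddings are treated as a matched pair;
    every other radar embedding in the batch serves as a negative.
    """
    n = len(audio_embs)
    total = 0.0
    for i in range(n):
        logits = [cosine(audio_embs[i], radar_embs[j]) / temperature
                  for j in range(n)]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / n


def flag_mismatched_words(audio_embs, radar_embs, threshold=0.5):
    """Return indices of words whose audio and radar embeddings disagree."""
    return [i for i, (a, r) in enumerate(zip(audio_embs, radar_embs))
            if cosine(a, r) < threshold]


# Toy embeddings (hypothetical): the third word's radar trace disagrees.
audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
radar = [[1.0, 0.1], [0.1, 1.0], [-1.0, 1.0]]
print(flag_mismatched_words(audio, radar))
```

Training with such a loss pulls matched audio and radar embeddings together and pushes mismatched ones apart, so a low similarity at inference time marks a word whose acoustics are not supported by the sensed throat vibration.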
