Multi-Channel Replay Speech Detection using Acoustic Maps

Michael Neri; Tuomas Virtanen

arXiv:2602.16399 · eess.AS · May 21, 2026

TL;DR

This paper introduces acoustic maps as a novel spatial feature for replay speech detection, leveraging multi-channel recordings and a lightweight neural network to improve security in voice verification systems.

Contribution

The work proposes a new acoustic map feature derived from beamforming for replay attack detection, demonstrating its effectiveness across various devices and environments.

Findings

01

Achieves competitive performance on the ReMASC dataset with approximately 6k trainable parameters.

02

Acoustic maps provide a compact, interpretable feature space.

03

Acoustic maps remain effective across different devices and acoustic environments.

Abstract

Replay attacks remain a critical vulnerability for automatic speaker verification systems, particularly in real-time voice assistant applications. In this work, we propose acoustic maps as a novel spatial feature representation for replay speech detection from multi-channel recordings. Derived from classical beamforming over discrete azimuth and elevation grids, acoustic maps encode directional energy distributions that reflect physical differences between human speech radiation and loudspeaker-based replay. A lightweight convolutional neural network is designed to operate on this representation, achieving competitive performance on the ReMASC dataset with approximately 6k trainable parameters. Experimental results show that acoustic maps provide a compact and physically interpretable feature space for replay attack detection across different devices and acoustic environments.
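
To make the idea of an acoustic map concrete, the following is a minimal sketch of a frequency-domain delay-and-sum beamformer scanned over a discrete azimuth and elevation grid. It is an illustration under stated assumptions, not the authors' implementation: the far-field plane-wave model, the speed of sound, the grid resolution, and the function names `steering_directions`, `acoustic_map`, and `simulate_plane_wave` are all hypothetical choices made here. The abstract does not specify which classical beamformer, grid, or normalization the paper uses.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s


def steering_directions(az_deg, el_deg):
    """Unit vectors pointing toward each (elevation, azimuth) grid cell, shape (E, A, 3)."""
    az = np.deg2rad(az_deg)[None, :]
    el = np.deg2rad(el_deg)[:, None]
    return np.stack(
        [np.cos(el) * np.cos(az),
         np.cos(el) * np.sin(az),
         np.sin(el) * np.ones_like(az)],
        axis=-1,
    )


def acoustic_map(x, mic_pos, fs, az_deg, el_deg):
    """Delay-and-sum energy map (hypothetical helper).

    x: (M, T) multi-channel signal, mic_pos: (M, 3) positions in metres.
    Returns an (E, A) array of beamformer output energy per direction.
    """
    n_samples = x.shape[1]
    X = np.fft.rfft(x, axis=1)                      # (M, F)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)  # (F,)
    u = steering_directions(az_deg, el_deg)         # (E, A, 3)
    # Far-field arrival delay at each microphone relative to the array origin.
    tau = -(u @ mic_pos.T) / C                      # (E, A, M)
    # Phase terms that undo those delays for every candidate direction.
    phase = np.exp(2j * np.pi * freqs * tau[..., None])  # (E, A, M, F)
    Y = (phase * X[None, None]).mean(axis=2)        # (E, A, F)
    return (np.abs(Y) ** 2).sum(axis=-1)            # (E, A)


def simulate_plane_wave(s, mic_pos, fs, az_deg, el_deg):
    """Toy far-field source for testing (assumes an even-length signal s)."""
    S = np.fft.rfft(s)
    S[-1] = 0.0  # drop the Nyquist bin so fractional delays stay real-valued
    freqs = np.fft.rfftfreq(len(s), d=1.0 / fs)
    u = steering_directions(np.array([az_deg]), np.array([el_deg]))[0, 0]
    tau = -(mic_pos @ u) / C                        # (M,)
    X = S[None, :] * np.exp(-2j * np.pi * freqs[None, :] * tau[:, None])
    return np.fft.irfft(X, n=len(s), axis=1)
```

A possible usage, again with assumed values: build a four-microphone circular array of 5 cm radius, call `simulate_plane_wave` with a noise burst arriving from a chosen azimuth, and pass the result to `acoustic_map` with grids such as `np.arange(0, 360, 10)` and `np.arange(0, 90, 15)`. The resulting 2-D energy map is the kind of compact, image-like input a small convolutional network could consume. Note that a planar array cannot distinguish positive from negative elevation, which is why the example grid covers only the upper hemisphere.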

Taxonomy

Topics: Speech Recognition and Synthesis · Adversarial Robustness in Machine Learning · Speech and Audio Processing