Crosstalk Suppression in a Multi-Channel, Multi-Speaker System Using Acoustic Vector Sensors

Grzegorz Szwoch

PMC · DOI: 10.3390/s25216731 · November 3, 2025

Open Access

TL;DR

This paper introduces a method to improve speech recognition in multi-speaker environments by suppressing crosstalk using acoustic vector sensors.

Contribution

A novel crosstalk suppression algorithm using acoustic vector sensors and source separation for multi-speaker speech recognition.

Findings

1. The algorithm achieved an SI-SDR improvement of 7.54 dB with source separation.

2. Without source separation, the SI-SDR improvement was 19.53 dB.
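
For readers unfamiliar with the metric, the sketch below shows how a scale-invariant signal-to-distortion ratio (SI-SDR) and its improvement are commonly computed. It uses the standard textbook definition, not code from the paper; the function names `si_sdr` and `si_sdr_improvement` are illustrative assumptions, and the paper's exact evaluation setup is not given in this summary.

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB (standard definition, assumed here)."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Remove DC offsets so the projection is not biased by a constant term.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the scaled target.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    # Everything not explained by the scaled reference counts as distortion.
    distortion = estimate - target
    ratio = (np.dot(target, target) + eps) / (np.dot(distortion, distortion) + eps)
    return 10.0 * np.log10(ratio)

def si_sdr_improvement(processed, unprocessed, reference):
    """SI-SDR gain of the processed signal over the unprocessed one, in dB."""
    return si_sdr(processed, reference) - si_sdr(unprocessed, reference)
```

A positive improvement means the processed channel is closer to the clean reference than the unprocessed channel was.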

Abstract

Automatic speech recognition in a scenario with multiple speakers in a reverberant space, such as a small courtroom, often requires multiple sensors. This leads to a problem of crosstalk that must be removed before the speech-to-text transcription is performed. This paper presents an algorithm intended for application in multi-speaker scenarios requiring speech-to-text transcription, such as court sessions or conferences. The proposed method uses Acoustic Vector Sensors to acquire audio streams. Speaker detection is performed using statistical analysis of the direction of arrival. This information is then used to perform source separation. Next, speakers’ activity in each channel is analyzed, and signal fragments containing direct speech and crosstalk are identified. Crosstalk is then suppressed using a dynamic gain processor, and the resulting audio streams may be passed to a speech…
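
The final step described above, suppressing crosstalk with a dynamic gain processor, can be illustrated with a minimal sketch. It assumes a per-frame crosstalk decision is already available from the earlier analysis stages. The function name `suppress_crosstalk`, the attenuation depth, and the one-pole smoothing constant are hypothetical choices for illustration, not the author's implementation.

```python
import numpy as np

def suppress_crosstalk(signal, crosstalk_mask, frame_len,
                       atten_db=-30.0, smooth=0.99):
    """Attenuate frames flagged as crosstalk using a smoothed gain curve.

    signal         : 1-D array of samples from one channel
    crosstalk_mask : one boolean per frame (True = crosstalk), assumed given
    frame_len      : number of samples per frame
    atten_db       : gain applied to crosstalk frames (assumed value)
    smooth         : one-pole smoothing coefficient (assumed value)
    """
    signal = np.asarray(signal, dtype=float)
    mask = np.asarray(crosstalk_mask, dtype=bool)
    floor = 10.0 ** (atten_db / 20.0)

    # Target gain per sample: unity for direct speech, floor for crosstalk.
    target = np.repeat(np.where(mask, floor, 1.0), frame_len)[: len(signal)]
    if len(target) < len(signal):
        # Samples beyond the mask are passed through unchanged.
        target = np.concatenate([target, np.ones(len(signal) - len(target))])

    # Smooth the gain to avoid audible clicks at frame boundaries.
    gain = np.empty_like(target)
    g = 1.0
    for n, tgt in enumerate(target):
        g = smooth * g + (1.0 - smooth) * tgt
        gain[n] = g
    return signal * gain
```

Smoothing the gain trades reaction speed for fewer switching artifacts; a real system would likely tune separate attack and release times.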

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing