Revealing the Role of Audio Channels in ASR Performance Degradation

Kuan-Tang Huang; Li-Wei Chen; Hung-Shin Lee; Berlin Chen; Hsin-Min Wang

arXiv:2508.08967 · cs.SD · August 25, 2025

Open Access

TL;DR

This paper investigates how different audio recording channels affect ASR performance and introduces a normalization method that aligns feature representations to improve robustness across channels and languages.

Contribution

The study identifies channel variation as a key factor in ASR degradation and proposes a normalization technique to mitigate this issue, enhancing cross-channel and cross-language robustness.

Findings

01 Normalization improves ASR accuracy on unseen channels

02 Method generalizes well across different languages

03 Significant performance gains over baseline models

Abstract

Pre-trained automatic speech recognition (ASR) models have demonstrated strong performance on a variety of tasks. However, their performance can degrade substantially when the input audio comes from different recording channels. While previous studies have demonstrated this phenomenon, it is often attributed to the mismatch between training and testing corpora. This study argues that variations in speech characteristics caused by different recording channels can fundamentally harm ASR performance. To address this limitation, we propose a normalization technique designed to mitigate the impact of channel variation by aligning internal feature representations in the ASR model with those derived from a clean reference channel. This approach significantly improves ASR performance on previously unseen channels and languages, highlighting its ability to generalize across channel and language…
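
The abstract does not say how the alignment to the clean reference channel is computed. As a rough illustration of the general idea only, the sketch below uses per-dimension mean and variance matching, a standard moment-matching technique that is swapped in here and is not claimed to be the paper's method. The function name `align_to_reference`, the synthetic features, and the gain-and-offset channel model are all assumptions made for this example.

```python
import numpy as np

def align_to_reference(feats, ref_feats, eps=1e-5):
    """Hypothetical helper: shift and scale `feats` (frames x dims) so each
    dimension matches the mean and standard deviation of `ref_feats`.
    This is plain moment matching, not the paper's normalization."""
    mu, sigma = feats.mean(axis=0), feats.std(axis=0)
    ref_mu, ref_sigma = ref_feats.mean(axis=0), ref_feats.std(axis=0)
    # Standardize with the source statistics, then re-color with the
    # reference statistics.
    return (feats - mu) / (sigma + eps) * ref_sigma + ref_mu

rng = np.random.default_rng(0)

# Stand-in for internal features from a clean reference channel (assumption).
clean = rng.normal(0.0, 1.0, size=(200, 8))

# Toy channel effect: a per-dimension gain and offset (assumption; real
# channel distortion is generally more complex than this).
gain = rng.uniform(0.5, 2.0, size=8)
offset = rng.normal(0.0, 1.0, size=8)
shifted = clean * gain + offset

aligned = align_to_reference(shifted, clean)
```

In this toy setting the distortion is affine per dimension, so matching the first two moments is enough to bring the shifted features back to the reference statistics. Real channel effects inside a pre-trained ASR model need not be affine, which is presumably why a dedicated method is proposed.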

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Voice and Speech Disorders