Analysis of Domain Shift across ASR Architectures via TTS-Enabled Separation of Target Domain and Acoustic Conditions

Tina Raissi; Nick Rossenbach; Ralf Schlüter

arXiv:2508.09868 · cs.SD · August 14, 2025

TL;DR

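This paper investigates how ASR architectures and modeling choices (label units, context length, and topology) perform under domain mismatch. Target domain audio is synthesized with a TTS system trained on LibriSpeech, which isolates language domain effects from acoustic variation when assessing generalization.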

Contribution

To the authors' knowledge, it provides the first controlled comparison of optimized ASR systems across state-of-the-art architectures under domain shift, highlighting the impact of specific modeling choices on performance.

Findings

01

Modeling choices significantly influence ASR performance under domain shift.

02

Seq2seq and modular architectures show similar robustness when optimized.

03

Adapting with target domain language models (n-gram and neural) improves recognition without retraining the acoustic model.
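
Finding 03 can be illustrated with a minimal sketch of n-best rescoring, where a fixed acoustic score is combined log-linearly with a target domain language model score. This is a common approach and an assumption here; the source does not state how the paper combines its models. The function names, the toy add-one-smoothed unigram LM, the `lm_scale` parameter, and the example scores are all hypothetical.

```python
import math
from collections import Counter


def make_unigram_lm(corpus_words):
    """Build a toy add-one-smoothed unigram LM from target domain text.

    Hypothetical stand-in for the n-gram or neural LMs named in the paper.
    """
    counts = Counter(corpus_words)
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen words

    def logprob(words):
        return sum(math.log((counts[w] + 1) / (total + vocab)) for w in words)

    return logprob


def rescore_nbest(hypotheses, lm_logprob, lm_scale=0.5):
    """Return the word sequence maximizing am_score + lm_scale * lm_score.

    `hypotheses` is a list of (words, acoustic_log_score) pairs. The acoustic
    scores are taken as given, so the acoustic model is never retrained.
    """
    best = max(hypotheses, key=lambda h: h[1] + lm_scale * lm_logprob(h[0]))
    return best[0]


# Toy target domain text and two competing hypotheses (made-up scores).
target_text = "the patient received a dose of aspirin".split()
lm = make_unigram_lm(target_text)
nbest = [
    (["the", "patient", "received", "aspirin"], -10.0),
    (["the", "patient", "received", "as", "per", "in"], -9.5),
]

print(rescore_nbest(nbest, lm, lm_scale=0.0))  # acoustic score alone
print(rescore_nbest(nbest, lm, lm_scale=0.5))  # with target domain LM
```

With `lm_scale=0.0` the acoustically preferred hypothesis wins; with the target domain LM weighted in, the in-domain word sequence is selected instead, and only the language model has changed.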

Abstract

We analyze automatic speech recognition (ASR) modeling choices under domain mismatch, comparing classic modular and novel sequence-to-sequence (seq2seq) architectures. Across the different ASR architectures, we examine a spectrum of modeling choices, including label units, context length, and topology. To isolate language domain effects from acoustic variation, we synthesize target domain audio using a text-to-speech system trained on LibriSpeech. We incorporate target domain n-gram and neural language models for domain adaptation without retraining the acoustic model. To our knowledge, this is the first controlled comparison of optimized ASR systems across state-of-the-art architectures under domain shift, offering insights into their generalization. The results show that, under domain shift, rather than the decoder architecture choice or the distinction between classic modular and…
