Hidden-Markov-Model Based Speech Enhancement
Daniel Dzibela, Armin Sehr

TL;DR
This paper presents a Hidden Markov Model-based speech enhancement method that synthesizes speech using side information like pitch to improve quality and intelligibility, outperforming text-only synthesis.
Contribution
It introduces a novel approach combining HMM-based synthesis with side information to enhance speech quality and intelligibility in noisy recordings.
Findings
Synthesized speech quality improves when side information is used.
Nearly indistinguishable speech synthesis achieved with pitch data.
Models maintain intelligibility despite robotic sound quality.
Abstract
The goal of this contribution is to use a parametric speech synthesis system to reduce background noise and other interference in recorded speech signals. In a first step, the Hidden Markov Models of the synthesis system are trained. Two suitable training corpora consisting of text and corresponding speech files have been set up and cleared of various faults, including inaudible utterances and incorrect assignments between audio and text data. The two corpora are tested and compared with each other regarding, for example, flaws in the synthesized speech, its naturalness, and its intelligibility. Different voices have thus been synthesized; their quality depends less on the number of training samples used than on the cleanliness and signal-to-noise ratio of those samples. Generalized voice models have been used for synthesis, and the results differ greatly between the two speech corpora. Tests…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research
