A Neural Vocoder Based Packet Loss Concealment Algorithm
Yao Zhou, Changchun Bao

TL;DR
This paper presents a neural vocoder-based packet loss concealment method for VoIP that uses Mel-spectrogram prediction and a smoothing process to improve speech naturalness and reduce artifacts.
Contribution
It introduces a portable, receiver-based packet loss concealment algorithm utilizing a flow-based neural vocoder and a novel smoothing post-process.
Findings
Outperforms traditional concealment methods in speech naturalness
Effectively reduces discontinuities and artifacts in reconstructed speech
Demonstrates robustness in various packet loss scenarios
Abstract
Packet loss seriously degrades the quality of service in Voice over IP (VoIP) scenarios. In this paper, we investigate online receiver-based packet loss concealment, which is much more portable and applicable. To preserve speech naturalness, rather than directly processing time-domain waveforms or separately reconstructing amplitudes and phases in the frequency domain, we adopt a flow-based neural vocoder to generate the substitute waveform for a lost packet from a Mel-spectrogram, which is itself predicted from the preceding content by a neural predictor. Furthermore, a waveform similarity-based smoothing post-process is introduced to mitigate speech discontinuities and avoid artifacts. Experimental results show that the proposed method performs well.
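The abstract does not spell out how the smoothing post-process works, so the following is only a minimal sketch of one generic way to join a generated waveform to the received history by waveform similarity: search for the shift in the generated signal that best matches the tail of the history, then crossfade over the overlap. The function names (`similarity`, `best_offset`, `smooth_join`), the parameters `overlap` and `max_shift`, and the linear crossfade are all assumptions made for illustration and are not taken from the paper.

```python
import math


def similarity(a, b):
    """Normalized cross-correlation of two equal-length sample lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


def best_offset(tail, candidate, max_shift):
    """Shift into `candidate` whose leading segment best matches `tail`."""
    n = len(tail)
    scores = [similarity(tail, candidate[s:s + n]) for s in range(max_shift + 1)]
    return max(range(len(scores)), key=scores.__getitem__)


def smooth_join(history, generated, overlap=64, max_shift=80):
    """Align `generated` to the end of `history`, then crossfade.

    Hypothetical helper: `overlap` and `max_shift` are illustrative
    values, not parameters reported by the authors.
    """
    if len(history) < overlap or len(generated) < overlap + max_shift:
        raise ValueError("inputs too short for the requested overlap/shift")
    tail = history[-overlap:]
    shift = best_offset(tail, generated, max_shift)
    aligned = generated[shift:]
    blended = []
    for k in range(overlap):
        w = (k + 1) / (overlap + 1)  # fade-in weight for the generated signal
        blended.append((1.0 - w) * tail[k] + w * aligned[k])
    return history[:-overlap] + blended + aligned[overlap:]


if __name__ == "__main__":
    period = 80
    history = [math.sin(2 * math.pi * t / period) for t in range(400)]
    # Stand-in for vocoder output: same sinusoid, starting 20 samples early.
    generated = [math.sin(2 * math.pi * (t + 316) / period) for t in range(400)]
    print("chosen shift:", best_offset(history[-64:], generated, 80))
    print("joined length:", len(smooth_join(history, generated)))
```

In this sketch the similarity search removes the phase mismatch before the crossfade, so the blend does not cancel or smear the periodic structure at the packet boundary. A real system would operate on decoded audio frames and would likely restrict the search range to plausible pitch lags.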
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Data Compression Techniques
