Real-Time Streamable Generative Speech Restoration with Flow Matching

Simon Welker; Bunlong Lay; Maris Hillemann; Tal Peer; Timo Gerkmann

arXiv:2512.19442 · eess.SP · April 22, 2026

TL;DR

This paper introduces Stream.FM, a frame-causal, flow-based generative model for speech restoration that runs in real time on consumer GPUs with a total latency of 48 ms.

Contribution

The paper presents a low-latency, streaming-compatible flow-based model for speech restoration, built on an optimized DNN architecture, a buffered streaming inference scheme, learned few-step numerical solvers, and model weight compression.

Findings

01 Stream.FM achieves 48 ms total latency for real-time speech processing.

02 It outperforms previous diffusion-based models in streaming speech enhancement.

03 High-quality speech restoration is feasible on consumer GPUs with the proposed methods.

Abstract

Diffusion-based generative models have greatly impacted the speech processing field in recent years, exhibiting high speech naturalness and spawning a new research direction. Their application in real-time communication is, however, still lagging behind due to their computation-heavy nature involving multiple calls of large DNNs. Here, we present Stream.FM, a frame-causal flow-based generative model with an algorithmic latency of 32 milliseconds (ms) and a total latency of 48 ms, paving the way for generative speech processing in real-time communication. We propose a buffered streaming inference scheme and an optimized DNN architecture, show how learned few-step numerical solvers can boost output quality at a fixed compute budget, explore model weight compression to find favorable points along a compute/quality tradeoff, and contribute a model variant with 24 ms total latency for…
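To make the "few-step solver at a fixed compute budget" idea concrete, here is a minimal sketch of a generic fixed-step Euler solver for a flow ODE dx/dt = v(x, t), integrated from t = 0 to t = 1. It is not the paper's learned solver or its architecture: the function names `euler_flow_solve` and `toy_velocity`, the closed-form velocity field standing in for a trained DNN, and the choice of four steps are all assumptions made for illustration. The point it shows is that `num_steps` equals the number of network calls, so it directly sets the compute budget per output.

```python
import numpy as np


def euler_flow_solve(velocity, x0, num_steps=4):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with uniform Euler steps.

    Each step calls `velocity` once, so `num_steps` is the number of
    (hypothetical) network evaluations spent per output.
    """
    x = np.asarray(x0, dtype=np.float64).copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity(x, t)
    return x


# Toy stand-in for a trained velocity network (assumption, not the paper's DNN):
# this field transports any starting point to `target` at t = 1.
target = np.array([0.5, -1.0, 2.0])


def toy_velocity(x, t):
    return (target - x) / (1.0 - t)


rng = np.random.default_rng(0)
x0 = rng.standard_normal(3)          # noisy starting point
x1 = euler_flow_solve(toy_velocity, x0, num_steps=4)
print(np.allclose(x1, target))
```

With this particular toy field, the Euler iterates stay on the straight line from `x0` to `target`, so even a handful of steps lands on the target. A trained velocity field is generally not this well behaved, which is why the step count and the choice of solver affect output quality.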
