Real time spectrogram inversion on mobile phone
Oleg Rybakov, Marco Tagliasacchi, Yunpeng Li, Liyang Jiang, Xia Zhang, Fadi Biadsy

TL;DR
This paper compares two real-time spectrogram inversion methods, streaming Griffin Lim (GL) and streaming MelGAN, in terms of perceptual quality, latency, and resource usage on mobile devices, and shows that a small lookahead improves MelGAN quality.
Contribution
It introduces streaming versions of Griffin Lim and MelGAN for real-time spectrogram inversion on mobile phones, compares their trade-offs, and identifies the conditions under which streaming GL matches MelGAN quality.
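A streaming inverter receives one spectrogram frame at a time and must emit audio before the utterance ends. One common way to do this is an overlap-add buffer, shown below as a minimal NumPy sketch; the class and function names are hypothetical and this is not claimed to be the paper's implementation. Each new time-domain frame (assumed already inverse-transformed and windowed) is added into the buffer, and the first hop of samples, which no later frame can overlap, is released immediately.

```python
import numpy as np


class StreamingOverlapAdd:
    """Hypothetical helper: accumulates windowed frames and releases `hop` final samples per frame."""

    def __init__(self, frame_length: int, hop: int):
        if not 0 < hop <= frame_length:
            raise ValueError("need 0 < hop <= frame_length")
        self.hop = hop
        self.buffer = np.zeros(frame_length)

    def push(self, frame: np.ndarray) -> np.ndarray:
        # Add the new time-domain frame at the current buffer position.
        self.buffer += frame
        # The first `hop` samples are now final: every later frame starts at least one hop to the right.
        ready = self.buffer[: self.hop].copy()
        # Slide the buffer left by one hop and clear the freed tail.
        self.buffer = np.concatenate([self.buffer[self.hop :], np.zeros(self.hop)])
        return ready


def batch_overlap_add(frames: np.ndarray, hop: int) -> np.ndarray:
    """Offline overlap-add, used here only as a reference."""
    n_frames, frame_length = frames.shape
    out = np.zeros((n_frames - 1) * hop + frame_length)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + frame_length] += frame
    return out


rng = np.random.default_rng(0)
frames = rng.standard_normal((10, 32))
ola = StreamingOverlapAdd(frame_length=32, hop=8)
streamed = np.concatenate([ola.push(f) for f in frames])
print(np.allclose(streamed, batch_overlap_add(frames, hop=8)[:80]))  # True
```

The streamed samples are identical to the first `n_frames * hop` samples of the offline result; the remaining tail stays in the buffer until more frames arrive or the stream is flushed.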
Findings
As little as one hop (12.5 ms) of lookahead significantly improves MelGAN perceptual quality over its causal version.
Streaming Griffin Lim is faster and uses less memory than MelGAN.
Streaming Griffin Lim reaches quality comparable to MelGAN on noisy speech when no mel transformation is applied.
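The lookahead finding can be made concrete with a 1D convolution over spectrogram frames whose padding is split between past and future frames. In this sketch (hypothetical helper names, not MelGAN's actual layer code), a kernel of size k with `lookahead` future frames pads k - 1 - lookahead frames on the left and `lookahead` on the right, so output frame t depends on input frames up to t + lookahead. Using the 12.5 ms hop from the abstract, each frame of lookahead adds 12.5 ms of algorithmic delay.

```python
import numpy as np

HOP_MS = 12.5  # hop size quoted in the abstract


def conv1d_with_lookahead(x: np.ndarray, kernel: np.ndarray, lookahead: int) -> np.ndarray:
    """Same-length 1D convolution (cross-correlation, as in neural-network layers) over frames.

    x: (time, channels) spectrogram frames; kernel: (k, channels).
    Output frame t depends on input frames t - (k - 1 - lookahead) .. t + lookahead.
    lookahead=0 gives a fully causal layer.
    """
    k = kernel.shape[0]
    if not 0 <= lookahead <= k - 1:
        raise ValueError("lookahead must be in [0, k - 1]")
    past = k - 1 - lookahead
    padded = np.pad(x, ((past, lookahead), (0, 0)))
    return np.array([np.sum(padded[t : t + k] * kernel) for t in range(x.shape[0])])


def algorithmic_delay_ms(total_lookahead_frames: int, hop_ms: float = HOP_MS) -> float:
    """Delay added by waiting for future frames (ignores compute time)."""
    return total_lookahead_frames * hop_ms


x = np.random.default_rng(1).standard_normal((20, 4))
y = conv1d_with_lookahead(x, np.ones((3, 4)), lookahead=1)
print(algorithmic_delay_ms(1))  # 12.5
```

When such layers are stacked, their lookaheads add up: two layers with one frame of lookahead each make the output wait for two future frames (25 ms at this hop size).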
Abstract
We present two methods of real time magnitude spectrogram inversion: streaming Griffin Lim (GL) and streaming MelGAN. We demonstrate the impact of lookahead on the perceptual quality of MelGAN: as little as one hop size (12.5 ms) of lookahead significantly improves perceptual quality compared with its causal version. We compare streaming GL with streaming MelGAN and show different trade-offs in terms of perceptual quality, on-device latency, algorithmic delay, memory footprint and noise sensitivity. For a fair quality assessment of the GL approach, we use an input log magnitude spectrogram without mel transformation. We evaluate the presented real time spectrogram inversion approaches on clean, noisy and atypical speech. We identify the conditions under which streaming GL has quality comparable to MelGAN: noisy audio and no mel transformation. Streaming GL is 2.4x faster than real time on…
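For reference, the offline Griffin-Lim iteration that streaming GL adapts can be sketched in NumPy. This is a minimal batch version, not the authors' streaming implementation; the periodic Hann window, natural-log magnitude input, FFT size and hop below are assumptions. It alternates between imposing the target magnitude and taking the phase of the re-analysed least-squares inverse STFT, which is why it needs a linear-frequency magnitude rather than a mel spectrogram.

```python
import numpy as np


def hann(n_fft: int) -> np.ndarray:
    # Periodic Hann window (an assumption; the paper's window is not stated here).
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)


def stft(x: np.ndarray, n_fft: int, hop: int) -> np.ndarray:
    """Complex STFT without padding, shape (frames, n_fft // 2 + 1)."""
    w = hann(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * w for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec: np.ndarray, n_fft: int, hop: int) -> np.ndarray:
    """Least-squares inverse STFT: windowed overlap-add divided by the summed squared window."""
    w = hann(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * w
    length = (len(frames) - 1) * hop + n_fft
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + n_fft] += frame
        norm[i * hop : i * hop + n_fft] += w ** 2
    return out / np.maximum(norm, 1e-8)


def griffin_lim(log_mag: np.ndarray, n_fft: int, hop: int, n_iter: int = 50, seed: int = 0) -> np.ndarray:
    """Estimate a waveform whose STFT magnitude approximates exp(log_mag)."""
    mag = np.exp(log_mag)
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))  # random initial phase
    for _ in range(n_iter):
        x = istft(mag * phase, n_fft, hop)
        # Keep the phase of the re-analysed signal; the magnitude is reset to the target.
        phase = np.exp(1j * np.angle(stft(x, n_fft, hop)))
    return istft(mag * phase, n_fft, hop)


def spectral_convergence(x: np.ndarray, log_mag: np.ndarray, n_fft: int, hop: int) -> float:
    """Relative Frobenius error between the target and achieved STFT magnitudes."""
    target = np.exp(log_mag)
    return float(np.linalg.norm(target - np.abs(stft(x, n_fft, hop))) / np.linalg.norm(target))


sr, n_fft, hop = 16000, 256, 64
t = np.arange(4096) / sr
signal = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1250 * t)
log_mag = np.log(np.abs(stft(signal, n_fft, hop)) + 1e-6)
audio = griffin_lim(log_mag, n_fft, hop, n_iter=50)
```

More iterations generally lower the spectral convergence error but cost proportionally more compute, which is the main lever when trading quality against on-device latency.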
Taxonomy
Topics: Speech and Audio Processing · Blind Source Separation Techniques · Advanced Adaptive Filtering Techniques
Methods: 1x1 Convolution · Dilated Convolution · Residual Connection · Grouped Convolution · Window-based Discriminator · Weight Normalization · Convolution · MelGAN Residual Block · GAN Hinge Loss · Tanh Activation
