# Singing to speech conversion with generative flow

**Authors:** Jiawen Huang, Emmanouil Benetos

PMC · DOI: 10.1186/s13636-025-00400-x · EURASIP Journal on Audio, Speech, and Music Processing · 2025-03-10

## TL;DR

This paper introduces a system that converts singing into speech, preserving the phonetic content while reducing singing-specific variation in pitch, rhythm, and timbre.

## Contribution

The paper presents the first deep learning-based system for singing to speech conversion using generative flow.

## Key findings

- In subjective evaluations, the proposed model outperforms signal processing baselines in naturalness.
- It achieves higher phonetic similarity to the original singing than a transcribe-and-synthesize baseline.
- Singing-to-speech conversion shows promise as a data augmentation method for low-resource lyrics transcription.

## Abstract

This paper introduces singing to speech conversion (S2S), a cross-domain voice conversion task, and presents the first deep learning-based S2S system. S2S aims to transform singing into speech while retaining the phonetic information, reducing variations in pitch, rhythm, and timbre. Inspired by the Glow-TTS architecture, the proposed model is built using generative flow, with an adjusted alignment module between the latent features. We adapt the original monotonic alignment search (MAS) to the S2S scenario and utilize a duration predictor to deal with the duration differences between the two modalities. Subjective evaluations show that the proposed model outperforms signal processing baselines in naturalness and outperforms a transcribe-and-synthesize baseline in phonetic similarity to the original singing. We further demonstrate that singing-to-speech could be an effective augmentation method for low-resource lyrics transcription.
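In Glow-TTS, monotonic alignment search (MAS) uses dynamic programming to find the monotonic, non-skipping alignment between two sequences that maximizes total log-likelihood. The per-unit durations read off that alignment serve as training targets for the duration predictor. The sketch below implements the original Glow-TTS recursion `Q[i][j] = max(Q[i-1][j-1], Q[i][j-1]) + L[i][j]` in plain Python. It is not the authors' S2S adaptation, whose changes are not described in this summary. Rows are the units that get expanded (text tokens in Glow-TTS). How the paper assigns singing and speech latents to rows and columns is left open here, and all function names are hypothetical.

```python
from typing import List, Sequence

NEG_INF = float("-inf")


def monotonic_alignment_search(log_lik: Sequence[Sequence[float]]) -> List[int]:
    """Return path[j] = row aligned to column j, for the monotonic,
    non-skipping alignment that maximizes the summed log-likelihood.

    log_lik[i][j] is the (finite) log-likelihood of column j under row i.
    """
    n_rows, n_cols = len(log_lik), len(log_lik[0])
    if n_cols < n_rows:
        raise ValueError("need at least one column per row")

    # q[i][j]: best score of any alignment of columns 0..j ending on row i.
    q = [[NEG_INF] * n_cols for _ in range(n_rows)]
    q[0][0] = log_lik[0][0]
    for j in range(1, n_cols):
        # Row i is reachable at column j only if i <= j.
        for i in range(min(j + 1, n_rows)):
            stay = q[i][j - 1]
            advance = q[i - 1][j - 1] if i > 0 else NEG_INF
            q[i][j] = max(stay, advance) + log_lik[i][j]

    # Backtrack from the last row and column, stepping up a row whenever
    # the diagonal predecessor scored better than staying on the same row.
    path = [0] * n_cols
    i = n_rows - 1
    for j in range(n_cols - 1, -1, -1):
        path[j] = i
        if j > 0 and i > 0 and q[i - 1][j - 1] > q[i][j - 1]:
            i -= 1
    return path


def path_to_durations(path: Sequence[int], n_rows: int) -> List[int]:
    """Count the columns assigned to each row (the duration targets)."""
    durations = [0] * n_rows
    for i in path:
        durations[i] += 1
    return durations


def expand_by_durations(
    rows: Sequence[Sequence[float]], durations: Sequence[int]
) -> List[List[float]]:
    """Repeat each row vector durations[i] times (length regulation)."""
    out: List[List[float]] = []
    for vec, d in zip(rows, durations):
        out.extend(list(vec) for _ in range(d))
    return out
```

For example, with `log_lik = [[0.0, 0.0, -5.0, -5.0], [-5.0, -5.0, 0.0, 0.0]]` the search assigns the first two columns to row 0 and the last two to row 1, giving durations `[2, 2]`. At inference, Glow-TTS expands with durations from the duration predictor rather than from MAS. This summary does not say whether the S2S model follows the same scheme.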

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11893632/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11893632/full.md

## References

49 references — full list in the complete paper: https://tomesphere.com/paper/PMC11893632/full.md

---
Source: https://tomesphere.com/paper/PMC11893632