Sign-to-Speech Prosody Transfer via Sign Reconstruction-based GAN
Toranosuke Manabe, Yuto Shibata, Shinnosuke Takamichi, and Yoshimitsu Aoki

TL;DR
This paper introduces a novel approach for directly transferring prosody from sign language to speech synthesis, utilizing a GAN-based framework and a new model architecture to improve naturalness.
Contribution
It proposes SignRecGAN and S2PFormer, enabling sign-to-speech prosody transfer without requiring costly cross-modal annotations.
Findings
Synthesized speech reflects sign language emotional nuances.
The method outperforms baseline models in prosody transfer fidelity.
Code will be publicly available upon acceptance.
Abstract
Deep learning models have improved sign language-to-text translation and made it easier for non-signers to understand signed messages. When the goal is spoken communication, a naive approach is to convert signed messages into text and then synthesize speech via Text-to-Speech (TTS). However, this two-stage pipeline inevitably treats text as a bottleneck representation, causing the loss of rich non-verbal information originally conveyed in the signing. To address this limitation, we propose a novel task, Sign-to-Speech Prosody Transfer, which aims to capture the global prosodic nuances expressed in sign language and directly integrate them into synthesized speech. A major challenge is that aligning sign and speech requires expert knowledge, making annotation extremely costly and preventing the construction of large parallel corpora. To overcome this, we introduce SignRecGAN,…
