Prosody in Cascade and Direct Speech-to-Text Translation: a case study on Korean Wh-Phrases

Giulio Zhou; Tsz Kin Lam; Alexandra Birch; Barry Haddow

arXiv:2402.00632 · cs.CL · February 2, 2024 · 1 citation

Open Access

TL;DR

This study shows that direct speech-to-text translation (S2TT) systems can use prosodic cues to disambiguate Korean wh-phrases in Korean-English translation, where they outperform cascade models.

Contribution

To the authors' knowledge, the paper provides the first quantitative evidence that direct S2TT models can leverage prosody effectively, with the direct system showing clear improvements over cascade systems on ambiguous cases.

Findings

01. 12.9% improvement in overall accuracy on ambiguous cases

02. Up to a 15.6% increase in F1 score for one of the major intent categories

03. First quantitative evidence of prosody use in direct S2TT models

Abstract

Speech-to-Text Translation (S2TT) has typically been addressed with cascade systems, where speech recognition systems generate a transcription that is subsequently passed to a translation model. While there has been a growing interest in developing direct speech translation systems to avoid propagating errors and losing non-verbal content, prior work in direct S2TT has struggled to conclusively establish the advantages of integrating the acoustic signal directly into the translation process. This work proposes using contrastive evaluation to quantitatively measure the ability of direct S2TT systems to disambiguate utterances where prosody plays a crucial role. Specifically, we evaluated Korean-English translation systems on a test set containing wh-phrases, for which prosodic features are necessary to produce translations with the correct intent, whether it's a statement, a yes/no…
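The contrastive evaluation mentioned in the abstract can be illustrated with a minimal sketch. It assumes a generic formulation: the system scores a correct translation and one or more contrastive translations with a different intent, and an utterance counts as correct only if the correct translation scores highest. The function names, utterance IDs, sentences, and scores below are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# One example: (utterance id, correct translation, contrastive translations).
Example = Tuple[str, str, Sequence[str]]


def contrastive_accuracy(
    examples: Sequence[Example],
    score: Callable[[str, str], float],
) -> float:
    """Fraction of examples where the correct translation outscores
    every contrastive translation for the same utterance."""
    if not examples:
        raise ValueError("examples must not be empty")
    hits = 0
    for utterance, correct, contrastive in examples:
        correct_score = score(utterance, correct)
        if all(correct_score > score(utterance, alt) for alt in contrastive):
            hits += 1
    return hits / len(examples)


# Hypothetical log-probabilities standing in for a real S2TT model.
# Both utterances share the same words and differ only in intent.
toy_scores: Dict[Tuple[str, str], float] = {
    ("utt1", "What did you eat?"): -1.2,
    ("utt1", "Did you eat something?"): -2.5,
    ("utt2", "Did you eat something?"): -3.0,
    ("utt2", "What did you eat?"): -1.9,
}


def toy_score(utterance: str, translation: str) -> float:
    """Look up the made-up score for an (utterance, translation) pair."""
    return toy_scores[(utterance, translation)]


examples: List[Example] = [
    ("utt1", "What did you eat?", ["Did you eat something?"]),
    ("utt2", "Did you eat something?", ["What did you eat?"]),
]

accuracy = contrastive_accuracy(examples, toy_score)
print(accuracy)  # 0.5
```

In this toy data the scorer prefers the wh-question reading for both utterances, so it gets only the first one right. A system that ignores prosody would behave this way on utterances with identical wording.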

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Sparse Evolutionary Training