Usefulness of Emotional Prosody in Neural Machine Translation
Charles Brazier, Jean-Luc Rouas

TL;DR
This paper explores improving neural machine translation (NMT) by incorporating emotional cues automatically recognized from speech, and shows that emotional information, especially arousal, improves translation quality.
Contribution
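It introduces a novel two-stage method: a speech emotion recognition (SER) model first predicts dimensional emotion values from the audio, and these predictions are then added as source tokens to the NMT input text to improve translation accuracy.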
Findings
Emotion integration improves translation quality
Arousal has the most significant impact
Method outperforms baseline NMT models
Abstract
Neural Machine Translation (NMT) is the task of translating text from one language to another using a trained neural network. Several existing works aim to incorporate external information into NMT models to improve or control predicted translations (e.g. sentiment, politeness, gender). In this work, we propose to improve translation quality by adding another external source of information: the automatically recognized emotion in the voice. This work is motivated by the assumption that each emotion is associated with a specific lexicon that can overlap between emotions. Our proposed method follows a two-stage procedure. First, we select a state-of-the-art Speech Emotion Recognition (SER) model to predict dimensional emotion values from all input audio in the dataset. Then, we use these predicted emotions as source tokens added at the beginning of input texts to train…
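The second stage, prepending predicted emotions to the source text as tokens, can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the SER model outputs arousal, valence and dominance scores normalized to [0, 1], and that each score is quantized into five bins and rendered as a tag such as `<arousal_4>`. The names `Emotion`, `quantize`, `emotion_tokens` and `tag_source`, the bin count, and the tag format are illustrative choices, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Emotion:
    """Hypothetical SER output: dimensional scores assumed to lie in [0, 1]."""
    arousal: float
    valence: float
    dominance: float


def quantize(value: float, n_bins: int = 5) -> int:
    """Map a score in [0, 1] to a bin index in 0..n_bins-1 (out-of-range values are clipped)."""
    clipped = min(max(value, 0.0), 1.0)
    # A score of exactly 1.0 would map to n_bins, so cap it at the last bin.
    return min(int(clipped * n_bins), n_bins - 1)


def emotion_tokens(e: Emotion, n_bins: int = 5) -> list[str]:
    """Render each emotion dimension as a discrete tag (hypothetical format)."""
    return [
        f"<arousal_{quantize(e.arousal, n_bins)}>",
        f"<valence_{quantize(e.valence, n_bins)}>",
        f"<dominance_{quantize(e.dominance, n_bins)}>",
    ]


def tag_source(text: str, e: Emotion, n_bins: int = 5) -> str:
    """Prepend the emotion tags to the source sentence before it is fed to the NMT model."""
    return " ".join(emotion_tokens(e, n_bins) + [text])


if __name__ == "__main__":
    print(tag_source("I can't believe it!", Emotion(arousal=0.9, valence=0.25, dominance=0.65)))
    # prints "<arousal_4> <valence_1> <dominance_3> I can't believe it!"
```

Quantizing the continuous scores keeps the added vocabulary small. In a real NMT pipeline such tags would likely also need to be registered as special tokens, so that subword segmentation does not split them.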
Taxonomy
Topics: Natural Language Processing Techniques
