EmoAra: Emotion-Preserving English Speech Transcription and Cross-Lingual Translation with Arabic Text-to-Speech

Besher Hassan; Ibrahim Alsarraj; Musaab Hasan; Yousef Melhim; Shahem Fadi; Shahem Sultan

arXiv:2602.01170·cs.CL·February 3, 2026

Open Access

TL;DR

EmoAra is an integrated pipeline for English-to-Arabic speech translation that aims to preserve emotional nuance. It chains speech emotion recognition, speech transcription, machine translation, and Arabic text-to-speech.

Contribution

The paper introduces EmoAra, a novel end-to-end system that preserves emotion in cross-lingual speech translation, integrating emotion recognition, transcription, translation, and speech synthesis.

Findings

01 Emotion classification F1-score of 94%

02 Translation BLEU score of 56, BERTScore F1 of 88.7%

03 Average human evaluation score of 81% on banking-domain translations

Abstract

This work presents EmoAra, an end-to-end emotion-preserving pipeline for cross-lingual spoken communication, motivated by banking customer service where emotional context affects service quality. EmoAra integrates Speech Emotion Recognition, Automatic Speech Recognition, Machine Translation, and Text-to-Speech to process English speech and deliver an Arabic spoken output while retaining emotional nuance. The system uses a CNN-based emotion classifier, Whisper for English transcription, a fine-tuned MarianMT model for English-to-Arabic translation, and MMS-TTS-Ara for Arabic speech synthesis. Experiments report an F1-score of 94% for emotion classification, translation performance of BLEU 56 and BERTScore F1 88.7%, and an average human evaluation score of 81% on banking-domain translations. The implementation and resources are available at the accompanying GitHub repository.
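
The four stages named in the abstract can be pictured as a simple chain. The following is a minimal sketch under stated assumptions, not the authors' implementation: `run_pipeline`, `EmoAraResult`, and the four callables are hypothetical names, and each stage is passed in as a plain function so that real models (a CNN emotion classifier, Whisper, a fine-tuned MarianMT model, MMS-TTS-Ara) could be wrapped and plugged in. The abstract does not say how the detected emotion is carried into the Arabic output, so this sketch only returns the label alongside the synthesized audio.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EmoAraResult:
    """Hypothetical container for the outputs of each stage."""
    emotion: str         # label from the speech emotion recognizer
    english_text: str    # ASR transcript of the input speech
    arabic_text: str     # machine-translated transcript
    arabic_audio: bytes  # synthesized Arabic speech


def run_pipeline(
    audio: bytes,
    classify_emotion: Callable[[bytes], str],
    transcribe: Callable[[bytes], str],
    translate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> EmoAraResult:
    """Chain SER -> ASR -> MT -> TTS in the order the abstract lists them."""
    emotion = classify_emotion(audio)      # speech emotion recognition
    english_text = transcribe(audio)       # English transcription
    arabic_text = translate(english_text)  # English-to-Arabic translation
    arabic_audio = synthesize(arabic_text) # Arabic speech synthesis
    return EmoAraResult(emotion, english_text, arabic_text, arabic_audio)


if __name__ == "__main__":
    # Stand-in stubs for illustration only; real models would replace these.
    result = run_pipeline(
        audio=b"\x00\x01",
        classify_emotion=lambda a: "neutral",
        transcribe=lambda a: "I would like to check my balance.",
        translate=lambda text: "<arabic translation placeholder>",
        synthesize=lambda text: text.encode("utf-8"),
    )
    print(result.emotion, "|", result.english_text, "|", result.arabic_text)
```

Passing the stages in as callables keeps the sketch independent of any particular library and makes each component easy to swap or evaluate in isolation.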

Taxonomy

Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Speech Recognition and Synthesis