ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs
Ahmed Heakl, Youssef Zaghloul, Mennatullah Ali, Rania Hossam, Walid Gomaa

TL;DR
This paper develops and evaluates machine translation and speech recognition systems for code-switched Egyptian Arabic-English, leveraging large language models and achieving significant improvements over existing methods.
Contribution
It introduces novel methodologies for translating and recognizing code-switched Egyptian Arabic-English using LLMs and Whisper, addressing resource limitations and dialect-specific challenges.
Findings
56% improvement in English translation accuracy
9.3% improvement in Arabic translation accuracy
Effective handling of code-switching in speech recognition
Abstract
Motivated by the widespread increase in the phenomenon of code-switching between Egyptian Arabic and English in recent times, this paper explores the intricacies of machine translation (MT) and automatic speech recognition (ASR) systems, focusing on translating code-switched Egyptian Arabic-English to either English or Egyptian Arabic. Our goal is to present the methodologies employed in developing these systems, utilizing large language models such as LLaMA and Gemma. In the field of ASR, we explore the utilization of the Whisper model for code-switched Egyptian Arabic recognition, detailing our experimental procedures including data preprocessing and training techniques. Through the implementation of a consecutive speech-to-text translation system that integrates ASR with MT, we aim to overcome challenges posed by limited resources and the unique characteristics of the Egyptian Arabic…
Code & Models
- 🤗 ahmedheakl/arazn-gemma1.1-2B-arabic (model) · ♡ 1
- 🤗 ahmedheakl/arazn-gemma1.1-2B-eng-extra (model)
- 🤗 ahmedheakl/arazn-gemma1.1-7B-eng (model)
- 🤗 ahmedheakl/arazn-gemma1.1-7B-eng-1 (model)
- 🤗 ahmedheakl/arazn-whisper-small (model) · 15 dl · ♡ 2
- 🤗 ahmedheakl/arazn-llama3-english (model) · ♡ 2
- 🤗 ahmedheakl/arazn-llama3-arabic (model) · ♡ 1
- 🤗 ahmedheakl/arazn-gemma1.1-7B-arabic (model) · ♡ 1
- 🤗 ahmedheakl/arazn-gemma1.1-7B-eng-extra (model)
- 🤗 ahmedheakl/arazn-whisper-medium (model) · 38 dl
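The abstract describes a consecutive speech-to-text translation system that chains ASR into MT. A minimal sketch of that cascade, using two of the checkpoints listed above, might look like the following. The function names, the prompt template, and the choice of the `text-generation` pipeline are illustrative assumptions rather than the authors' code. It is also an assumption that the listed checkpoints load directly through `transformers.pipeline`; some may be adapters that need extra loading steps.

```python
from typing import Callable, Dict, Tuple


def cascade_translate(
    audio_path: str,
    transcribe: Callable[[str], str],
    translate: Callable[[str], str],
) -> Dict[str, str]:
    """Consecutive pipeline: run ASR first, then feed the transcript to MT."""
    transcript = transcribe(audio_path).strip()
    # Skip the MT stage when ASR produced nothing, so the LLM is not
    # prompted with an empty input.
    translation = translate(transcript).strip() if transcript else ""
    return {"transcript": transcript, "translation": translation}


# Hypothetical prompt; the paper's actual prompt format is not given here.
ASSUMED_PROMPT = (
    "Translate the following code-switched Egyptian Arabic-English "
    "text to English.\nText: {text}\nTranslation:"
)


def build_hf_stages(
    asr_id: str = "ahmedheakl/arazn-whisper-small",
    mt_id: str = "ahmedheakl/arazn-llama3-english",
    prompt: str = ASSUMED_PROMPT,
) -> Tuple[Callable[[str], str], Callable[[str], str]]:
    """Build the two stages from Hugging Face checkpoints (assumed loadable as-is)."""
    # Imported lazily so the cascade logic can be exercised without
    # transformers installed or any model downloaded.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=asr_id)
    generator = pipeline("text-generation", model=mt_id)

    def transcribe(path: str) -> str:
        return asr(path)["text"]

    def translate(text: str) -> str:
        outputs = generator(
            prompt.format(text=text),
            max_new_tokens=128,
            return_full_text=False,  # return only the generated continuation
        )
        return outputs[0]["generated_text"]

    return transcribe, translate
```

With the models available locally, usage would be `transcribe, translate = build_hf_stages()` followed by `cascade_translate("clip.wav", transcribe, translate)`, where `clip.wav` is a placeholder file name. Keeping the two stages as plain callables makes it straightforward to swap the small Whisper checkpoint for the medium one, or the LLaMA translator for a Gemma one, without touching the cascade itself.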
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Lexicography and Language Studies
Methods: LLaMA
