Enhancing Language Learning through Technology: Introducing a New English-Azerbaijani (Arabic Script) Parallel Corpus
Jalil Nourmohammadi Khiarak, Ammar Ahmadi, Taher Akbari Saeed, Meysam Asgari-Chenaghlu, Toğrul Atabay, Mohammad Reza Baghban Karimi, Ismail Ceferli, Farzad Hasanvand, Seyed Mahboub Mousavi, Morteza Noshad

TL;DR
This paper presents a new large-scale English-Azerbaijani (Arabic Script) parallel corpus to improve machine translation and language learning for under-resourced Turkic languages, demonstrating its effectiveness in training neural MT systems.
Contribution
Introduces the first comprehensive English-Azerbaijani (Arabic Script) parallel corpus, facilitating advancements in neural machine translation and language education technology for low-resource languages.
Findings
Corpus effectively trains deep learning MT systems
Enhances NLP applications for under-resourced languages
Supports inclusive bilingual education and multilingual communication
Abstract
This paper introduces a pioneering English-Azerbaijani (Arabic Script) parallel corpus, designed to bridge the technological gap in language learning and machine translation (MT) for under-resourced languages. Consisting of 548,000 parallel sentences and approximately 9 million words per language, this dataset is derived from diverse sources such as news articles and holy texts, aiming to enhance natural language processing (NLP) applications and language education technology. This corpus marks a significant step forward in the realm of linguistic resources, particularly for Turkic languages, which have lagged in the neural machine translation (NMT) revolution. By presenting the first comprehensive case study for the English-Azerbaijani (Arabic Script) language pair, this work underscores the transformative potential of NMT in low-resource contexts. The development and utilization of…
Taxonomy
Topics: Natural Language Processing Techniques · Linguistics and Cultural Studies · Lexicography and Language Studies
