Benchmarking Azerbaijani Neural Machine Translation
Chih-Chen Chen, William Chen

TL;DR
This paper benchmarks Azerbaijani-English neural machine translation, comparing segmentation techniques and performance across text domains. Unigram segmentation improves results, and models scale better with dataset quality than with dataset quantity, but cross-domain generalization remains difficult.
Contribution
The paper provides the first comprehensive benchmark of Azerbaijani NMT, analyzing segmentation methods and domain generalization and highlighting the key factors that affect translation quality.
Findings
Unigram segmentation improves NMT performance
Model scaling benefits more from data quality than quantity
Cross-domain generalization remains a challenge
Abstract
Little research has been done on Neural Machine Translation (NMT) for Azerbaijani. In this paper, we benchmark the performance of Azerbaijani-English NMT systems across a range of techniques and datasets. We evaluate which segmentation techniques work best on Azerbaijani translation and benchmark the performance of Azerbaijani NMT models across several domains of text. Our results show that while Unigram segmentation improves NMT performance and Azerbaijani translation models scale better with dataset quality than quantity, cross-domain generalization remains a challenge.
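As an illustration only, and not the authors' implementation: Unigram segmentation treats each subword as having its own probability and chooses the split of a word that maximizes the product of those probabilities. The sketch below finds that split with Viterbi decoding. The function name `unigram_segment`, the toy vocabulary `TOY_LOG_PROBS`, its probability values, and the unknown-character penalty are all assumptions made for this example; a real system would learn the vocabulary and probabilities from training data.

```python
import math

# Hypothetical subword log-probabilities (toy values, not from the paper).
TOY_LOG_PROBS = {
    "kitab": math.log(0.05),  # "book"
    "lar": math.log(0.08),    # plural suffix
    "ki": math.log(0.02),
    "tab": math.log(0.01),
}


def unigram_segment(word, log_probs, unk_log_prob=-20.0):
    """Return (pieces, score) for the most probable segmentation of `word`.

    Single characters missing from `log_probs` fall back to `unk_log_prob`
    (an assumed penalty) so that every word can be segmented.
    """
    n = len(word)
    # best_score[i] is the best log-probability of segmenting word[:i].
    best_score = [0.0] + [-math.inf] * n
    # back[i] is the start index of the last piece in that best segmentation.
    back = [0] * (n + 1)

    for end in range(1, n + 1):
        for start in range(end):
            piece = word[start:end]
            if piece in log_probs:
                score = log_probs[piece]
            elif end - start == 1:
                score = unk_log_prob
            else:
                continue  # unknown multi-character piece: not allowed
            candidate = best_score[start] + score
            if candidate > best_score[end]:
                best_score[end] = candidate
                back[end] = start

    # Walk the back-pointers from the end of the word to recover the pieces.
    pieces = []
    end = n
    while end > 0:
        start = back[end]
        pieces.append(word[start:end])
        end = start
    return pieces[::-1], best_score[n]


if __name__ == "__main__":
    print(unigram_segment("kitablar", TOY_LOG_PROBS))
```

With these toy values, "kitablar" ("books") splits into the stem and the plural suffix because that pair has a higher joint probability than any split into shorter pieces. For an agglutinative language such as Azerbaijani, splits that line up with morpheme boundaries are one plausible reason a probabilistic segmenter could help, though the abstract does not state the cause of the improvement.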
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Unigram Segmentation
