Efficacy of ByT5 in Multilingual Translation of Biblical Texts for Underrepresented Languages
Corinne Aars, Lauren Adams, Xiaokan Tian, Zhaoyu Wang, Colton Wismer, Jason Wu, Pablo Rivas, Korn Sooksatra, Matthew Fendt

TL;DR
This paper develops and evaluates a ByT5-based multilingual translation model specifically designed for translating the Bible into underrepresented languages, demonstrating its potential to improve access to sacred texts.
Contribution
It introduces a novel ByT5-based model tailored for biblical translation into low-resource languages, leveraging character-based encoding and a specialized corpus.
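As a rough illustration of the byte-level encoding that ByT5 relies on, the sketch below maps text to token ids without any learned vocabulary. It assumes the published ByT5 convention of reserving ids 0 to 2 for padding, end-of-sequence, and unknown tokens, so each UTF-8 byte is shifted by 3. The function names `byte_encode` and `byte_decode` are hypothetical helpers, not part of the paper's code.

```python
def byte_encode(text: str, eos_id: int = 1, offset: int = 3) -> list[int]:
    """Map text to ByT5-style ids: each UTF-8 byte b becomes b + offset, then EOS.

    Assumption: ids 0..2 are reserved (pad, eos, unk), hence offset=3.
    """
    return [b + offset for b in text.encode("utf-8")] + [eos_id]


def byte_decode(ids: list[int], offset: int = 3) -> str:
    """Invert byte_encode, skipping reserved ids and anything outside the byte range."""
    data = bytes(i - offset for i in ids if offset <= i < offset + 256)
    return data.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    ids = byte_encode("Señor")
    print(ids)               # one id per UTF-8 byte, plus the EOS id
    print(byte_decode(ids))  # round-trips to the original string
```

Because every script is reduced to the same 256 byte values, no language is out-of-vocabulary, although non-ASCII characters expand into several tokens each.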
Findings
BLEU scores and sample translations suggest the model can improve access to biblical texts
Handles complex biblical lexicon and structure effectively
Identifies limitations and future directions for model enhancement
Abstract
This study presents the development and evaluation of a ByT5-based multilingual translation model tailored for translating the Bible into underrepresented languages. Utilizing the comprehensive Johns Hopkins University Bible Corpus, we trained the model to capture the intricate nuances of character-based and morphologically rich languages. Our results, measured by the BLEU score and supplemented with sample translations, suggest the model can improve accessibility to sacred texts. It effectively handles the distinctive biblical lexicon and structure, thus bridging the linguistic divide. The study also discusses the model's limitations and suggests pathways for future enhancements, focusing on expanding access to sacred literature across linguistic boundaries.
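For readers unfamiliar with the metric, the following is a minimal sketch of sentence-level BLEU: the geometric mean of modified n-gram precisions multiplied by a brevity penalty. It assumes whitespace tokenization, a single reference, and no smoothing. The abstract does not state which BLEU implementation or settings the authors used, so this illustrates the general metric only, and the function name `bleu` is a hypothetical helper.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU against one reference (illustrative only)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU = 0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    ref = "in the beginning God created the heaven and the earth"
    print(bleu(ref, ref))
    print(bleu("in the beginning God created the heaven", ref))
```

Corpus-level evaluations normally aggregate n-gram counts over all sentences and apply a standardized tokenizer, so scores from this sketch are not directly comparable to published numbers.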
Taxonomy
Topics: Natural Language Processing Techniques
