AI-Driven Generation of Old English: A Framework for Low-Resource Languages
Rodrigo Gabriel Salazar Alva, Matías Nuñez, Cristian López, Javier Martín Arista

TL;DR
This paper introduces a scalable framework that combines large language models with LoRA fine-tuning, backtranslation, and a dual-agent pipeline to generate high-quality Old English texts, supporting the preservation of this critically under-resourced language.
Contribution
It presents a novel combination of parameter-efficient fine-tuning, data augmentation, and a dual-agent pipeline for Old English text generation, advancing low-resource language NLP.
Findings
BLEU scores increased from 26 to over 65 for English-to-Old English translation
High grammatical accuracy confirmed by experts
Method effectively expands Old English corpus
Abstract
Preserving ancient languages is essential for understanding humanity's cultural and linguistic heritage, yet Old English remains critically under-resourced, limiting its accessibility to modern natural language processing (NLP) techniques. We present a scalable framework that uses advanced large language models (LLMs) to generate high-quality Old English texts, addressing this gap. Our approach combines parameter-efficient fine-tuning (Low-Rank Adaptation, LoRA), data augmentation via backtranslation, and a dual-agent pipeline that separates the tasks of content generation (in English) and translation (into Old English). Evaluation with automated metrics (BLEU, METEOR, and CHRF) shows significant improvements over baseline models, with BLEU scores increasing from 26 to over 65 for English-to-Old English translation. Expert human assessment also confirms high grammatical accuracy and…
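The dual-agent split and the backtranslation step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: every name here (`dual_agent_generate`, `backtranslate_augment`, the toy writer and the two-word lexicon) is hypothetical, and the toy functions stand in for what would be LLM calls in a real system.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: each "agent" is modeled as a plain text-to-text function.
TextFn = Callable[[str], str]


def dual_agent_generate(
    prompt: str, write_english: TextFn, translate_to_oe: TextFn
) -> Tuple[str, str]:
    """Agent 1 drafts content in English; agent 2 translates it into Old English."""
    english = write_english(prompt)
    old_english = translate_to_oe(english)
    return english, old_english


def backtranslate_augment(
    old_english_texts: List[str], translate_to_en: TextFn
) -> List[Tuple[str, str]]:
    """Build synthetic (English, Old English) pairs from monolingual Old English text."""
    return [(translate_to_en(oe), oe) for oe in old_english_texts]


# Toy stand-ins (hypothetical) so the sketch runs without any model.
TOY_LEXICON: Dict[str, str] = {"king": "cyning", "ship": "scip"}


def toy_writer(prompt: str) -> str:
    return f"a short text about {prompt}"


def toy_en_to_oe(text: str) -> str:
    # Word-by-word lookup; unknown words pass through unchanged.
    return " ".join(TOY_LEXICON.get(word, word) for word in text.split())


def toy_oe_to_en(text: str) -> str:
    reverse = {oe: en for en, oe in TOY_LEXICON.items()}
    return " ".join(reverse.get(word, word) for word in text.split())


if __name__ == "__main__":
    print(dual_agent_generate("the king", toy_writer, toy_en_to_oe))
    print(backtranslate_augment(["se cyning"], toy_oe_to_en))
```

Keeping the two agents as separate callables mirrors the separation of content generation and translation named in the abstract, so either stage could be swapped or fine-tuned independently.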
