Read it in Two Steps: Translating Extremely Low-Resource Languages with Code-Augmented Grammar Books
Chen Zhang, Jiuheng Lin, Xiao Liu, Zekai Zhang, Yansong Feng

TL;DR
This paper explores how code-augmented grammar rules can improve translation of extremely low-resource languages by addressing rule retrieval and application challenges, leading to significant BLEU score gains.
Contribution
It introduces a novel code-based representation of grammar rules that enhances LLM reasoning and translation accuracy for low-resource languages.
Findings
Rule retrieval is a major bottleneck in grammar-based translation.
Code representation of rules improves retrieval and application.
Achieved 13.1% BLEU score improvement in experiments.
Abstract
While large language models (LLMs) have shown promise in translating extremely low-resource languages using resources like dictionaries, the effectiveness of grammar books remains debated. This paper investigates the role of grammar books in translating extremely low-resource languages by decomposing it into two key steps: grammar rule retrieval and application. To facilitate the study, we introduce ZhuangRules, a modularized dataset of grammar rules and their corresponding test sentences. Our analysis reveals that rule retrieval constitutes a primary bottleneck in grammar-based translation. Moreover, although LLMs can apply simple rules for translation when explicitly provided, they encounter difficulties in handling more complex rules. To address these challenges, we propose representing grammar rules as code functions, considering their similarities in structure and the benefit of…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
Taxonomy
TopicsNatural Language Processing Techniques
