From prompting to evidence-based translation: A RAG+prompt system for Japanese-Chinese translation and its pedagogical potential
Wenshi Gu

TL;DR
This paper presents a retrieval-augmented generation (RAG) system that improves Japanese-Chinese translation, particularly of complex noun-modifying clauses, by combining linguistic analysis, example retrieval, and prompt engineering. Mean BLEU rises from 24.28 to 29.96.
Contribution
The study introduces a novel RAG+Prompt translation system that improves translation quality for Japanese-Chinese pairs by combining linguistic analysis with retrieval-augmented prompts without modifying the base language model.
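The retrieval and prompt-assembly steps can be illustrated with a minimal sketch. It assumes sentence embeddings are already computed and stored as plain lists of floats. The abstract specifies only k = 5 and L2 distance, so the names `Example`, `retrieve`, and `build_prompt`, and the prompt wording, are hypothetical and not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    """One knowledge-base entry: a Ja-Zh pair plus a precomputed embedding."""
    ja: str
    zh: str
    vec: List[float]  # assumption: embeddings come from some external encoder


def l2(a: List[float], b: List[float]) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query_vec: List[float], kb: List[Example], k: int = 5) -> List[Example]:
    """Return the k knowledge-base entries closest to the query (exhaustive scan)."""
    return sorted(kb, key=lambda ex: l2(query_vec, ex.vec))[:k]


def build_prompt(source_ja: str, a1: str, a2: List[str], examples: List[Example]) -> str:
    """Assemble an enhanced prompt from the analysis outputs and retrieved pairs.

    The template is illustrative only; the paper's actual wording is not known here.
    """
    lines = [
        "Translate the Japanese sentence into Chinese.",
        f"NMCC type (A1): {a1}",
        "Predicted risks (A2): " + ", ".join(a2),
        "Reference examples:",
    ]
    for i, ex in enumerate(examples, 1):
        lines.append(f"{i}. JA: {ex.ja}")
        lines.append(f"   ZH: {ex.zh}")
    lines.append(f"Source: {source_ja}")
    return "\n".join(lines)


# Toy knowledge base with placeholder text and 2-d vectors.
kb = [Example(f"ja{i}", f"zh{i}", [float(i), 0.0]) for i in range(10)]
top = retrieve([2.2, 0.0], kb, k=5)
prompt = build_prompt("src", "outer", ["word order", "lexical choice"], top)
```

Because the base model is untouched, everything the system contributes is visible in the prompt string, which is what makes the improvements auditable. At knowledge-base sizes up to 2,000 examples an exhaustive scan like this is cheap; a vector index would only matter at much larger scale.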
Findings
Mean sentence-level BLEU rose from 24.28 with no knowledge base to 29.96 with larger knowledge bases.
Larger knowledge bases consistently improved translation performance.
The system provides interpretable and auditable translation improvements.
Abstract
Large language models perform well on high-resource language pairs but are less reliable for Japanese-Chinese sentences containing noun-modifying clause constructions (NMCCs). This study evaluates RAG+Prompt, a retrieval-augmented generation (RAG) translation system that integrates linguistic analysis, embedding-based retrieval, prompt construction, and LLM generation without modifying the base model. The analysis module outputs A1 (inner vs. outer NMCC) and A2 (risk predictions: lexical choice/NMCC handling/word order/style/register); the top-k = 5 most similar Ja-Zh examples (by L2 distance) are inserted into an enhanced prompt together with A1 and A2. Using GPT-4o and a 66-sentence test set, we compare six knowledge-base sizes (0/100/200/500/1,000/2,000). Macro-averaged sentence-level BLEU (1-4-gram with brevity penalty; cased; Chinese at the character level) is the sole metric. Mean BLEU increases from 24.28 at 0 (RAG…
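The metric described in the abstract can be sketched as follows: character-level n-grams for n = 1 to 4, clipped precision, a brevity penalty, and an unweighted mean over sentences. This is an illustrative implementation, not the authors' evaluation script. The abstract does not state a smoothing method, so this sketch applies none, and any sentence with no matching n-gram at some order scores 0. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams; every character (including any space) is a token."""
    chars = list(text)
    return Counter(tuple(chars[i:i + n]) for i in range(len(chars) - n + 1))


def sentence_bleu(hyp: str, ref: str, max_n: int = 4) -> float:
    """Cased, character-level sentence BLEU on a 0-100 scale, without smoothing."""
    if not hyp:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        h, r = char_ngrams(hyp, n), char_ngrams(ref, n)
        total = sum(h.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        clipped = sum(min(c, r[g]) for g, c in h.items())
        if total == 0 or clipped == 0:
            return 0.0  # assumption: no smoothing, so a zero precision zeroes the score
        log_prec += math.log(clipped / total) / max_n
    # Brevity penalty: penalize hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return 100.0 * bp * math.exp(log_prec)


def macro_bleu(hyps: List[str], refs: List[str]) -> float:
    """Macro average: the mean of per-sentence scores, not a corpus-level BLEU."""
    scores = [sentence_bleu(h, r) for h, r in zip(hyps, refs)]
    return sum(scores) / len(scores)
```

Macro-averaging weights each of the test sentences equally, so a few long sentences cannot dominate the mean the way they can in corpus-level BLEU.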
