Translation via Annotation: A Computational Study of Translating Classical Chinese into Japanese
Zilong Li, Jie Cao

TL;DR
This paper explores translating classical Chinese into Japanese by modeling annotation-based translation as sequence tagging, leveraging large language models and auxiliary NLP tasks to improve low-resource translation quality.
Contribution
It introduces an LLM-based annotation pipeline and a new dataset, demonstrating how auxiliary NLP tasks enhance sequence tagging in low-resource classical Chinese to Japanese translation.
Findings
Auxiliary NLP tasks improve sequence tagging performance
LLMs achieve high scores on direct translation but benefit from annotation methods
New dataset supports low-resource translation research
Abstract
Ancient people translated classical Chinese into Japanese using a system of annotations placed around characters. We abstract this process as sequence tagging tasks and fit them into modern language technologies. Research on this annotation and translation system faces a low-resource problem. We alleviate this problem by introducing a pipeline based on large language models (LLMs) for annotation and by constructing a new dataset from digitized open-source translation data. We show that in the low-resource setting, introducing auxiliary Chinese NLP tasks enhances the training of the sequence tagging tasks. We also evaluate the performance of LLMs on this task. While they achieve high scores on direct machine translation, our method could serve as a supplement to LLMs to improve the quality of character-level annotations.
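To make the "annotation as sequence tagging" framing concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's tag set, model, or pipeline. It assumes one tag per character and handles only the traditional return mark レ, which means "read this character after the next one". The tag "O" (no annotation) and the function name `reading_order` are hypothetical. Real annotation systems include further marks (for example 一/二 points and kana suffixes) that this sketch does not cover.

```python
from typing import List

RE_MARK = "レ"  # return mark: read this character after the following one
NO_MARK = "O"   # hypothetical tag meaning "no annotation on this character"


def reading_order(chars: List[str], tags: List[str]) -> List[str]:
    """Reorder characters according to per-character return-mark tags.

    Simplified illustration: only the レ mark is supported.
    """
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")

    output: List[str] = []
    deferred: List[str] = []  # characters waiting for the next one to be read

    for ch, tag in zip(chars, tags):
        if tag == RE_MARK:
            # Defer this character until an unmarked character has been read.
            deferred.append(ch)
        elif tag == NO_MARK:
            output.append(ch)
            # Release deferred characters, most recently deferred first.
            while deferred:
                output.append(deferred.pop())
        else:
            raise ValueError(f"unsupported tag: {tag!r}")

    # A trailing レ has nothing to jump over; keep such characters in place.
    output.extend(deferred)
    return output


if __name__ == "__main__":
    # 不知 with レ on 不 is read as 知 then 不.
    print(reading_order(list("不知"), [RE_MARK, NO_MARK]))
```

In this framing, a tagger only has to predict one label per source character, and the Japanese reading order follows deterministically from the predicted labels.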
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Biomedical Text Mining and Ontologies
