Translating Hanja Historical Documents to Contemporary Korean and English
Juhee Son, Jiho Jin, Haneul Yoo, JinYeong Bak, Kyunghyun Cho, Alice Oh

TL;DR
This paper introduces H2KE, a neural machine translation model that translates archaic Hanja historical documents into contemporary Korean and English. H2KE outperforms baseline methods, and human evaluators prefer its translations.
Contribution
H2KE uses multilingual neural machine translation to improve the quality of Hanja-to-modern-Korean and Hanja-to-English translation, addressing the slow pace of expert translation.
Findings
H2KE significantly outperforms baseline models in BLEU scores.
Human evaluation favors H2KE translations over expert translations.
The model makes effective use of the limited amount of recent translation data.
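The findings are reported in BLEU. For readers unfamiliar with the metric, here is a minimal sketch of corpus-level BLEU: clipped n-gram precision for orders 1 to 4, combined by a geometric mean and multiplied by a brevity penalty. It assumes one reference per sentence, pre-tokenized input, and no smoothing. The paper's exact BLEU configuration and tokenization (which matter a lot for Korean) are not stated here, and published scores are usually computed with a standard tool such as sacreBLEU, whose numbers this sketch will not match exactly.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with one reference per hypothesis, scaled to 0-100.

    hypotheses, references: lists of token lists, aligned by index.
    No smoothing: returns 0.0 if any n-gram order has zero matches.
    """
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # n-grams in the hypotheses, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:
        return 0.0
    # Geometric mean of the modified precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)
```

Whitespace tokenization is the simplest assumption for trying this out; for Korean, the choice of tokenizer changes BLEU substantially, which is one reason standardized tools exist.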
Abstract
The Annals of Joseon Dynasty (AJD) contain the daily records of the Kings of Joseon, the 500-year kingdom preceding the modern nation of Korea. The Annals were originally written in an archaic Korean writing system, 'Hanja', and were translated into Korean from 1968 to 1993. The resulting translation was, however, too literal and contained many archaic Korean words; thus, a new expert translation effort began in 2012. Since then, the records of only one king have been completed in a decade. In parallel, expert translators are working on an English translation, also at a slow pace, and have produced only one king's records in English so far. We therefore propose H2KE, a neural machine translation model that translates historical documents in Hanja to more easily understandable Korean and to English. Built on top of multilingual neural machine translation, H2KE learns to translate a historical document…
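The abstract describes H2KE as built on multilingual neural machine translation. A common way to let one model serve several target languages is to prepend a target-language tag to each source sentence (Johnson et al., 2017). The sketch below shows only that data-preparation step. The tag strings `<2ko>` and `<2en>`, the `Example` class, and the helper names are hypothetical, and the truncated abstract does not say whether H2KE uses this exact scheme. The toy sentence is constructed for illustration and is not taken from the AJD corpus.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical target-language tags; H2KE's actual vocabulary may differ.
TARGET_TAGS = {"ko": "<2ko>", "en": "<2en>"}


@dataclass
class Example:
    source: str       # Hanja source text
    target: str       # translation in the target language
    target_lang: str  # key into TARGET_TAGS


def to_model_input(example: Example) -> tuple[str, str]:
    """Prefix the source with a tag telling one shared model which language to produce."""
    tag = TARGET_TAGS[example.target_lang]
    return f"{tag} {example.source}", example.target


def build_mixed_corpus(examples: list[Example]) -> list[tuple[str, str]]:
    """Merge all language pairs into one training set for a single multilingual model."""
    return [to_model_input(ex) for ex in examples]


# Toy usage: the same (constructed) Hanja sentence paired with two targets.
TOY_EXAMPLES = [
    Example("上御經筵", "상이 경연에 나아갔다.", "ko"),
    Example("上御經筵", "The king attended the royal lecture.", "en"),
]
corpus = build_mixed_corpus(TOY_EXAMPLES)
```

Because both pairs share one set of encoder and decoder weights, the smaller English side can, in principle, draw on what the model learns from the larger Korean side; how much that helps is an empirical question the sketch does not address.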
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Residual Connection · Label Smoothing · Adam · Dense Connections · Softmax · Byte Pair Encoding · Position-Wise Feed-Forward Layer
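Byte Pair Encoding, listed among the methods above, builds a subword vocabulary by repeatedly merging the most frequent adjacent pair of symbols. Below is a minimal sketch of the merge-learning step in the style of Sennrich et al. (2016), assuming a word-frequency table as input and an end-of-word marker `</w>`. It is illustrative only: the tokenizer settings used for H2KE are not stated here, and real systems typically use a library such as subword-nmt or SentencePiece rather than code like this.

```python
from __future__ import annotations

from collections import Counter


def learn_bpe(word_freqs: dict[str, int], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to num_merges BPE merge rules from a word-frequency table.

    Each word starts as its characters plus an end-of-word marker "</w>".
    """
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        # Most frequent pair; ties go to the pair counted first.
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Rewrite every word, replacing occurrences of the best pair with one symbol.
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] = freq
        vocab = new_vocab
    return merges
```

On the textbook toy table {"low": 5, "lower": 2, "newest": 6, "widest": 3}, the first merges are ("e", "s"), ("es", "t"), and ("est", "</w>"), since the suffix "est" is shared by the two most frequent words. Applying the learned merges to new text is a separate step not shown here.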
