Ancient Korean Archive Translation: Comparison Analysis on Statistical phrase alignment, LLM in-context learning, and inter-methodological approach

Sojung Lucia Kim; Taehong Jang; Joonmo Ahn

arXiv:2407.11368 · cs.CL · July 17, 2024

TL;DR

This paper compares traditional statistical translation, LLM in-context learning, and a new inter-methodological approach for translating ancient Korean texts. The proposed method achieves the highest BLEU score of the three, 36.71.

Contribution

The paper introduces an inter-methodological approach: statistical machine translation over sentence piece tokens derived from a unified source-target corpus. This approach outperforms the compared models on ancient text translation.

Findings

01. The proposed method achieved a BLEU score of 36.71.

02. This score surpasses SOLAR-10.7B in-context learning and the best existing Seq2Seq model.

03. Analysis confirms the effectiveness of the inter-methodological approach.
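
For readers unfamiliar with the metric behind these findings, the following is a minimal sketch of corpus-level BLEU on a 0 to 100 scale. It assumes whitespace tokenization, a single reference per hypothesis, and no smoothing. The function names (`ngram_counts`, `corpus_bleu`) are illustrative, and this is not the evaluation script used in the paper, whose tokenization and BLEU variant are not stated on this page.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus BLEU (0-100), one reference per hypothesis.

    Assumption: sentences are plain strings split on whitespace.
    """
    matches = [0] * max_n   # clipped n-gram matches per order
    totals = [0] * max_n    # hypothesis n-gram counts per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngram_counts(h, n), ngram_counts(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())

    # Without smoothing, any order with zero matches gives a score of 0.
    if min(matches) == 0:
        return 0.0

    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


if __name__ == "__main__":
    # Toy English strings, not data from the paper.
    print(corpus_bleu(["the cat sat on the mat today"], ["the cat sat on the mat"]))
```

Production evaluations normally rely on an established tool so that tokenization and smoothing are reproducible; the sketch only shows what the reported number measures.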

Abstract

This study aims to compare three methods for translating ancient texts with sparse corpora: (1) the traditional statistical translation method of phrase alignment, (2) in-context LLM learning, and (3) the proposed inter-methodological approach, a statistical machine translation method using sentence piece tokens derived from a unified source-target corpus. The proposed approach achieves a BLEU score of 36.71, surpassing the scores of SOLAR-10.7B in-context learning and the best existing Seq2Seq model. Further analysis and discussion are presented.
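
The idea of deriving subword tokens from a unified source-target corpus can be illustrated with a small sketch. The code below learns byte-pair-encoding (BPE) merges over the concatenation of source and target lines, so both sides share one subword inventory. This is an assumption-laden stand-in: a toy BPE learner replaces the sentence piece tokenizer named in the abstract, the helper names (`apply_merge`, `learn_merges`, `segment`) are hypothetical, and the paper's actual tokenizer settings and alignment pipeline are not described on this page.

```python
from collections import Counter


def apply_merge(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with the merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_merges(lines, num_merges):
    """Learn up to `num_merges` BPE merges from whitespace-split lines."""
    # Each word starts as a tuple of single characters.
    words = Counter(tuple(w) for line in lines for w in line.split())
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in words.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair; ties are broken by the pair itself for determinism.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        merged = Counter()
        for word, freq in words.items():
            merged[apply_merge(word, best)] += freq
        words = merged
    return merges


def segment(word, merges):
    """Split a word into subword tokens by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    # Toy placeholder strings, not data from the paper.
    source_lines = ["abab abab"]
    target_lines = ["abab"]
    # Unified corpus: merges are learned over source and target together.
    merges = learn_merges(source_lines + target_lines, num_merges=1)
    print(merges)
    print(segment("abab", merges))
```

In a pipeline of the kind the abstract outlines, the segmented source and target sentences would then be passed to a statistical phrase-alignment step; that step is outside the scope of this sketch.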

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Sparse Evolutionary Training · Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Sequence to Sequence