Improving LLM Abilities in Idiomatic Translation
Sundesh Donthi, Maximilian Spencer, Om Patel, Joon Doh, Eid Rodan, Kevin Zhu, Sean O'Brien

TL;DR
This paper enhances large language models' ability to translate idioms accurately by expanding knowledge bases and employing semantic similarity and LLM-based methods, improving cross-cultural translation fidelity.
Contribution
It introduces a novel approach combining semantic similarity and LLM techniques to better preserve idiomatic style in translations across multiple languages.
Findings
The Cosine Similarity method outperforms the other approaches in human evaluations.
The methods improve translation quality for English-Chinese and Chinese-English.
A new Urdu idiom dataset was developed to support low-resource language translation.
Abstract
For large language models (LLMs) like NLLB and GPT, translating idioms remains a challenge. Our goal is to enhance translation fidelity by improving LLM processing of idiomatic language while preserving the original linguistic style. This has a significant social impact, as it preserves cultural nuances and ensures translated texts retain their intent and emotional resonance, fostering better cross-cultural communication. Previous work has utilized knowledge bases like IdiomKB by providing the LLM with the meaning of an idiom to use in translation. Although this method yielded better results than a direct translation, it is still limited in its ability to preserve idiomatic writing style across languages. In this research, we expand upon the knowledge base to find corresponding idioms in the target language. Our research performs translations using two methods: The first method employs…
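The idea of looking up a corresponding target-language idiom by semantic similarity can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract is truncated before the method details, so the embedding model is unknown. The sketch assumes a simple bag-of-words vector as a stand-in for a real sentence embedding, and the names `bow_vector`, `best_matching_idiom`, and the tiny `target_idioms` table are hypothetical, introduced only for illustration.

```python
import math
from collections import Counter


def bow_vector(text):
    """Toy stand-in for a sentence embedding: lowercase word counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_matching_idiom(source_meaning, target_idioms):
    """Return the (idiom, score) whose meaning is closest to source_meaning."""
    src = bow_vector(source_meaning)
    scored = [
        (cosine_similarity(src, bow_vector(meaning)), idiom)
        for idiom, meaning in target_idioms.items()
    ]
    score, idiom = max(scored)
    return idiom, score


# Hypothetical knowledge-base entries: target-language idiom -> meaning gloss.
target_idioms = {
    "雪上加霜": "make a bad situation even worse",
    "画蛇添足": "ruin something by adding unnecessary detail",
    "一石二鸟": "achieve two goals with one action",
}

# Meaning of the English idiom "add insult to injury", as a knowledge base
# might store it (illustrative wording, not taken from IdiomKB).
idiom, score = best_matching_idiom("make a bad situation worse", target_idioms)
print(idiom, round(score, 3))
```

In practice the word-count vectors would be replaced by dense embeddings from a sentence encoder, and a similarity threshold would likely be needed so that a poor best match falls back to translating the idiom's meaning directly.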
Taxonomy
Topics: Natural Language Processing Techniques · Translation Studies and Practices
Methods: Attention Is All You Need · Byte Pair Encoding · Cosine Annealing · Layer Normalization · Linear Layer · Attention Dropout · Adam · Dropout · Dense Connections
