Large Language Models "Ad Referendum": How Good Are They at Machine Translation in the Legal Domain?
Vicent Briva-Iglesias, Joao Lucas Cavalheiro Camargo, Gokhan Dogru

TL;DR
This paper compares large language models (LLMs) with a traditional neural machine translation (NMT) system on legal-domain translation and finds that LLMs, especially GPT-4, are rated comparably or slightly better by professional translators despite lower automatic scores.
Contribution
It provides a comprehensive evaluation of LLMs versus NMT in legal translation, highlighting LLMs' potential and the need for improved evaluation metrics.
Findings
LLMs perform comparably to NMT in human evaluations.
GPT-4 shows strong contextual and fluency qualities.
Traditional automatic evaluation metrics (AEMs) generally favor NMT (Google Translate) over LLMs.
Abstract
This study evaluates the machine translation (MT) quality of two state-of-the-art large language models (LLMs) against a traditional neural machine translation (NMT) system across four language pairs in the legal domain. It combines automatic evaluation metrics (AEMs) and human evaluation (HE) by professional translators to assess translation ranking, fluency and adequacy. The results indicate that while Google Translate generally outperforms LLMs in AEMs, human evaluators rate LLMs, especially GPT-4, comparably or slightly better in terms of producing contextually adequate and fluent translations. This discrepancy suggests LLMs' potential in handling specialized legal terminology and context, highlighting the importance of human evaluation methods in assessing MT quality. The study underscores the evolving capabilities of LLMs in specialized domains and calls for reevaluation of…
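The gap between automatic scores and human judgments described in the abstract can be illustrated with a toy overlap metric. The sketch below is an assumption-laden illustration, not the paper's evaluation setup: `clipped_precision` is a hypothetical helper implementing BLEU-style clipped n-gram precision, and the three sentences are invented examples rather than data from the study. It shows how a translation that a human might consider adequate can still score low when it uses different wording from the reference.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(hypothesis, reference, n=1):
    """BLEU-style clipped n-gram precision against a single reference.

    Hypothetical helper for illustration only; real AEMs add smoothing,
    brevity penalties, multiple n-gram orders, or learned components.
    """
    hyp_counts = Counter(ngrams(hypothesis.lower().split(), n))
    ref_counts = Counter(ngrams(reference.lower().split(), n))
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    # Each hypothesis n-gram is credited at most as often as it occurs
    # in the reference.
    overlap = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return overlap / total


# Invented toy sentences (not taken from the paper).
reference = "the tenant shall pay the rent monthly"
literal = "the tenant shall pay the rent monthly"
paraphrase = "the lessee must pay rent every month"

literal_score = clipped_precision(literal, reference)
paraphrase_score = clipped_precision(paraphrase, reference)

print(f"literal:    {literal_score:.2f}")
print(f"paraphrase: {paraphrase_score:.2f}")
```

The paraphrase shares only three of its seven words with the reference, so surface overlap rates it well below the literal rendering even though the two convey much the same obligation. That is one plausible reason why reference-based metrics and professional translators can disagree.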
Taxonomy
Topics: Artificial Intelligence in Law
Methods: Position-Wise Feed-Forward Layer · Dense Connections · Label Smoothing · Absolute Position Encodings · Softmax · Byte Pair Encoding · Linear Layer · Attention Is All You Need · Dropout · Multi-Head Attention
