M-MAD: Multidimensional Multi-Agent Debate for Advanced Machine Translation Evaluation

Zhaopeng Feng; Jiayuan Su; Jiamei Zheng; Jiahan Ren; Yan Zhang; Jian Wu; Hongwei Wang; Zuozhu Liu

arXiv:2412.20127·cs.CL·February 21, 2025

TL;DR

M-MAD introduces a multi-agent debate framework utilizing LLMs for detailed, reliable machine translation evaluation, outperforming existing LLM-based methods and rivaling state-of-the-art automatic metrics.

Contribution

It presents a novel multi-agent debate approach that decouples evaluation criteria and synthesizes results, advancing LLM-based MT evaluation.

Findings

01. Outperforms existing LLM-as-a-judge methods

02. Competes with state-of-the-art automatic metrics

03. Demonstrates robustness with suboptimal models

Abstract

Recent advancements in large language models (LLMs) have given rise to the LLM-as-a-judge paradigm, showcasing their potential to deliver human-like judgments. However, in the field of machine translation (MT) evaluation, current LLM-as-a-judge methods fall short of learned automatic metrics. In this paper, we propose Multidimensional Multi-Agent Debate (M-MAD), a systematic LLM-based multi-agent framework for advanced LLM-as-a-judge MT evaluation. Our findings demonstrate that M-MAD achieves significant advancements by (1) decoupling heuristic MQM criteria into distinct evaluation dimensions for fine-grained assessments; (2) employing multi-agent debates to harness the collaborative reasoning capabilities of LLMs; (3) synthesizing dimension-specific results into a final evaluation judgment to ensure robust and reliable outcomes. Comprehensive experiments show that M-MAD not only…
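
The three stages named in the abstract (decoupling criteria into dimensions, debating within each dimension, synthesizing a final judgment) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the dimension names, the severity penalties, the consensus rule, the `Error` record, and the toy `strict_agent` and `lenient_agent` functions, which stand in for LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical dimension names, loosely following common MQM categories.
DIMENSIONS = ["accuracy", "fluency", "terminology", "style"]

# Assumed severity penalties (an MQM-style weighting chosen for illustration).
PENALTY = {"minor": 1, "major": 5}


@dataclass(frozen=True)
class Error:
    dimension: str
    span: str
    severity: str


# An agent maps (source, translation, dimension, opponent's last errors)
# to its own list of errors. In practice this would wrap an LLM prompt.
Agent = Callable[[str, str, str, List[Error]], List[Error]]


def debate(src: str, mt: str, dim: str, agent_a: Agent, agent_b: Agent,
           max_rounds: int = 3) -> List[Error]:
    """Stage 2: two agents exchange error lists for one dimension."""
    errs_a = agent_a(src, mt, dim, [])
    errs_b = agent_b(src, mt, dim, errs_a)
    for _ in range(max_rounds):
        if set(errs_a) == set(errs_b):
            return errs_a  # consensus reached
        errs_a = agent_a(src, mt, dim, errs_b)
        errs_b = agent_b(src, mt, dim, errs_a)
    # No consensus: keep only the errors both agents still assert.
    return [e for e in errs_a if e in errs_b]


def evaluate(src: str, mt: str, agent_a: Agent, agent_b: Agent) -> int:
    """Stage 1 (split by dimension) and stage 3 (synthesize one score)."""
    all_errors: List[Error] = []
    for dim in DIMENSIONS:
        all_errors.extend(debate(src, mt, dim, agent_a, agent_b))
    # Synthesis: if several dimensions flag the same span, count it once
    # at its highest severity.
    by_span: Dict[str, Error] = {}
    for e in all_errors:
        kept = by_span.get(e.span)
        if kept is None or PENALTY[e.severity] > PENALTY[kept.severity]:
            by_span[e.span] = e
    return -sum(PENALTY[e.severity] for e in by_span.values())


def strict_agent(src, mt, dim, other):
    # Toy stand-in for an LLM: always flags one accuracy error.
    if dim == "accuracy":
        return [Error("accuracy", "bank", "major")]
    return list(other)


def lenient_agent(src, mt, dim, other):
    # Toy stand-in for an LLM: always concedes to the other agent.
    return list(other)


score = evaluate("source text", "translated text", strict_agent, lenient_agent)
print(score)
```

With these toy agents the debate converges immediately on a single major accuracy error, so the synthesized score is the negated penalty for that one error. A real system would replace the stubs with prompted LLM calls and would likely use a dedicated judge for the synthesis step rather than a fixed deduplication rule.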

Code & Models

Repositories

su-jiayuan/m-mad (Official)

Videos

M-MAD: Multidimensional Multi-Agent Debate for Advanced Machine Translation Evaluation

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies