Dynamic Meta-Metrics: Source-Sentence Conditioned Weighting for MT Evaluation

Luke Zhang; Justin Vasselli; Aditya Khan; York Hay Ng; En-Shiun Annie Lee

arXiv:2605.09098 · cs.CL · May 12, 2026

TL;DR

This paper introduces Dynamic Meta-Metrics, a flexible framework for machine translation evaluation that adapts metric combinations based on source sentence properties, improving agreement with human judgments.

Contribution

The paper presents a novel source-conditioned metric combination approach that outperforms static ensembles and linear models in MT evaluation.

Findings

1. MLP-based combinations outperform linear and Gaussian process ensembles.
2. Soft conditioning improves over linear models.
3. DMM achieves higher agreement with human judgments across language pairs.

Abstract

We propose Dynamic Meta-Metrics (DMM), a framework for machine translation evaluation that learns source-sentence conditioned combinations of existing metrics. Rather than relying on a single static ensemble or language-specific weighting, DMM adapts the metric combination based on properties of the source segment. We study hard conditioning, which fits an interpretable combiner per cluster, and an exploratory soft-conditioned extension whose weights vary continuously with source-cluster responsibilities. We evaluate DMM on the WMT Metrics Shared Task data across multiple language pairs using pairwise agreement measures at the system and segment levels. Across settings, MLP-based combinations outperform linear and Gaussian process-based ensembles, and introducing soft conditioning yields gains over linear models.
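The abstract describes two mechanisms. Hard conditioning fits one interpretable combiner per source cluster. The soft extension lets the combination weights vary continuously with each segment's cluster responsibilities. The sketch below shows one way these could fit together. It is not the authors' implementation. It assumes source segments are already embedded as fixed-size vectors and that cluster centroids are given. It uses linear per-cluster combiners fit by responsibility-weighted least squares, and it models soft responsibilities as a softmax over negative squared distances, which is an isotropic Gaussian assumption. The function names `responsibilities`, `fit_cluster_weights`, and `dmm_score`, and the synthetic data, are hypothetical.

```python
import numpy as np


def responsibilities(X, centroids, sigma=1.0, hard=False):
    """Cluster responsibilities R[i, c] of each source embedding X[i].

    hard=True: one-hot nearest centroid (hard conditioning).
    hard=False: softmax over negative squared distances, i.e. posteriors of an
    equal-weight isotropic Gaussian mixture with variance sigma**2 (assumption).
    """
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)  # (n, C)
    if hard:
        R = np.zeros_like(d2)
        R[np.arange(len(X)), d2.argmin(axis=1)] = 1.0
        return R
    logits = -d2 / (2.0 * sigma**2)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    R = np.exp(logits)
    return R / R.sum(axis=1, keepdims=True)


def fit_cluster_weights(M, y, R, ridge=1e-6):
    """One linear combiner (metric weights + bias) per cluster, fit by
    responsibility-weighted ridge least squares against human scores y."""
    A = np.hstack([M, np.ones((len(M), 1))])  # metric scores plus a bias column
    W = []
    for c in range(R.shape[1]):
        r = R[:, c]
        G = A.T @ (r[:, None] * A) + ridge * np.eye(A.shape[1])
        W.append(np.linalg.solve(G, A.T @ (r * y)))
    return np.array(W)  # (C, k + 1)


def dmm_score(M, R, W):
    """Per-segment weights are the responsibility-weighted mix of cluster weights."""
    A = np.hstack([M, np.ones((len(M), 1))])
    W_seg = R @ W  # (n, k + 1); varies with the source segment
    return (A * W_seg).sum(axis=1)


# Synthetic example: two source clusters, each "trusting" a different metric.
rng = np.random.default_rng(0)
centroids = np.array([[-3.0, 0.0], [3.0, 0.0]])
labels = rng.integers(0, 2, size=200)
X = centroids[labels] + rng.normal(scale=0.5, size=(200, 2))  # source embeddings
M = rng.normal(size=(200, 2))  # scores from two existing metrics
y = np.where(labels == 0, M[:, 0], M[:, 1])  # simulated human judgments

R_hard = responsibilities(X, centroids, hard=True)
W_hard = fit_cluster_weights(M, y, R_hard)  # close to [[1, 0, 0], [0, 1, 0]]

R_soft = responsibilities(X, centroids, sigma=1.0)
W_soft = fit_cluster_weights(M, y, R_soft)
scores = dmm_score(M, R_soft, W_soft)  # tracks y closely on this toy data
```

With one-hot responsibilities, the fit reduces to an ordinary least-squares combiner per cluster, which corresponds to hard conditioning. With soft responsibilities, a segment near a cluster boundary blends the clusters' weights, so its effective weights change smoothly with the source. The paper reports that MLP-based combinations outperform linear ones, and this linear sketch does not attempt that.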
