JP-TL-Bench: Anchored Pairwise LLM Evaluation for Bidirectional Japanese-English Translation

Leonard Lin; Adam Lensenmayer (Shisa.AI)

arXiv:2601.00223·cs.CL·January 5, 2026

Open Access

TL;DR

JP-TL-Bench is an open, lightweight benchmark for Japanese-English translation that uses pairwise LLM comparisons against a fixed anchor set to reliably evaluate translation quality.

Contribution

It introduces a protocol for LLM-based pairwise evaluation of translation systems: each candidate is compared against a fixed, versioned anchor set, which is intended to keep LLM judging both reliable and affordable.

Findings

01 Stable evaluation scores due to fixed anchor set

02 Effective aggregation with Bradley-Terry model

03 Normalized 0-10 LT score for comparison

Abstract

We introduce JP-TL-Bench, a lightweight, open benchmark designed to guide the iterative development of Japanese-English translation systems. In this context, the challenge is often "which of these two good translations is better?" rather than "is this translation acceptable?" This distinction matters for Japanese-English, where subtle choices in politeness, implicature, ellipsis, and register strongly affect perceived naturalness. JP-TL-Bench uses a protocol built to make LLM judging both reliable and affordable: it evaluates a candidate model via reference-free, pairwise LLM comparisons against a fixed, versioned anchor set. Pairwise results are aggregated with a Bradley-Terry model and reported as win rates plus a normalized 0-10 "LT" score derived from a logistic transform of fitted log-strengths. Because each candidate is scored against the same frozen anchor set, scores are…
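The aggregation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the anchors' Bradley-Terry log-strengths were fitted beforehand and are held fixed, and it fits only the candidate's log-strength by maximum likelihood. The names `fit_candidate_strength` and `lt_score`, and the `center` and `scale` parameters of the logistic transform, are hypothetical: the abstract says only that the LT score is a 0-10 value derived from a logistic transform of fitted log-strengths.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def fit_candidate_strength(anchor_strengths, wins, losses,
                           lo=-20.0, hi=20.0, iters=100):
    """Maximum-likelihood Bradley-Terry log-strength for one candidate.

    Assumption: anchor log-strengths are already known and stay fixed.
    Under Bradley-Terry, P(candidate beats anchor j) = sigmoid(theta - a_j).
    The gradient of the log-likelihood in theta is
        sum_j (wins_j - n_j * sigmoid(theta - a_j)),
    which is strictly decreasing, so bisection finds its root.
    """
    if sum(wins) + sum(losses) == 0:
        raise ValueError("no comparisons to fit")

    def grad(theta):
        return sum(w - (w + l) * sigmoid(theta - a)
                   for a, w, l in zip(anchor_strengths, wins, losses))

    # A candidate that loses (or wins) every comparison has no finite MLE;
    # clamp to the search bounds instead.
    if grad(lo) <= 0:
        return lo
    if grad(hi) >= 0:
        return hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if grad(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lt_score(theta, center=0.0, scale=1.0):
    """Map a log-strength to 0-10 with a logistic transform.

    Assumption: center and scale are illustrative, not the paper's constants.
    """
    return 10.0 * sigmoid((theta - center) / scale)


# Hypothetical data: three anchors and the candidate's wins/losses against each.
anchors = [-1.0, 0.0, 1.0]
wins = [8, 5, 2]
losses = [2, 5, 8]

theta = fit_candidate_strength(anchors, wins, losses)
win_rate = sum(wins) / (sum(wins) + sum(losses))
print(f"win rate={win_rate:.2f}  log-strength={theta:.3f}  LT={lt_score(theta):.2f}")
```

In this toy data the win counts are symmetric around the middle anchor, so the fitted log-strength lands near 0 and the score near the midpoint of the scale. Holding the anchor strengths fixed means a new candidate's score depends only on its own comparisons, which is one way a frozen anchor set could keep scores comparable across runs.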

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems · Second Language Acquisition and Learning