Incentivizing Parametric Knowledge via Reinforcement Learning with Verifiable Rewards for Cross-Cultural Entity Translation

Jiang Zhou; Xiaohu Zhao; Xinwei Wu; Tianyu Dong; Hao Wang; Yangyang Liu; Heng Liu; Linlong Xu; Longyue Wang; Weihua Luo; Deyi Xiong

arXiv:2604.16881·cs.CL·April 21, 2026

TL;DR

This paper introduces EA-RLVR, a reinforcement learning framework that enhances cross-cultural entity translation in large language models by using verifiable, entity-level rewards to improve accuracy and generalization without external knowledge bases.

Contribution

EA-RLVR is a novel training method that stabilizes optimization and encourages models to learn robust reasoning for culturally appropriate translations from a small training set (7k samples in the reported experiments).

Findings

01. Training on 7k samples improves Qwen3-14B's entity translation accuracy from 23.66% to 31.87% (+8.21 points).

02. EA-RLVR achieves +1.35 XCOMET on WMT24++ for general translation.

03. The approach enhances out-of-domain generalization and sampling efficiency.

Abstract

Cross-cultural entity translation remains challenging for large language models (LLMs), which often produce literal or phonetic renderings instead of culturally appropriate translations in context. However, the relevant knowledge may already be encoded in model parameters during large-scale pre-training. To incentivize the effective use of this parametric knowledge, we propose EA-RLVR (Entity-Anchored Reinforcement Learning with Verifiable Rewards), a training framework that optimizes cross-cultural entity translation without relying on external knowledge bases. EA-RLVR anchors supervision on a verifiable, entity-level reward signal and incorporates lightweight structural gates to stabilize optimization. This design steers the model toward learning a robust reasoning process rather than merely imitating reference translations. We evaluate EA-RLVR on XC-Translate and observe consistent…
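To make the idea of a verifiable, entity-level reward with a structural gate concrete, here is a minimal Python sketch. It is not the authors' implementation: the `<translation>` tag format, the function names `passes_structural_gate` and `entity_reward`, the binary 0/1 reward, and the case-insensitive substring match are all assumptions chosen for illustration, since the abstract does not specify these details.

```python
import re
from typing import Iterable

# Hypothetical output format (an assumption, not from the paper):
# free-form reasoning followed by one <translation>...</translation> span.
_TRANSLATION_RE = re.compile(r"<translation>(.*?)</translation>", re.DOTALL)


def passes_structural_gate(output: str) -> bool:
    """Assumed gate: the output holds exactly one non-empty translation span."""
    spans = _TRANSLATION_RE.findall(output)
    return len(spans) == 1 and spans[0].strip() != ""


def entity_reward(output: str, accepted_entities: Iterable[str]) -> float:
    """Return 1.0 if the gated translation contains an accepted entity rendering.

    Outputs that fail the structural gate get 0.0 regardless of content, so the
    reward can only be earned through a well-formed answer.
    """
    if not passes_structural_gate(output):
        return 0.0
    translation = _TRANSLATION_RE.search(output).group(1).casefold()
    # Skip empty strings: "" is a substring of everything and would always match.
    matched = any(e.casefold() in translation for e in accepted_entities if e)
    return 1.0 if matched else 0.0
```

Because the check is a string match against a list of accepted renderings, the reward can be verified without a learned judge or an external knowledge base at scoring time; a real system would likely need more careful normalization than `casefold()`.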
