VEPO: Variable Entropy Policy Optimization for Low-Resource Language Foundation Models

Chonghan Liu; Yimin Du; Qi An; Xin He; Cunqi Zhai; Fei Tan; Weijia Lin; Xiaochun Gong; Yongchao Deng; Shousheng Jia; Xiangzheng Zhang

arXiv:2603.19152 · cs.CL · March 20, 2026

Open Access

TL;DR

VEPO introduces a reinforcement learning framework with variable entropy control to improve low-resource language models by enhancing tokenization and translation quality, addressing data imbalance and subword segmentation issues.

Contribution

The paper presents VEPO, a novel reinforcement learning method with variable entropy that enforces linguistic constraints and balances fidelity and naturalness in low-resource language modeling.

Findings

01. Significant improvements in tokenization efficiency.
02. Enhanced translation quality for underrepresented languages.
03. Bridging performance gaps in low-resource language tasks.

Abstract

Large language models frequently exhibit suboptimal performance on low-resource languages, primarily due to inefficient subword segmentation and systemic training data imbalances. In this paper, we propose Variable Entropy Policy Optimization (VEPO), which leverages Reinforcement Learning with Verifiable Rewards to incorporate deterministic structural constraints into the policy alignment process. This framework ensures prescribed sequence length, robust format consistency, and rigorous linguistic well-formedness, all enforced during training. Central to our approach is a variable entropy mechanism that enables the model to dynamically calibrate the equilibrium between literal fidelity and semantic naturalness by modulating the exploration-exploitation manifold. By integrating entropy-tempered advantage estimation with asymmetric clipping, VEPO sustains robust exploration while…
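The abstract names three ingredients without giving formulas: deterministic (verifiable) structural rewards, entropy-tempered advantage estimation, and asymmetric clipping. The sketch below is one possible reading of those terms, not the authors' implementation. Every function name, the multiplicative tempering form `1 + beta * entropy`, and the default values `beta`, `eps_low`, and `eps_high` are assumptions made for illustration only.

```python
import math

def structural_reward(text, max_tokens, required_prefix="", required_suffix=""):
    """Hypothetical verifiable reward: 1.0 if all deterministic checks pass, else 0.0.

    Checks a whitespace-token length limit and a simple prefix/suffix format.
    """
    tokens = text.split()
    length_ok = 0 < len(tokens) <= max_tokens
    format_ok = text.startswith(required_prefix) and text.endswith(required_suffix)
    return 1.0 if (length_ok and format_ok) else 0.0

def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def entropy_tempered_advantage(advantage, entropy, beta=0.1):
    """Assumed tempering: scale the advantage by (1 + beta * entropy).

    The source does not state the actual functional form.
    """
    return advantage * (1.0 + beta * entropy)

def asymmetric_clipped_objective(ratio, advantage, eps_low=0.2, eps_high=0.3):
    """PPO-style clipped surrogate with separate lower and upper clip bounds."""
    clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    # Pessimistic bound: take the smaller of the unclipped and clipped terms.
    return min(ratio * advantage, clipped * advantage)

# Usage: score one sampled token under the assumed objective.
reward = structural_reward("<out> hello world </out>", max_tokens=8,
                           required_prefix="<out>", required_suffix="</out>")
entropy = token_entropy([0.5, 0.5])
advantage = entropy_tempered_advantage(reward - 0.5, entropy)  # 0.5 is a toy baseline
objective = asymmetric_clipped_objective(ratio=1.5, advantage=advantage)
```

With `eps_high` larger than `eps_low`, a positive-advantage token can raise its probability ratio further before clipping takes effect than a negative-advantage token can lower it, which is one way asymmetric bounds could help keep exploration alive.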

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification