QUATRO: Query-Adaptive Trust Region Policy Optimization for LLM Fine-tuning

Doyeon Lee; Eunyi Lyou; Hyunsoo Cho; Sookyung Kim; Joonseok Lee; Jaemoo Choi

arXiv:2602.04620·cs.LG·February 9, 2026

TL;DR

QUATRO is a principled trust-region method for RL-based LLM fine-tuning that gives explicit control over policy updates and more stable training, outperforming heuristic trust-region methods on mathematical reasoning benchmarks.

Contribution

The paper proposes an optimization method that directly enforces trust-region constraints, yielding an interpretable objective and more stable RL-based LLM fine-tuning.

Findings

01 Stable training under high policy staleness

02 Maintains controlled entropy during training

03 Outperforms heuristic trust-region methods on reasoning benchmarks

Abstract

GRPO-style reinforcement learning (RL)-based LLM fine-tuning algorithms have recently gained popularity. Relying on heuristic trust-region approximations, however, they can lead to brittle optimization behavior, as global importance-ratio clipping and group-wise normalization fail to regulate samples whose importance ratios fall outside the clipping range. We propose Query-Adaptive Trust-Region policy Optimization (QUATRO), which directly enforces trust-region constraints through a principled optimization. This yields a clear and interpretable objective that enables explicit control over policy updates and stable, entropy-controlled optimization, with stabilizer terms arising intrinsically from the exact trust-region formulation. Empirically verified on diverse mathematical reasoning benchmarks, QUATRO shows stable training under increased policy staleness and aggressive learning…
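
The clipping behavior the abstract criticizes can be illustrated with a minimal sketch of the standard PPO-style clipped surrogate combined with group-wise reward normalization. This is not QUATRO's objective, which the text above does not spell out. The function names, the clip width of 0.2, and the toy rewards and ratios are all assumptions chosen for illustration. One reading of "fail to regulate" is that a sample whose ratio has already left the clipping range in the direction its advantage favors gets a zero gradient, so the objective exerts no pull on it either way.

```python
from statistics import mean, pstdev

def group_normalized_advantages(rewards, eps=1e-8):
    """Group-wise normalization over the samples drawn for one query."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """Per-sample clipped objective: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)

def surrogate_grad_wrt_ratio(ratio, advantage, clip_eps=0.2):
    """Derivative of the clipped objective with respect to the importance ratio."""
    if advantage > 0 and ratio > 1.0 + clip_eps:
        return 0.0  # clipped branch is active: objective is flat in the ratio
    if advantage < 0 and ratio < 1.0 - clip_eps:
        return 0.0  # clipped branch is active: objective is flat in the ratio
    return advantage  # unclipped branch: objective is ratio * advantage

# Toy group of four responses to one query (illustrative values only).
rewards = [1.0, 0.0, 0.0, 1.0]
ratios = [1.5, 0.5, 1.0, 1.0]  # first two lie outside [0.8, 1.2]

advantages = group_normalized_advantages(rewards)
grads = [surrogate_grad_wrt_ratio(r, a) for r, a in zip(ratios, advantages)]

for r, a, g in zip(ratios, advantages, grads):
    print(f"ratio={r:.2f} advantage={a:+.3f} grad={g:+.3f}")
```

In this toy group the two samples with ratios 1.5 and 0.5 receive zero gradient, while the two on-range samples keep a gradient equal to their normalized advantage. The clip range here is a single global constant shared by every query, which is the "global" aspect the abstract contrasts with a query-adaptive trust region.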

Taxonomy

Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Advanced Graph Neural Networks