Imbalanced Gradients in RL Post-Training of Multi-Task LLMs

Runzhe Wu; Ankur Samanta; Ayush Jain; Scott Fujimoto; Jeongyeol Kwon; Ben Kretzu; Youliang Yu; Kaveh Hassani; Boris Vidolov; Yonathan Efroni

arXiv:2510.19178 · cs.LG · October 28, 2025

TL;DR

This paper reveals that in RL post-training of multi-task LLMs, certain tasks produce disproportionately large gradients that do not correlate with actual learning gains, highlighting the need for gradient-level correction methods.

Contribution

The paper demonstrates that task gradient imbalance exists in RL post-training of LLMs and that this imbalance does not reflect true learning progress, which challenges current dataset-mixing practices.

Findings

01. Large gradients do not always lead to larger learning gains.

02. Gradient imbalances are not explained by training rewards or advantages.

03. Gradient imbalances stem from inherent task differences.

Abstract

Multi-task post-training of large language models (LLMs) is typically performed by mixing datasets from different tasks and optimizing them jointly. This approach implicitly assumes that all tasks contribute gradients of similar magnitudes; when this assumption fails, optimization becomes biased toward large-gradient tasks. In this paper, however, we show that this assumption fails in RL post-training: certain tasks produce significantly larger gradients, thus biasing updates toward those tasks. Such gradient imbalance would be justified only if larger gradients implied larger learning gains on the tasks (i.e., larger performance improvements) -- but we find this is not true. Large-gradient tasks can achieve similar or even much lower learning gains than small-gradient ones. Further analyses reveal that these gradient imbalances cannot be explained by typical training statistics such as…
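
The biasing effect described in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's method or data: the task names, the two-dimensional gradient vectors, and the helper functions are all hypothetical, chosen only to show that when per-task gradients are averaged (as uniform dataset mixing does in expectation), the update direction is dominated by the task with the larger gradient norm. The unit-norm rescaling at the end is one generic form of gradient-level correction, included for contrast only.

```python
import math

def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))

def mixed_gradient(task_grads):
    """Average per-task gradients coordinate-wise."""
    grads = list(task_grads.values())
    dim = len(grads[0])
    return [sum(g[i] for g in grads) / len(grads) for i in range(dim)]

def unit_normalized(task_grads):
    """Rescale each task gradient to unit norm (illustrative correction only)."""
    return {name: [x / norm(g) for x in g] for name, g in task_grads.items()}

# Hypothetical per-task gradients: made-up numbers, not results from the paper.
task_grads = {
    "task_a": [10.0, 0.0],  # large-gradient task
    "task_b": [0.0, 1.0],   # small-gradient task
}

# Plain mixing: the averaged update points almost entirely along task_a.
mixed = mixed_gradient(task_grads)
alignment = {name: cosine(mixed, g) for name, g in task_grads.items()}

# After unit-norm rescaling, both tasks influence the direction equally.
balanced = mixed_gradient(unit_normalized(task_grads))
balanced_alignment = {name: cosine(balanced, g) for name, g in task_grads.items()}

for name in task_grads:
    print(name, round(alignment[name], 3), round(balanced_alignment[name], 3))
```

In this toy setting the averaged gradient is nearly parallel to the large-gradient task and nearly orthogonal to the small one, while the rescaled version is equally aligned with both. Whether such rescaling is appropriate in practice depends on whether gradient size tracks learning gain, which is the question the paper examines.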

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Healthcare and Education