Multi-Task GRPO: Reliable LLM Reasoning Across Tasks

Shyam Sundhar Ramesh; Xiaotong Ji; Matthieu Zimmer; Sangwoong Yoon; Zhiyong Wang; Haitham Bou Ammar; Aurelien Lucchi; Ilija Bogunovic

arXiv:2602.05547 · cs.CL · February 6, 2026

TL;DR

This paper introduces Multi-Task GRPO, a novel algorithm that dynamically balances task performance in large language models, significantly improving worst-task accuracy and training efficiency across multiple reasoning tasks.

Contribution

The paper proposes a new Multi-Task GRPO algorithm that adaptively weights tasks and uses a ratio-preserving sampler to ensure balanced optimization across diverse tasks.
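
To make "adaptively weights tasks" concrete, here is a minimal sketch of one generic way to shift weight toward lagging tasks: an exponentiated-gradient (multiplicative-weights) step on the probability simplex. This is a stand-in technique chosen for illustration, not the update rule from the paper; the function name `update_task_weights`, the `step_size` parameter, and the example scores are all assumptions.

```python
import math


def update_task_weights(weights, task_scores, step_size=1.0):
    """Generic exponentiated-gradient step (illustrative, not the paper's rule).

    Tasks with lower scores receive a larger multiplicative boost, then the
    weights are renormalized so they sum to 1.
    """
    if len(weights) != len(task_scores):
        raise ValueError("weights and task_scores must have the same length")
    # A lower score gives a larger factor exp(-step_size * score).
    unnormalized = [
        w * math.exp(-step_size * s) for w, s in zip(weights, task_scores)
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical per-task accuracies for three tasks, starting from uniform weights.
weights = [1 / 3, 1 / 3, 1 / 3]
scores = [0.9, 0.5, 0.2]
new_weights = update_task_weights(weights, scores)
```

After one step the weakest task (score 0.2) holds the largest weight and the strongest task the smallest, which is the qualitative behavior a worst-task objective calls for. A larger `step_size` concentrates weight on the worst task more aggressively.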

Findings

01

Improves worst-task accuracy over baselines, with reported gains of 16-28% and 6%.

02

Requires 50% fewer training steps to reach 50% worst-task accuracy.

03

Maintains competitive average accuracy while improving reliability.

Abstract

RL-based post-training with GRPO is widely used to improve large language models on individual reasoning tasks. However, real-world deployment requires reliable performance across diverse tasks. A straightforward multi-task adaptation of GRPO often leads to imbalanced outcomes, with some tasks dominating optimization while others stagnate. Moreover, tasks can vary widely in how frequently prompts yield zero advantages (and thus zero gradients), which further distorts their effective contribution to the optimization signal. To address these issues, we propose a novel Multi-Task GRPO (MT-GRPO) algorithm that (i) dynamically adapts task weights to explicitly optimize worst-task performance and promote balanced progress across tasks, and (ii) introduces a ratio-preserving sampler to ensure task-wise policy gradients reflect the adapted weights. Experiments on both 3-task and 9-task settings…
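
The zero-advantage issue mentioned in the abstract can be illustrated with a small sketch. It assumes the common group-relative form of the GRPO advantage (each reward minus the group mean, divided by the group standard deviation), under which a prompt whose sampled responses all receive the same reward contributes no gradient. The helper names, the `eps` constant, and the task names `math` and `logic` are hypothetical; the code shows only how the effective task mix can drift from the intended one, not the paper's ratio-preserving sampler.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Group-relative advantages: (r - mean) / (std + eps) within one prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def has_signal(rewards):
    """True if at least one response in the group has a nonzero advantage."""
    return any(a != 0.0 for a in group_advantages(rewards))


def effective_task_shares(batch):
    """Share of signal-carrying prompts per task.

    batch maps a task name to a list of reward groups, one group per prompt.
    """
    counts = {
        task: sum(1 for group in groups if has_signal(group))
        for task, groups in batch.items()
    }
    total = sum(counts.values())
    return {task: (c / total if total else 0.0) for task, c in counts.items()}


# Hypothetical batch: two prompts per task, so the intended mix is 50/50.
batch = {
    "math": [[1.0, 0.0, 1.0, 0.0], [1.0, 1.0, 0.0, 0.0]],
    "logic": [[1.0, 1.0, 1.0, 1.0], [0.0, 1.0, 0.0, 0.0]],
}
shares = effective_task_shares(batch)
```

In this toy batch, one `logic` prompt has identical rewards across its group and therefore zero advantages, so `logic` supplies only one of the three signal-carrying prompts even though it was sampled as often as `math`. A sampler that preserves the intended ratio would need to correct for this gap.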

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning