AdvancedIF: Rubric-Based Benchmarking and Reinforcement Learning for Advancing LLM Instruction Following

Yun He; Wenzhe Li; Hejia Zhang; Songlin Li; Karishma Mandyam; Sopan Khosla; Yuanhao Xiong; Nanshu Wang; Xiaoliang Peng; Beibin Li; Shengjie Bi; Shishir G. Patil; Qi Qi; Shengyu Feng; Julian Katz-Samuels; Richard Yuanzhe Pang; Sujan Gonugondla; Hunter Lang; Yue Yu; Yundi Qian; Maryam Fazel-Zarandi; Licheng Yu; Amine Benhalloum; Hany Awadalla; Manaal Faruqui

arXiv:2511.10507·cs.CL·November 27, 2025



TL;DR

This paper introduces AdvancedIF, a new benchmark with expert rubrics for evaluating complex instruction following in LLMs, and RIFL, a reinforcement learning pipeline that significantly improves LLM performance on this benchmark.

Contribution

The paper presents a comprehensive rubric-based benchmark and a novel reinforcement learning method, RIFL, to enhance LLM instruction-following capabilities for complex tasks.

Findings

01. RIFL improves LLM performance by 6.7% on AdvancedIF

02. Rubrics effectively guide training and evaluation of instruction following

03. RIFL achieves strong results on public benchmarks

Abstract

Recent progress in large language models (LLMs) has led to impressive performance on a range of tasks, yet advanced instruction following (IF), especially for complex, multi-turn, and system-prompted instructions, remains a significant challenge. Rigorous evaluation and effective training for such capabilities are hindered by the lack of high-quality, human-annotated benchmarks and reliable, interpretable reward signals. In this work, we introduce AdvancedIF (we will release this benchmark soon), a comprehensive benchmark featuring over 1,600 prompts and expert-curated rubrics that assess LLMs' ability to follow complex, multi-turn, and system-level instructions. We further propose RIFL (Rubric-based Instruction-Following Learning), a novel post-training pipeline that leverages rubric generation, a finetuned rubric verifier, and reward shaping to enable effective reinforcement learning for…
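The excerpt above names rubrics, a rubric verifier, and reward shaping, but does not say how per-rubric verdicts are turned into a reward. The Python sketch below is only an illustration of the general idea, not the paper's method: the `Rubric` class, the keyword-style `check` functions (a stand-in for a learned verifier), and both aggregation functions are hypothetical names and assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rubric:
    """One checkable criterion attached to a prompt (hypothetical structure)."""
    description: str
    # Assumption: a simple Python predicate stands in for a finetuned verifier.
    check: Callable[[str], bool]


def rubric_verdicts(response: str, rubrics: List[Rubric]) -> List[bool]:
    """Judge the response against every rubric independently."""
    return [rubric.check(response) for rubric in rubrics]


def fraction_reward(verdicts: List[bool]) -> float:
    """Assumed aggregation 1: share of rubrics satisfied, in [0, 1]."""
    return sum(verdicts) / len(verdicts) if verdicts else 0.0


def all_or_nothing_reward(verdicts: List[bool]) -> float:
    """Assumed aggregation 2: reward only when every rubric is satisfied."""
    return 1.0 if verdicts and all(verdicts) else 0.0


# Toy rubrics for an imagined customer-support prompt (illustrative only).
rubrics = [
    Rubric("Uses at most 20 words", lambda r: len(r.split()) <= 20),
    Rubric("Mentions the refund", lambda r: "refund" in r.lower()),
    Rubric("Avoids the word 'sorry'", lambda r: "sorry" not in r.lower()),
]

good = rubric_verdicts("Your refund will arrive in five days.", rubrics)
bad = rubric_verdicts("Sorry, we cannot help.", rubrics)

print(good, fraction_reward(good), all_or_nothing_reward(good))
print(bad, fraction_reward(bad), all_or_nothing_reward(bad))
```

The two aggregations show one design trade-off: a fractional score gives partial credit and a denser signal, while an all-or-nothing score only rewards responses that satisfy every instruction. Which choice, if either, matches RIFL is not stated in this excerpt.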


Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning