Multi-Level Aware Preference Learning: Enhancing RLHF for Complex Multi-Instruction Tasks

Ruopei Sun; Jianfeng Cai; Jinhua Zhu; Kangwen Zhao; Dongyun Xue; Wengang Zhou; Li Li; Houqiang Li

arXiv:2505.12845 · cs.AI · May 20, 2025

Open Access

TL;DR

This paper introduces MAPL, a novel framework that leverages multi-level preference signals, including intra- and inter-sample preferences, to improve RLHF for complex multi-instruction tasks, reducing resource needs and bias.

Contribution

The paper proposes a new multi-level preference learning framework that captures latent signals within prompts and among samples, enhancing RLHF's effectiveness for complex instructions.

Findings

1. Improved performance on multiple benchmarks.
2. Enhanced multi-instruction task handling.
3. Effective integration with reward modeling and preference optimization.

Abstract

RLHF has emerged as a predominant approach for aligning artificial intelligence systems with human preferences, demonstrating exceptional and measurable efficacy in instruction following tasks; however, it exhibits insufficient compliance capabilities when confronted with complex multi-instruction tasks. Conventional approaches rely heavily on human annotation or more sophisticated large language models, thereby introducing substantial resource expenditure or potential bias concerns. Meanwhile, alternative synthetic methods that augment standard preference datasets often compromise the model's semantic quality. Our research identifies a critical oversight in existing techniques, which predominantly focus on comparing responses while neglecting valuable latent signals embedded within prompt inputs, and which only focus on preference disparities at the intra-sample level, while neglecting…

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Multimodal Machine Learning Applications · Machine Learning and Data Classification

Methods: Attentive Walk-Aggregating Graph Neural Network · Focus