Multi-Level Aware Preference Learning: Enhancing RLHF for Complex Multi-Instruction Tasks
Ruopei Sun, Jianfeng Cai, Jinhua Zhu, Kangwen Zhao, Dongyun Xue, Wengang Zhou, Li Li, Houqiang Li

TL;DR
This paper introduces MAPL, a novel framework that leverages multi-level preference signals, including intra- and inter-sample preferences, to improve RLHF for complex multi-instruction tasks, reducing resource needs and bias.
Contribution
The paper proposes a new multi-level preference learning framework that captures latent signals within prompts and among samples, enhancing RLHF's effectiveness for complex instructions.
Findings
Improved performance on multiple benchmarks.
Enhanced multi-instruction task handling.
Effective integration with reward modeling and preference optimization.
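For readers unfamiliar with the reward-modeling objective mentioned above, the following is a minimal sketch of the standard Bradley-Terry pairwise preference loss that is commonly used in RLHF reward modeling. It is not MAPL's loss, which this summary does not specify. The function names and the example scores are hypothetical and chosen only for illustration. The loss takes a score for a preferred item and a score for a rejected item, so it applies to any preference pair regardless of how the pair was constructed.

```python
import math


def bradley_terry_loss(score_preferred: float, score_rejected: float) -> float:
    """Negative log-likelihood that the preferred item outranks the rejected one.

    Hypothetical helper: computes -log(sigmoid(score_preferred - score_rejected)).
    """
    margin = score_preferred - score_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of the
    # margin so that exp() is only ever called on a non-positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_pairwise_loss(pairs):
    """Average the pairwise loss over (preferred_score, rejected_score) tuples."""
    return sum(bradley_terry_loss(p, r) for p, r in pairs) / len(pairs)


# Illustrative scores only (assumed values, not results from the paper).
example_pairs = [(1.5, 0.2), (0.3, 0.9), (2.0, 2.0)]
example_loss = mean_pairwise_loss(example_pairs)
```

The loss shrinks as the reward model scores the preferred item further above the rejected one, and it equals log 2 when the two scores are tied.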
Abstract
RLHF has emerged as a predominant approach for aligning artificial intelligence systems with human preferences, demonstrating exceptional and measurable efficacy in instruction-following tasks; however, it exhibits insufficient compliance capabilities when confronted with complex multi-instruction tasks. Conventional approaches rely heavily on human annotation or more sophisticated large language models, thereby introducing substantial resource expenditure or potential bias concerns. Meanwhile, alternative synthetic methods that augment standard preference datasets often compromise the model's semantic quality. Our research identifies a critical oversight in existing techniques, which predominantly focus on comparing responses while neglecting valuable latent signals embedded within prompt inputs, and which focus only on preference disparities at the intra-sample level, while neglecting…
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Multimodal Machine Learning Applications · Machine Learning and Data Classification
Methods: Attentive Walk-Aggregating Graph Neural Network · Focus
