Beyond Instruction Following: Evaluating Inferential Rule Following of Large Language Models
Wangtao Sun, Chenxiang Zhang, XueYou Zhang, Xuanqing Yu, Ziyang Huang, Pei Chen, Haotian Xu, Shizhu He, Jun Zhao, Kang Liu

TL;DR
This paper introduces RuleBench, a comprehensive benchmark to evaluate the inferential rule-following abilities of large language models, revealing their current limitations and proposing a tuning method to improve their rule-following capabilities.
Contribution
The paper clarifies the concept of inferential rule-following, creates the RuleBench benchmark, and proposes Inferential Rule-Following Tuning (IRFT) to improve LLMs' rule-following ability beyond plain instruction following.
Findings
LLMs show limited inferential rule-following abilities
IRFT improves LLMs' ability to learn and generalize rules
RuleBench provides a diversified evaluation of rule-following skills
Abstract
Although Large Language Models (LLMs) have demonstrated strong ability, they are further supposed to be controlled and guided by rules in real-world scenarios to be safe, accurate, and intelligent. This demands that LLMs possess inferential rule-following capability. However, no prior work has made a clear evaluation of the inferential rule-following capability of LLMs. Previous studies that try to evaluate this capability fail to distinguish inferential rule-following scenarios from instruction-following scenarios. Therefore, this paper first clarifies the concept of inferential rule-following and proposes a comprehensive benchmark, RuleBench, to evaluate a diversified range of inferential rule-following abilities. Our experimental results on a variety of LLMs show that they are still limited in following rules. Our analysis based on the evaluation results provides…
Taxonomy
Topics: Natural Language Processing Techniques
