Beyond Instruction Following: Evaluating Inferential Rule Following of Large Language Models

Wangtao Sun; Chenxiang Zhang; XueYou Zhang; Xuanqing Yu; Ziyang Huang; Pei Chen; Haotian Xu; Shizhu He; Jun Zhao; Kang Liu

arXiv:2407.08440 · cs.CL · October 18, 2024

TL;DR

This paper introduces RuleBench, a comprehensive benchmark to evaluate the inferential rule-following abilities of large language models, revealing their current limitations and proposing a tuning method to improve their rule-following capabilities.

Contribution

The paper clarifies the concept of inferential rule-following, creates the RuleBench benchmark, and proposes Inferential Rule-Following Tuning (IRFT) to improve LLMs' ability to follow rules, which goes beyond instruction following.

Findings

01. LLMs show limited inferential rule-following abilities

02. IRFT improves LLMs' ability to learn and generalize rules

03. RuleBench provides a diversified evaluation of rule-following skills

Abstract

Although Large Language Models (LLMs) have demonstrated strong abilities, they must also be controlled and guided by rules in real-world scenarios to be safe, accurate, and intelligent. This requires LLMs to possess an inferential rule-following capability. However, no prior work has clearly evaluated this capability: previous studies that attempt to do so fail to distinguish inferential rule-following scenarios from instruction-following scenarios. Therefore, this paper first clarifies the concept of inferential rule-following and proposes a comprehensive benchmark, RuleBench, to evaluate a diversified range of inferential rule-following abilities. Our experimental results on a variety of LLMs show that they are still limited in following rules. Our analysis based on the evaluation results provides…

Taxonomy

Topics: Natural Language Processing Techniques