Benchmarking and Understanding Compositional Relational Reasoning of LLMs

Ruikang Ni; Da Xiao; Qingye Meng; Xiangyu Li; Shihui Zheng; Hongliang Liang

arXiv:2412.12841·cs.CL·December 18, 2024

TL;DR

This paper introduces a new benchmark called GAR to evaluate the compositional relational reasoning capabilities of large language models, revealing their fundamental deficiencies and identifying core attention heads involved in reasoning.

Contribution

The paper proposes GAR, a synthetic benchmark for compositional relational reasoning (CRR), systematically analyzes the circuits LLMs use to solve it, and identifies the attention heads that are crucial for CRR tasks.

Findings

01  Existing LLMs struggle with CRR tasks in the GAR benchmark.

02  A set of core attention heads, found with attribution patching on Vicuna-33B, is vital for solving GAR tasks.

03  Two classes of attention heads encode the concepts of true and false in GAR.

Abstract

Compositional relational reasoning (CRR) is a hallmark of human intelligence, but we lack a clear understanding of whether and how existing transformer large language models (LLMs) can solve CRR tasks. To enable systematic exploration of the CRR capability of LLMs, we first propose a new synthetic benchmark called Generalized Associative Recall (GAR) by integrating and generalizing the essence of several tasks in mechanistic interpretability (MI) study in a unified framework. Evaluation shows that GAR is challenging enough for existing LLMs, revealing their fundamental deficiency in CRR. Meanwhile, it is easy enough for systematic MI study. Then, to understand how LLMs solve GAR tasks, we use attribution patching to discover the core circuits reused by Vicuna-33B across different tasks and a set of vital attention heads. Intervention experiments show that the correct functioning of…
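The abstract names attribution patching as the tool used to find core circuits. As a rough illustration of the general idea only, the sketch below compares exact activation patching with its first-order (gradient-based) approximation on a tiny hand-written network. Everything here is an assumption made for illustration: the weights `W1` and `W2`, the function names, and the inputs are hypothetical, and the paper applies the technique to attention heads in Vicuna-33B, not to a toy model like this one.

```python
import math

# Hypothetical toy model (not from the paper):
#   hidden units  h_i = tanh(sum_j W1[i][j] * x[j])
#   output        y   = tanh(sum_i W2[i] * h_i)
W1 = [[0.5, -0.3], [0.2, 0.8], [-0.6, 0.1]]
W2 = [0.7, -0.4, 0.9]


def hidden(x):
    """Compute the hidden activations for input x."""
    return [math.tanh(sum(w * xj for w, xj in zip(row, x))) for row in W1]


def output(h):
    """Compute the scalar output (the metric) from hidden activations."""
    return math.tanh(sum(w * hi for w, hi in zip(W2, h)))


def activation_patching(x_clean, x_corrupt):
    """Exact effect of swapping one hidden unit at a time to its corrupted value."""
    h_clean, h_corrupt = hidden(x_clean), hidden(x_corrupt)
    base = output(h_clean)
    effects = []
    for i in range(len(h_clean)):
        patched = list(h_clean)
        patched[i] = h_corrupt[i]  # patch a single unit
        effects.append(output(patched) - base)
    return effects


def attribution_patching(x_clean, x_corrupt):
    """First-order estimate: (h_corrupt - h_clean) * d(output)/d(h) at the clean run."""
    h_clean, h_corrupt = hidden(x_clean), hidden(x_corrupt)
    y = output(h_clean)
    # Analytic gradient of tanh(sum_i W2[i] * h_i) with respect to each h_i.
    grad = [(1.0 - y * y) * w for w in W2]
    return [(hc - h) * g for hc, h, g in zip(h_corrupt, h_clean, grad)]


if __name__ == "__main__":
    x_clean, x_corrupt = [1.0, 0.5], [0.9, 0.6]
    for i, (e, a) in enumerate(zip(activation_patching(x_clean, x_corrupt),
                                   attribution_patching(x_clean, x_corrupt))):
        print(f"unit {i}: exact={e:+.5f}  first-order={a:+.5f}")
```

The exact version needs one extra forward pass per patched unit, while the first-order version needs a single gradient from the clean run, which is why the approximation is attractive when scanning many components of a large model. The estimate is only reliable when the clean and corrupted activations are close.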

Code & Models

Repositories

caiyun-ai/gar (PyTorch, official)

Taxonomy

Topics: Semantic Web and Ontologies

Methods: Softmax · Attention Is All You Need · Sparse Evolutionary Training · Activation Patching