Reasoning or a Semblance of it? A Diagnostic Study of Transitive Reasoning in LLMs

Houman Mehrafarin; Arash Eshghi; Ioannis Konstas

arXiv:2410.20200·cs.CL·October 29, 2024

TL;DR

This study examines whether large language models genuinely understand transitive reasoning or rely on cues, revealing that fine-tuning influences their reasoning capabilities and resilience to confounding factors.

Contribution

The paper provides a diagnostic analysis of LLMs' transitive reasoning, highlighting how fine-tuning and dataset cues affect reasoning performance.

Findings

01 Flan-T5 is more resilient to confounding cues than LLaMA 2.

02 Models may develop transitive understanding through fine-tuning.

03 Performance varies based on dataset manipulations and model architecture.

Abstract

Evaluating Large Language Models (LLMs) on reasoning benchmarks demonstrates their ability to solve compositional questions. However, little is known about whether these models engage in genuine logical reasoning or simply rely on implicit cues to generate answers. In this paper, we investigate the transitive reasoning capabilities of two distinct LLM architectures, LLaMA 2 and Flan-T5, by manipulating facts within two compositional datasets: QASC and Bamboogle. We controlled for potential cues that might influence the models' performance, including (a) word/phrase overlaps across sections of test input; (b) knowledge the models acquired during pre-training or fine-tuning; and (c) Named Entities. Our findings reveal that while both models leverage (a), Flan-T5 shows more resilience in experiments (b) and (c), having less variance than LLaMA 2. This suggests that models may develop an…
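To make cue (a) concrete, the following is a minimal Python sketch of how a word-overlap cue between two composed facts might be detected and then removed. It is an illustration only and not the authors' procedure: the example facts, the stopword list, the placeholder token `blicket`, and the helper names `tokens`, `shared_content_words`, and `break_overlap` are all hypothetical.

```python
import re

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "by", "can"}


def tokens(text):
    """Return the set of lowercase alphanumeric word tokens in `text`."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def shared_content_words(a, b):
    """Words that appear in both strings, ignoring stopwords."""
    return (tokens(a) & tokens(b)) - STOPWORDS


def break_overlap(fact, shared, placeholder="blicket"):
    """Replace every word of `fact` found in `shared` with a placeholder token."""
    def swap(match):
        word = match.group(0)
        return placeholder if word.lower() in shared else word
    return re.sub(r"[A-Za-z0-9]+", swap, fact)


# Hypothetical two-hop example: fact1 and fact2 chain through a bridge term.
fact1 = "Rain is a form of precipitation"
fact2 = "Precipitation fills lakes"

shared = shared_content_words(fact1, fact2)
perturbed = break_overlap(fact2, shared)

print(shared)     # bridge term(s) linking the two facts
print(perturbed)  # fact2 with the lexical bridge removed
```

In this toy setup, a model that still answers a composed question correctly after the bridge term is replaced cannot be relying on lexical overlap between the two facts alone; how the paper actually constructs its manipulations is not specified in the text above.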

Taxonomy

Topics: Artificial Intelligence in Law

Methods: Flan-T5 · LLaMA