Generics and Default Reasoning in Large Language Models

James Ravi Kirkpatrick; Rachel Katharine Sterken

arXiv:2508.13718 · cs.CL · August 20, 2025

TL;DR

This study assesses 28 large language models on defeasible reasoning with generics (e.g., 'Birds fly'). Performance varies widely across models and prompting styles: few-shot prompting helps some models modestly, while chain-of-thought prompting often hurts accuracy.

Contribution

The paper provides a comprehensive evaluation of how LLMs perform on tasks that require reasoning with generics, and it highlights how prompting style, especially chain-of-thought prompting, affects their accuracy.

Findings

01. Performance varies widely across models and prompts.

02. Few-shot prompting modestly improves some models' performance.

03. Chain-of-thought prompting often degrades performance significantly.

Abstract

This paper evaluates the capabilities of 28 large language models (LLMs) to reason with 20 defeasible reasoning patterns involving generic generalizations (e.g., 'Birds fly', 'Ravens are black') central to non-monotonic logic. Generics are of special interest to linguists, philosophers, logicians, and cognitive scientists because of their complex exception-permitting behaviour and their centrality to default reasoning, cognition, and concept acquisition. We find that while several frontier models handle many default reasoning problems well, performance varies widely across models and prompting styles. Few-shot prompting modestly improves performance for some models, but chain-of-thought (CoT) prompting often leads to serious performance degradation (mean accuracy drop -11.14%, SD 15.74% in models performing above 75% accuracy in zero-shot condition, temperature 0). Most models either…
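To make the reported statistic concrete, here is a minimal Python sketch of how a mean accuracy change and its standard deviation could be computed over the subset of models that score above 75% in the zero-shot condition. The model names and accuracy values are invented for illustration and are not data from the paper. The function name `cot_change_summary`, the use of percentage-point differences, and the choice of the sample (rather than population) standard deviation are likewise assumptions, since the abstract does not specify them.

```python
from statistics import mean, stdev

# Hypothetical per-model accuracies in percent. These are NOT figures from the paper.
zero_shot = {"model_a": 90.0, "model_b": 80.0, "model_c": 60.0, "model_d": 85.0}
cot = {"model_a": 85.0, "model_b": 50.0, "model_c": 70.0, "model_d": 80.0}


def cot_change_summary(zero_shot, cot, threshold=75.0):
    """Return (mean, sample SD) of CoT-minus-zero-shot accuracy.

    Only models whose zero-shot accuracy is strictly above `threshold`
    are included. Differences are in percentage points (an assumption).
    """
    deltas = [cot[name] - acc for name, acc in zero_shot.items() if acc > threshold]
    return mean(deltas), stdev(deltas)


mean_delta, sd_delta = cot_change_summary(zero_shot, cot)
print(f"mean change: {mean_delta:.2f} points, SD: {sd_delta:.2f}")
```

In this toy data, `model_c` falls below the threshold and is excluded, so its improvement under chain-of-thought prompting does not offset the drops seen in the stronger models. A negative mean indicates that accuracy fell under chain-of-thought prompting, and an SD that is large relative to the mean indicates that the size of the effect differs a great deal from model to model.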
