Benchmarking Defeasible Reasoning with Large Language Models -- Initial Experiments and Future Directions
Ilias Tachmazidis, Sotiris Batsakis, Grigoris Antoniou

TL;DR
This paper introduces a new benchmark for evaluating large language models on defeasible reasoning tasks, compares LLM performance with traditional defeasible logic, and discusses future research directions.
Contribution
It presents the first benchmark translating defeasible logic into LLM-compatible formats and provides initial experimental results with ChatGPT.
Findings
ChatGPT's reasoning performance varies across defeasible reasoning patterns.
The benchmark reveals strengths and limitations of LLMs in defeasible reasoning.
Initial experiments guide future improvements in LLM reasoning capabilities.
Abstract
Large Language Models (LLMs) have gained prominence in the AI landscape due to their exceptional performance. It is therefore essential to better understand their capabilities and limitations, including in nonmonotonic reasoning. This paper proposes a benchmark that covers various defeasible rule-based reasoning patterns. We modified an existing benchmark for defeasible logic reasoners by translating defeasible rules into text suitable for LLMs. We conducted preliminary experiments on nonmonotonic rule-based reasoning using ChatGPT and compared its behavior with the reasoning patterns defined by defeasible logic.
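The abstract does not give the translation scheme itself, so the following is only a minimal sketch of what rendering a small defeasible theory as LLM-readable text could look like. The `Rule` class, the phrasing templates, the `theory_to_prompt` helper, and the Tweety example are all assumptions made for illustration; they are not taken from the paper or its benchmark.

```python
from dataclasses import dataclass

# Hypothetical phrasing templates, one per rule kind (not the paper's wording).
TEMPLATES = {
    "strict": "{label}: If {body}, then always {head}.",
    "defeasible": "{label}: If {body}, then typically {head}.",
}


@dataclass(frozen=True)
class Rule:
    label: str   # rule name, e.g. "r1"
    kind: str    # "strict" or "defeasible"
    body: tuple  # antecedents as plain-English clauses
    head: str    # consequent as a plain-English clause


def rule_to_text(rule):
    """Render one rule as an English sentence."""
    return TEMPLATES[rule.kind].format(
        label=rule.label, body=" and ".join(rule.body), head=rule.head
    )


def theory_to_prompt(facts, rules, superiority, query):
    """Assemble facts, rules, rule priorities, and a question into one prompt."""
    lines = [f"Fact: {fact}." for fact in facts]
    lines += [rule_to_text(rule) for rule in rules]
    lines += [f"Rule {hi} overrides rule {lo}." for hi, lo in superiority]
    lines.append(f"Question: Can we conclude that {query}? Answer yes or no.")
    return "\n".join(lines)


# Illustrative theory: a strict rule, two conflicting defeasible rules,
# and a superiority relation that resolves the conflict in favor of r3.
rules = [
    Rule("r1", "strict", ("Tweety is a penguin",), "Tweety is a bird"),
    Rule("r2", "defeasible", ("Tweety is a bird",), "Tweety flies"),
    Rule("r3", "defeasible", ("Tweety is a penguin",), "Tweety does not fly"),
]
prompt = theory_to_prompt(
    ["Tweety is a penguin"], rules, [("r3", "r2")], "Tweety flies"
)
print(prompt)
```

In this toy theory, defeasible logic would not conclude that Tweety flies, because r3 is superior to r2. A benchmark item of this shape could pair the generated prompt with that expected answer and compare it against the model's reply.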
Taxonomy
Topics: Semantic Web and Ontologies · Natural Language Processing Techniques · Topic Modeling
