Multi-LogiEval: Towards Evaluating Multi-Step Logical Reasoning Ability of Large Language Models
Nisarg Patel, Mohith Kulkarni, Mihir Parmar, Aashna Budhiraja, Mutsumi Nakamura, Neeraj Varshney, Chitta Baral

TL;DR
Multi-LogiEval introduces a comprehensive dataset for evaluating large language models' multi-step logical reasoning across various logic types and inference rules, revealing significant performance drops with increased reasoning depth.
Contribution
The paper presents Multi-LogiEval, a new dataset for multi-step logical reasoning evaluation, including non-monotonic logic, and provides extensive analysis of LLMs' reasoning capabilities.
Findings
LLMs' accuracy drops from ~68% at depth-1 to ~43% at depth-5.
Performance varies significantly across different logic types and inference rules.
Evaluation with zero-shot chain-of-thought prompting exposes limitations in current LLMs' reasoning abilities.
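Findings like these come from grouping model answers by reasoning depth or by logic type and scoring each group separately. The sketch below shows one way to do that. It assumes each example has a yes/no gold label and a model prediction, and it uses hypothetical field names (`logic`, `depth`, `label`, `prediction`). The toy records are made up for illustration and are not the paper's results.

```python
from collections import defaultdict


def accuracy_by(records, key):
    """Return accuracy per group, grouping records by the field named `key`.

    Hypothetical record format: each record is a dict with a gold `label`
    and a model `prediction`, both yes/no strings, plus grouping fields
    such as `depth` or `logic`.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        group = rec[key]
        total[group] += 1
        # Case- and whitespace-insensitive comparison of yes/no answers.
        if rec["prediction"].strip().lower() == rec["label"].strip().lower():
            correct[group] += 1
    return {g: correct[g] / total[g] for g in sorted(total)}


if __name__ == "__main__":
    # Toy records for illustration only (not data from the paper).
    records = [
        {"logic": "PL", "depth": 1, "label": "yes", "prediction": "yes"},
        {"logic": "PL", "depth": 1, "label": "no", "prediction": "no"},
        {"logic": "FOL", "depth": 5, "label": "yes", "prediction": "no"},
        {"logic": "NM", "depth": 5, "label": "no", "prediction": "No"},
    ]
    print(accuracy_by(records, "depth"))  # {1: 1.0, 5: 0.5}
    print(accuracy_by(records, "logic"))  # {'FOL': 0.0, 'NM': 1.0, 'PL': 1.0}
```

The same helper scores any grouping field, so a per-depth curve and a per-logic breakdown come from one function.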
Abstract
As Large Language Models (LLMs) continue to exhibit remarkable performance in natural language understanding tasks, there is a crucial need to measure their ability for human-like multi-step logical reasoning. Existing logical reasoning evaluation benchmarks often focus primarily on simplistic single-step or multi-step reasoning with a limited set of inference rules. Furthermore, the lack of datasets for evaluating non-monotonic reasoning represents a crucial gap since it aligns more closely with human-like reasoning. To address these limitations, we propose Multi-LogiEval, a comprehensive evaluation dataset encompassing multi-step logical reasoning with various inference rules and depths. Multi-LogiEval covers three logic types (propositional, first-order, and non-monotonic), consisting of more than 30 inference rules and more than 60 of their combinations with various depths.…
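The sketch below makes "depth" concrete. It assumes depth counts the chained rule applications needed to reach a conclusion, and it covers only two propositional rules, modus ponens and modus tollens, over single-literal implications. The `Implies` class and the `forward_chain` and `neg` functions are hypothetical illustrations, not the dataset's construction pipeline, which spans many more rules and three logic types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Implies:
    """Hypothetical single-literal implication: antecedent -> consequent."""
    antecedent: str
    consequent: str


def neg(lit: str) -> str:
    """Negate a literal written as 'p' or '~p'."""
    return lit[1:] if lit.startswith("~") else "~" + lit


def forward_chain(facts: set[str], rules: list[Implies]) -> dict[str, tuple[int, str]]:
    """Derive literals with modus ponens / modus tollens until a fixpoint.

    Maps each literal to (depth, last rule used). Premises have depth 0, and
    each rule application adds 1 to the depth of the literal it consumed.
    """
    derived = {f: (0, "premise") for f in facts}
    changed = True
    while changed:
        changed = False
        for r in rules:
            candidates = []
            # Modus ponens: from p and p -> q, infer q.
            if r.antecedent in derived:
                candidates.append((r.consequent, derived[r.antecedent][0] + 1, "modus ponens"))
            # Modus tollens: from ~q and p -> q, infer ~p.
            if neg(r.consequent) in derived:
                candidates.append((neg(r.antecedent), derived[neg(r.consequent)][0] + 1, "modus tollens"))
            for lit, depth, name in candidates:
                # Keep the shallowest derivation found so far.
                if lit not in derived or depth < derived[lit][0]:
                    derived[lit] = (depth, name)
                    changed = True
    return derived


if __name__ == "__main__":
    chain = [Implies("p", "q"), Implies("q", "r"), Implies("r", "s")]
    print(forward_chain({"p"}, chain)["s"])   # (3, 'modus ponens')
    print(forward_chain({"~s"}, chain)["~p"])  # (3, 'modus tollens')
```

In this toy setting, a depth-3 conclusion needs three chained steps, which is the kind of growing chain on which the paper reports accuracy falling.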
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
Methods: Attention Is All You Need · Sparse Evolutionary Training · Softmax · Layer Normalization · Absolute Position Encodings · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer · Dropout · Adam
