ChaosBench-Logic: A Benchmark for Logical and Symbolic Reasoning on Chaotic Dynamical Systems
Noel Thomas

TL;DR
ChaosBench-Logic is a benchmark that evaluates the logical and symbolic reasoning of large language models on chaotic dynamical systems. Models answer individual questions with high accuracy but fail on compositional reasoning.
Contribution
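The paper introduces ChaosBench-Logic, a benchmark of 30 dynamical systems annotated with truth assignments for 11 predicates in a unified first-order logic ontology. It also provides 621 questions across seven reasoning categories, evaluation metrics, and an open-source evaluation pipeline for assessing LLM reasoning in scientific domains.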
Findings
LLMs achieve 91-94% accuracy on individual items
LLMs score 0% on compositional reasoning tasks
Dialogue accuracy varies from 53.1% to 75.5%
Abstract
Large language models (LLMs) excel at natural language tasks but remain brittle in domains requiring precise logical and symbolic reasoning. Chaotic dynamical systems provide an especially demanding test because chaos is deterministic yet often misinterpreted as randomness or complexity. We introduce ChaosBench-Logic, a benchmark that evaluates LLM reasoning across 30 diverse dynamical systems using a unified first-order logic (FOL) ontology. Each system is annotated with truth assignments for 11 semantic predicates, and 621 questions are generated across seven reasoning categories, including multi-hop implications, cross-system analogies, counterfactual reasoning, bias probes, and multi-turn dialogues. We define metrics for logical accuracy, implication consistency, dialogue coherence, and contradiction, and we release an open-source evaluation pipeline. Initial experiments show that…
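The abstract mentions per-system truth assignments over a shared FOL ontology, multi-hop implication questions, and metrics for logical accuracy and implication consistency, but it does not give their formal definitions. The Python sketch below shows one plausible reading under stated assumptions. The predicate names (`Chaotic`, `Deterministic`, `Sensitive`, `Random`), the rule format, and the functions `closure`, `logical_accuracy`, and `implication_consistency` are all hypothetical and are not taken from the ChaosBench-Logic release.

```python
from typing import Dict, List, Tuple

# Hypothetical rule format: (premise, conclusion, value) encodes
#   forall s. premise(s) -> (conclusion(s) == value)
# Predicate names are illustrative, not the benchmark's actual 11 predicates.
Rule = Tuple[str, str, bool]
RULES: List[Rule] = [
    ("Chaotic", "Deterministic", True),   # chaos is deterministic
    ("Chaotic", "Sensitive", True),       # sensitive dependence on initial conditions
    ("Deterministic", "Random", False),   # deterministic systems are not random
]


def closure(facts: Dict[str, bool], rules: List[Rule] = RULES) -> Dict[str, bool]:
    """Forward-chain the rules until nothing new is derived (multi-hop inference).

    Existing truth values are never overwritten, so conflicts are not detected here.
    """
    derived = dict(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion, value in rules:
            if derived.get(premise) is True and conclusion not in derived:
                derived[conclusion] = value
                changed = True
    return derived


def logical_accuracy(answers: Dict[str, bool], truth: Dict[str, bool]) -> float:
    """Fraction of answered predicates whose truth value matches the annotation."""
    scored = [p for p in answers if p in truth]
    if not scored:
        return 0.0
    return sum(answers[p] == truth[p] for p in scored) / len(scored)


def implication_consistency(answers: Dict[str, bool], rules: List[Rule] = RULES) -> float:
    """Fraction of applicable rules that the model's own answers respect.

    A rule applies when its premise is answered True and its conclusion is answered.
    Returns 1.0 when no rule applies.
    """
    applicable = [(c, v) for p, c, v in rules if answers.get(p) is True and c in answers]
    if not applicable:
        return 1.0
    return sum(answers[c] == v for c, v in applicable) / len(applicable)


# Hypothetical annotation: only Chaotic is asserted, the rest is derived.
truth = closure({"Chaotic": True})
# -> {"Chaotic": True, "Deterministic": True, "Sensitive": True, "Random": False}

# Hypothetical model answers that conflate chaos with randomness.
answers = {"Chaotic": True, "Deterministic": True, "Sensitive": True, "Random": True}
acc = logical_accuracy(answers, truth)       # 0.75
cons = implication_consistency(answers)      # 2/3: violates Deterministic -> not Random
```

Forward chaining makes the multi-hop case explicit: `Random` is false only via the two-step chain Chaotic → Deterministic → not Random. A model that calls a chaotic system random is penalized twice, once for the wrong fact and once for contradicting its own answer that the system is deterministic.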
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Text Readability and Simplification
