ChaosBench-Logic: A Benchmark for Logical and Symbolic Reasoning on Chaotic Dynamical Systems

Noel Thomas

arXiv:2601.01982 · cs.AI · February 13, 2026

TL;DR

ChaosBench-Logic is a benchmark of 621 questions over 30 dynamical systems that evaluates large language models' logical and symbolic reasoning about chaotic dynamics. Models answer individual items accurately but fail on compositional reasoning.

Contribution

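The paper introduces ChaosBench-Logic, a benchmark for assessing LLM reasoning that covers 30 dynamical systems, each annotated with truth assignments for 11 semantic predicates under a unified first-order logic ontology. It also defines metrics for logical accuracy, implication consistency, dialogue coherence, and contradiction, and releases an open-source evaluation pipeline.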

Findings

01. LLMs achieve 91-94% accuracy on individual items
02. LLMs score 0% on compositional reasoning tasks
03. Dialogue accuracy varies from 53.1% to 75.5%

Abstract

Large language models (LLMs) excel at natural language tasks but remain brittle in domains requiring precise logical and symbolic reasoning. Chaotic dynamical systems provide an especially demanding test because chaos is deterministic yet often misinterpreted as randomness or complexity. We introduce ChaosBench-Logic, a benchmark that evaluates LLM reasoning across 30 diverse dynamical systems using a unified first-order logic (FOL) ontology. Each system is annotated with truth assignments for 11 semantic predicates, and 621 questions are generated across seven reasoning categories, including multi-hop implications, cross-system analogies, counterfactual reasoning, bias probes, and multi-turn dialogues. We define metrics for logical accuracy, implication consistency, dialogue coherence, and contradiction, and we release an open-source evaluation pipeline. Initial experiments show that…
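
To make the idea of predicate truth assignments and an implication-consistency check concrete, here is a minimal Python sketch. It is not the paper's pipeline: the predicate names, the two rules, and the function names `logical_accuracy` and `implication_violations` are all hypothetical, chosen only for illustration, since the excerpt does not list the 11 predicates or give the metric definitions.

```python
from typing import Dict, List, Tuple

# Hypothetical predicate names; the benchmark's actual 11 predicates
# are not listed in this excerpt.
Assignment = Dict[str, bool]

# Illustrative implication rules of the form (antecedent, consequent).
# Assumption: these stand in for the benchmark's FOL ontology.
RULES: List[Tuple[str, str]] = [
    ("Chaotic", "Deterministic"),
    ("Chaotic", "SensitiveToInitialConditions"),
]


def logical_accuracy(predicted: Assignment, gold: Assignment) -> float:
    """Fraction of gold predicates whose truth value the prediction matches."""
    if not gold:
        raise ValueError("gold assignment is empty")
    correct = sum(predicted.get(p) == v for p, v in gold.items())
    return correct / len(gold)


def implication_violations(
    predicted: Assignment, rules: List[Tuple[str, str]] = RULES
) -> List[Tuple[str, str]]:
    """Rules whose antecedent is asserted true while the consequent is asserted false."""
    return [
        (a, c)
        for a, c in rules
        if predicted.get(a) is True and predicted.get(c) is False
    ]


# Toy gold annotation for one system and one model's answers.
gold = {
    "Chaotic": True,
    "Deterministic": True,
    "SensitiveToInitialConditions": True,
}
predicted = {
    "Chaotic": True,
    "Deterministic": False,  # treats chaos as randomness
    "SensitiveToInitialConditions": True,
}

accuracy = logical_accuracy(predicted, gold)
violations = implication_violations(predicted)
print(accuracy, violations)
```

In this toy case the model gets two of three predicates right yet still violates the rule that a chaotic system is deterministic, which is the kind of inconsistency a per-item accuracy score alone would not surface.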

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Text Readability and Simplification