BaziQA-Benchmark: Evaluating Symbolic and Temporally Compositional Reasoning in Large Language Models
Jiangxi Chen, Qian Liu

TL;DR
BaziQA-Benchmark provides a standardized, objective way to evaluate large language models' symbolic and temporal reasoning abilities using curated problems from a global competition, revealing strengths and systematic weaknesses.
Contribution
This work introduces BaziQA-Benchmark, a novel, controlled benchmark for assessing symbolic and temporal reasoning in large language models, with a structured protocol for probing reasoning behavior.
Findings
Models outperform chance but are far from saturation.
Performance varies with temporal difficulty and reasoning protocols.
Models show systematic failures in temporal localization and multi-condition judgments.
Abstract
We present BaziQA-Benchmark, a standardized benchmark for evaluating symbolic and temporally compositional reasoning in large language models. The benchmark is derived from 200 professionally curated, multiple-choice problems from the Global Fortune-teller Competition (2021–2025), where each instance requires structured inference over a fixed symbolic chart and interacting temporal conditions. Unlike anecdotal or prompt-driven evaluations, BaziQA-Benchmark enables objective scoring and controlled comparison across years, domains, and model families. We evaluate contemporary language models under a multi-turn setting and analyze performance variation across temporal difficulty, reasoning domains, and inference protocols. To further probe reasoning behavior, we introduce a lightweight Structured Reasoning Protocol that constrains inference order without adding domain knowledge. Results…
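As a rough illustration of what objective scoring on multiple-choice items can look like, the sketch below computes accuracy per group (for example, per competition year) and a uniform-guessing baseline. It is not the authors' evaluation code. The record fields (`year`, `predicted`, `answer`), the helper names, and the four-option assumption are all hypothetical, and the toy records are made up for the example rather than taken from the benchmark.

```python
from collections import defaultdict


def accuracy_by_group(records, key):
    """Return {group: accuracy} for multiple-choice records.

    Each record is a dict with hypothetical fields 'predicted' and
    'answer' (option labels) plus the grouping field named by `key`.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        group = record[key]
        total[group] += 1
        if record["predicted"] == record["answer"]:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


def chance_accuracy(num_options):
    """Expected accuracy of uniform random guessing over num_options choices."""
    return 1.0 / num_options


# Toy records for illustration only; not benchmark data.
records = [
    {"year": 2021, "predicted": "A", "answer": "A"},
    {"year": 2021, "predicted": "B", "answer": "C"},
    {"year": 2025, "predicted": "D", "answer": "D"},
]

per_year = accuracy_by_group(records, "year")
baseline = chance_accuracy(4)  # assumes four options per question
```

Grouping by a different field, such as a hypothetical `domain` or `model` key, would give the cross-domain or cross-model comparison in the same way.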
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)
