BaziQA-Benchmark: Evaluating Symbolic and Temporally Compositional Reasoning in Large Language Models

Jiangxi Chen; Qian Liu

arXiv:2602.12889 · cs.CL · February 16, 2026

TL;DR

BaziQA-Benchmark provides a standardized, objective way to evaluate large language models' symbolic and temporal reasoning abilities using curated problems from a global competition, revealing strengths and systematic weaknesses.

Contribution

This work introduces BaziQA-Benchmark, a novel, controlled benchmark for assessing symbolic and temporal reasoning in large language models, with a structured protocol for probing reasoning behavior.

Findings

01. Models outperform chance but are far from saturation.

02. Performance varies with temporal difficulty and reasoning protocols.

03. Models show systematic failures in temporal localization and multi-condition judgments.

Abstract

We present BaziQA-Benchmark, a standardized benchmark for evaluating symbolic and temporally compositional reasoning in large language models. The benchmark is derived from 200 professionally curated, multiple-choice problems from the Global Fortune-teller Competition (2021–2025), where each instance requires structured inference over a fixed symbolic chart and interacting temporal conditions. Unlike anecdotal or prompt-driven evaluations, BaziQA-Benchmark enables objective scoring and controlled comparison across years, domains, and model families. We evaluate contemporary language models under a multi-turn setting and analyze performance variation across temporal difficulty, reasoning domains, and inference protocols. To further probe reasoning behavior, we introduce a lightweight Structured Reasoning Protocol that constrains inference order without adding domain knowledge. Results…
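As a rough illustration of what objective scoring with comparison across years could look like for multiple-choice items, the sketch below computes exact-match accuracy per group alongside a uniform-guess chance baseline. It is not the authors' evaluation code: the record fields (`year`, `gold`, `pred`, `n_options`), the four-option assumption, and the sample rows are all hypothetical and invented for the example.

```python
from collections import defaultdict

# Hypothetical records: field names and values are illustrative only,
# not the benchmark's real schema or data.
records = [
    {"year": 2021, "gold": "A", "pred": "A", "n_options": 4},
    {"year": 2021, "gold": "C", "pred": "B", "n_options": 4},
    {"year": 2025, "gold": "D", "pred": "D", "n_options": 4},
    {"year": 2025, "gold": "B", "pred": "B", "n_options": 4},
]


def accuracy_by_group(rows, key):
    """Return {group: (accuracy, chance_baseline)} using exact-match scoring."""
    hits = defaultdict(int)      # correct answers per group
    chance = defaultdict(float)  # summed probability of a uniform random guess
    total = defaultdict(int)     # number of items per group
    for row in rows:
        group = row[key]
        total[group] += 1
        hits[group] += int(row["pred"] == row["gold"])
        chance[group] += 1.0 / row["n_options"]
    return {g: (hits[g] / total[g], chance[g] / total[g]) for g in total}


scores = accuracy_by_group(records, "year")
for year, (acc, baseline) in sorted(scores.items()):
    print(f"{year}: accuracy={acc:.2f}, chance={baseline:.2f}")
```

Grouping by a different key (for example a hypothetical `domain` field) would give the same kind of per-group comparison against chance.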

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)