BIS Reasoning 1.0: The First Large-Scale Japanese Benchmark for Belief-Inconsistent Syllogistic Reasoning
Ha-Thanh Nguyen, Hideyuki Tachibana, Chaoran Liu, Qianying Liu, Su Myat Noe, Koichi Takeda, Sadao Kurohashi

TL;DR
This paper introduces BIS Reasoning 1.0, a large-scale Japanese benchmark dataset for evaluating belief-inconsistent syllogistic reasoning in large language models. It shows where models reason from logic and where they give in to belief bias.
Contribution
It provides the first large-scale Japanese dataset for belief-inconsistent reasoning and benchmarks multiple LLMs, highlighting the importance of explicit reasoning optimization for robustness.
Findings
Reasoning models achieve near-perfect accuracy (~99%) on the benchmark.
GPT-4o attains around 80% accuracy, while earlier Japanese-specialized models perform below 60%.
Performance depends on prompt design and reasoning effort, especially when logic conflicts with beliefs.
Abstract
We present BIS Reasoning 1.0, the first large-scale Japanese dataset of syllogistic reasoning problems explicitly designed to evaluate belief-inconsistent reasoning in large language models (LLMs). Unlike prior resources such as NeuBAROCO and JFLD, which emphasize general or belief-aligned logic, BIS Reasoning 1.0 systematically introduces logically valid yet belief-inconsistent syllogisms to expose belief bias, the tendency to accept believable conclusions irrespective of validity. We benchmark a representative suite of cutting-edge models, including OpenAI GPT-5 variants, GPT-4o, Qwen, and prominent Japanese LLMs, under a uniform, zero-shot protocol. Reasoning-centric models achieve near-perfect accuracy on BIS Reasoning 1.0 (e.g., Qwen3-32B 99% and GPT-5-mini up to 99.7%), while GPT-4o attains around 80%. Earlier Japanese-specialized models underperform, often well…
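To make the idea of a belief-inconsistent syllogism concrete, here is a minimal Python sketch. It is not the authors' evaluation code. The two items are hypothetical English illustrations and are not drawn from the dataset, which is in Japanese. The `Syllogism` class, the two judge functions, and `accuracy` are names invented for this example. The sketch assumes each item carries a validity label and that accuracy is the fraction of items whose predicted validity matches that label.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Syllogism:
    premises: Tuple[str, ...]
    conclusion: str
    is_valid: bool       # does the conclusion follow from the premises?
    is_believable: bool  # does the conclusion agree with world knowledge?


# Hypothetical items (NOT from BIS Reasoning 1.0): each conclusion follows
# logically from its premises but contradicts common belief.
ITEMS = [
    Syllogism(
        premises=("All mammals can fly.", "Whales are mammals."),
        conclusion="Whales can fly.",
        is_valid=True,
        is_believable=False,
    ),
    Syllogism(
        premises=("Everything made of metal floats on water.",
                  "Anchors are made of metal."),
        conclusion="Anchors float on water.",
        is_valid=True,
        is_believable=False,
    ),
]


def belief_biased_judge(item: Syllogism) -> bool:
    """Caricature of belief bias: accept a conclusion only if it is believable."""
    return item.is_believable


def oracle_logical_judge(item: Syllogism) -> bool:
    """Idealized reasoner: simply reads the gold validity label (an upper bound)."""
    return item.is_valid


def accuracy(judge: Callable[[Syllogism], bool], items: Sequence[Syllogism]) -> float:
    """Fraction of items where the judge's verdict matches the validity label."""
    if not items:
        raise ValueError("items must not be empty")
    correct = sum(judge(item) == item.is_valid for item in items)
    return correct / len(items)


if __name__ == "__main__":
    print("belief-biased:", accuracy(belief_biased_judge, ITEMS))
    print("oracle:", accuracy(oracle_logical_judge, ITEMS))
```

On items like these, a judge that relies purely on believability rejects every valid conclusion. This is the failure mode the benchmark is built to expose. In a real run, the judge would wrap a zero-shot model call instead of reading a stored label.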
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education
