Don't Judge a Book by its Cover: Testing LLMs' Robustness Under Logical Obfuscation
Abhilekh Borah, Shubhra Ghosh, Kedar Joshi, Aditya Kumar Guru, Kripabandhu Ghosh

TL;DR
This paper introduces Logifus, a logical obfuscation framework, and LogiQAte, a diagnostic benchmark built with it. Current LLMs lose substantial accuracy when problems are restated in logically equivalent but obfuscated form, which suggests their grasp of the underlying logic is superficial.
Contribution
The paper presents Logifus and LogiQAte, novel tools for evaluating LLM robustness against logical obfuscation, highlighting vulnerabilities in current models' reasoning capabilities.
Findings
Obfuscation reduces GPT-4o performance by 47%.
Performance drops by 27% for GPT-5.
Reasoning models' accuracy decreases by 22%.
Abstract
Tasks such as solving arithmetic equations, evaluating truth tables, and completing syllogisms are handled well by large language models (LLMs) in their standard form, but they often fail when the same problems are posed in logically equivalent yet obfuscated formats. To study this vulnerability, we introduce Logifus, a structure-preserving logical obfuscation framework, and, utilizing this, we present LogiQAte, a first-of-its-kind diagnostic benchmark with 1,108 questions across four reasoning tasks: (i) Obfus FOL (first-order logic entailment under equivalence-preserving rewrites), (ii) Obfus Blood Relation (family-graph entailment under indirect relational chains), (iii) Obfus Number Series (pattern induction under symbolic substitutions), and (iv) Obfus Direction Sense (navigation reasoning under altered directions and reference frames). Across all the tasks, evaluating six…
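To make "logically equivalent yet obfuscated" concrete, here is a minimal propositional sketch. It is not the authors' Logifus implementation; the function names and the example formulas are hypothetical, chosen only for illustration. It checks by truth table that a rewritten statement (double negation plus De Morgan's law) has the same truth value as the original under every assignment, whereas a superficially similar statement (the converse) does not.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material implication: a -> b."""
    return (not a) or b


def original(p: bool, q: bool) -> bool:
    # Standard form: "if P then Q".
    return implies(p, q)


def obfuscated(p: bool, q: bool) -> bool:
    # Hypothetical obfuscation of the same statement:
    # "it is not the case that Q is false while P is not false".
    return not ((not q) and (not (not p)))


def converse(p: bool, q: bool) -> bool:
    # Looks similar, but is NOT equivalent: "if Q then P".
    return implies(q, p)


def equivalent(f, g, n_vars: int) -> bool:
    """Return True if f and g agree on every truth assignment."""
    return all(
        f(*values) == g(*values)
        for values in product([False, True], repeat=n_vars)
    )


if __name__ == "__main__":
    print("original vs obfuscated:", equivalent(original, obfuscated, 2))
    print("original vs converse:  ", equivalent(original, converse, 2))
```

A rewrite that passes this kind of check preserves the answer to the original question, so any drop in model accuracy on the rewritten form reflects sensitivity to surface form rather than a change in the problem.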
Taxonomy
Topics: Topic Modeling · Benford’s Law and Fraud Detection · Advanced Graph Neural Networks
