FregeLogic at SemEval 2026 Task 11: A Hybrid Neuro-Symbolic Architecture for Content-Robust Syllogistic Validity Prediction
Adewale Akinfaderin, Nafi Diallo

TL;DR
FregeLogic is a hybrid neuro-symbolic system that pairs an ensemble of LLM classifiers with a Z3 SMT solver to predict syllogistic validity, improving accuracy while reducing content bias.
Contribution
The paper introduces a novel neuro-symbolic architecture that integrates LLM ensembles with a formal SMT solver to improve content-robust logical validity prediction.
Findings
Achieved 94.3% accuracy with a content effect of 2.85 and a combined score of 41.88 in nested 5-fold cross-validation (N=960).
Improved the combined score by 2.76 points over the pure LLM ensemble.
Reduced Z3 failure rate from ~22% to near zero using structured API calls.
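To make concrete what formal validity checking means for a syllogism, here is a minimal sketch that decides validity by brute-force enumeration of small models. It is a pure-Python stand-in for the solver call, not the authors' Z3 encoding. The tuple format for statements, the function names `holds` and `is_valid`, and the choice of modern semantics without existential import are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical encoding: an individual's "type" records its membership in the
# three syllogistic terms S, M, P. A model is fully described by which of the
# 8 possible types are inhabited, so 255 non-empty models cover every case.
TYPES = list(product([False, True], repeat=3))
INDEX = {"S": 0, "M": 1, "P": 2}


def holds(statement, inhabited):
    """Evaluate one categorical statement (form, a, b) in a given model."""
    form, a, b = statement
    ia, ib = INDEX[a], INDEX[b]
    if form == "all":        # All a are b
        return all(t[ib] for t in inhabited if t[ia])
    if form == "no":         # No a are b
        return not any(t[ia] and t[ib] for t in inhabited)
    if form == "some":       # Some a are b
        return any(t[ia] and t[ib] for t in inhabited)
    if form == "some_not":   # Some a are not b
        return any(t[ia] and not t[ib] for t in inhabited)
    raise ValueError(f"unknown form: {form}")


def is_valid(premises, conclusion):
    """True iff no model satisfies every premise while falsifying the conclusion."""
    # Start at mask 1: first-order semantics requires a non-empty domain.
    for mask in range(1, 2 ** len(TYPES)):
        inhabited = [t for i, t in enumerate(TYPES) if (mask >> i) & 1]
        if all(holds(p, inhabited) for p in premises) and not holds(conclusion, inhabited):
            return False  # counter-model found
    return True


# Barbara: All M are P, All S are M, therefore All S are P.
barbara = is_valid([("all", "M", "P"), ("all", "S", "M")], ("all", "S", "P"))

# Undistributed middle: All P are M, All S are M, therefore All S are P.
undistributed = is_valid([("all", "P", "M"), ("all", "S", "M")], ("all", "S", "P"))
```

The check depends only on the logical form of the statements, never on whether the terms describe something believable, which is the property the system relies on when it consults a formal solver.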
Abstract
We present FregeLogic, a hybrid neuro-symbolic system for SemEval-2026 Task 11 (Subtask 1), which addresses syllogistic validity prediction while reducing content effects on predictions. Our approach combines an ensemble of five LLM classifiers, spanning three open-weights models (Llama 4 Maverick, Llama 4 Scout, and Qwen3-32B) paired with varied prompting strategies, with a Z3 SMT solver that serves as a formal logic tiebreaker. The central hypothesis is that LLM disagreement within the ensemble signals likely content-biased errors, where real-world believability interferes with logical judgment. By deferring to Z3's structurally-grounded formal verification on these disputed cases, our system achieves 94.3% accuracy with a content effect of 2.85 and a combined score of 41.88 in nested 5-fold cross-validation on the dataset (N=960). This represents a 2.76-point improvement in combined…
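The deferral rule described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: "disagreement" is taken to mean any non-unanimous vote, the solver is modeled as a hypothetical callable `formal_check` that returns `None` on failure, and falling back to the majority vote in that case is also an assumption.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def predict_validity(
    votes: Sequence[bool],
    formal_check: Callable[[], Optional[bool]],
) -> bool:
    """Combine ensemble votes with a formal tiebreaker (hypothetical interface).

    votes: one validity prediction per LLM classifier.
    formal_check: returns True/False from the formal solver, or None if it fails.
    """
    if not votes:
        raise ValueError("at least one vote is required")

    counts = Counter(votes)
    majority = counts[True] > counts[False]

    # Assumption: a unanimous ensemble is trusted and the solver is not consulted.
    if len(counts) == 1:
        return majority

    # Disputed case: defer to the formal verdict, which ignores content plausibility.
    verdict = formal_check()

    # Assumption: if the solver fails, fall back to the majority vote.
    return majority if verdict is None else verdict


unanimous = predict_validity([True] * 5, lambda: False)
disputed = predict_validity([True, True, True, False, False], lambda: False)
solver_failed = predict_validity([True, True, True, False, False], lambda: None)
```

Passing the solver as a callable means it is invoked only on disputed items, so a solver failure can affect only the subset of predictions where the ensemble already disagrees.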
