TL;DR
SymCode is a neurosymbolic framework that uses verifiable code generation with SymPy to improve the accuracy and trustworthiness of mathematical reasoning in large language models.
Contribution
The paper presents a neurosymbolic approach that reframes mathematical problem-solving as verifiable code generation with SymPy, improving accuracy and making model failures more transparent.
Findings
Improves accuracy by up to 13.6 percentage points over baselines on benchmarks including MATH-500 and OlympiadBench.
More token-efficient than previous methods.
Shifts model failures from opaque logical fallacies towards transparent, programmatic errors.
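To illustrate what a "programmatic error" looks like in practice, here is a minimal sketch in Python. It is not the paper's pipeline: the helper `run_generated_code` and the convention that generated code stores its result in a variable named `answer` are assumptions made for this example. The point is only that a failing program reports a named exception and a traceback, whereas a flawed prose argument reports nothing.

```python
import traceback


def run_generated_code(source: str) -> dict:
    """Execute a code string and report its `answer` or the error it raised.

    Hypothetical helper for illustration. Real model-generated code should be
    run in a sandbox, which this sketch does not provide.
    """
    namespace = {}
    try:
        exec(source, namespace)
    except Exception as exc:
        # The failure is explicit: an exception type, a message, a traceback.
        return {
            "ok": False,
            "error_type": type(exc).__name__,
            "message": str(exc),
            "traceback": traceback.format_exc(),
        }
    # Assumed convention: the generated code stores its result in `answer`.
    return {"ok": True, "answer": namespace.get("answer")}


good = run_generated_code("answer = sum(range(1, 11))")
bad = run_generated_code("answer = 1 / 0")

print(good["answer"])
print(bad["error_type"], "-", bad["message"])
```

The second call fails with a concrete, inspectable error rather than a plausible-looking but wrong derivation, which is the kind of shift the finding describes.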
Abstract
Large Language Models (LLMs) often struggle with complex mathematical reasoning, where prose-based generation leads to unverified and arithmetically unsound solutions. Current prompting strategies like Chain of Thought still operate within this unreliable medium, lacking a mechanism for deterministic verification. To address these limitations, we introduce SymCode, a neurosymbolic framework that reframes mathematical problem-solving as a task of verifiable code generation using the SymPy library. We evaluate SymCode on challenging benchmarks, including MATH-500 and OlympiadBench, demonstrating significant accuracy improvements of up to 13.6 percentage points over baselines. Our analysis shows that SymCode is not only more token-efficient but also fundamentally shifts model failures from opaque logical fallacies towards transparent, programmatic errors. By grounding LLM reasoning in a…
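As a rough illustration of what solving a problem through verifiable SymPy code can look like, the sketch below solves a small equation symbolically and then checks each solution by substitution. The problem, the function name `solve_and_verify`, and the verification step are assumptions chosen for this example. They are not taken from the paper and do not represent its prompts or evaluation setup.

```python
import sympy as sp


def solve_and_verify():
    """Solve x**2 - 5*x + 6 = 0 and check each root by substitution."""
    # Hypothetical example problem, not one from the paper's benchmarks.
    x = sp.symbols("x", real=True)
    equation = sp.Eq(x**2 - 5 * x + 6, 0)

    # Symbolic solve: exact roots rather than floating-point estimates.
    solutions = sp.solve(equation, x)

    # Deterministic check: each root must make lhs - rhs simplify to zero.
    for root in solutions:
        residual = sp.simplify(equation.lhs.subs(x, root) - equation.rhs)
        assert residual == 0, f"root {root} does not satisfy the equation"

    return sorted(solutions)


answer = solve_and_verify()
print(answer)
```

Because the arithmetic is delegated to the symbolic engine and the result is checked by substitution, a wrong answer would show up as a failed assertion instead of going unnoticed in prose.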
