Investigating Symbolic Capabilities of Large Language Models
Neisarg Dave, Daniel Kifer, C. Lee Giles, Ankur Mali

TL;DR
This paper evaluates the ability of large language models to perform symbolic reasoning tasks. It finds that performance declines significantly as task complexity increases, and it highlights the need for specialized training and model adjustments.
Contribution
The paper provides a comprehensive evaluation of LLMs on symbolic tasks, framed by Chomsky's Hierarchy. This area has previously been underexplored in LLM research.
Findings
Performance declines as symbolic complexity increases
Fine-tuned GPT-3.5 shows limited improvement
Models have limited generalization on symbolic tasks
Abstract
Prompting techniques have significantly enhanced the capabilities of Large Language Models (LLMs) across various complex tasks, including reasoning, planning, and solving math word problems. However, most research has predominantly focused on language-based reasoning and word problems, often overlooking the potential of LLMs in handling symbol-based calculations and reasoning. This study aims to bridge this gap by rigorously evaluating LLMs on a series of symbolic tasks, such as addition, multiplication, modulus arithmetic, numerical precision, and symbolic counting. Our analysis encompasses eight LLMs, including four enterprise-grade and four open-source models, of which three have been pre-trained on mathematical tasks. The assessment framework is anchored in Chomsky's Hierarchy, providing a robust measure of the computational abilities of these models. The evaluation employs…
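This summary does not describe the paper's actual prompts, datasets, or scoring, because the abstract is cut off. The sketch below only illustrates what instances of some of the listed symbolic tasks (addition, multiplication, modulus arithmetic, symbolic counting) might look like, and how exact-match accuracy could be measured as a difficulty knob is increased. It assumes a hypothetical prompt format and a hypothetical `model` callable that maps a prompt string to an answer string. Every function name here (`addition`, `counting`, `exact_match_accuracy`, and the rest) is illustrative and does not come from the paper.

```python
import random
from typing import Callable, Tuple

# Hypothetical task generators (illustrative only, not the paper's setup).
# Each returns (prompt, expected_answer); `size` controls difficulty,
# e.g. operand digit count or string length.
Task = Callable[[random.Random, int], Tuple[str, str]]


def _operand(rng: random.Random, digits: int) -> int:
    # Uniform random integer with exactly `digits` digits.
    return rng.randrange(10 ** (digits - 1), 10 ** digits)


def addition(rng: random.Random, size: int) -> Tuple[str, str]:
    a, b = _operand(rng, size), _operand(rng, size)
    return f"{a} + {b} =", str(a + b)


def multiplication(rng: random.Random, size: int) -> Tuple[str, str]:
    a, b = _operand(rng, size), _operand(rng, size)
    return f"{a} * {b} =", str(a * b)


def modulus(rng: random.Random, size: int) -> Tuple[str, str]:
    # Small modulus (2..9) applied to a `size`-digit number.
    a, m = _operand(rng, size), rng.randrange(2, 10)
    return f"{a} mod {m} =", str(a % m)


def counting(rng: random.Random, size: int) -> Tuple[str, str]:
    # Count occurrences of one symbol in a random string of length `size`.
    s = "".join(rng.choice("abc") for _ in range(size))
    target = rng.choice("abc")
    return f"How many times does '{target}' occur in {s}?", str(s.count(target))


def exact_match_accuracy(model: Callable[[str], str], task: Task,
                         size: int, n: int, seed: int = 0) -> float:
    """Fraction of n generated prompts whose stripped model output equals the answer."""
    rng = random.Random(seed)  # fixed seed makes the prompt set reproducible
    correct = 0
    for _ in range(n):
        prompt, expected = task(rng, size)
        if model(prompt).strip() == expected:
            correct += 1
    return correct / n
```

You could sweep `size` upward for a given `model`, for example a wrapper around an LLM API (left unspecified here). That would yield a per-task accuracy curve of the kind needed to check whether performance falls as symbolic complexity grows. Exact-match scoring is a deliberately strict choice: a single wrong digit counts the whole answer as incorrect.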
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
