Assessing the Emergent Symbolic Reasoning Abilities of Llama Large Language Models

Flavio Petruzzellis; Alberto Testolin; Alessandro Sperduti

arXiv:2406.06588·cs.CL·June 12, 2024·1 cites



TL;DR

This paper systematically evaluates the symbolic reasoning abilities of Llama 2 models, revealing that larger and fine-tuned models perform better on mathematical tasks, especially those of lower complexity, but still face challenges with more difficult formulas.

Contribution

It provides a comprehensive analysis of Llama 2's capabilities in symbolic reasoning, highlighting the effects of model size and fine-tuning on performance.

Findings

01 Performance improves with larger models and fine-tuning.

02 Models excel at low-complexity mathematical formulas.

03 High-complexity formulas remain challenging even for fine-tuned models.

Abstract

Large Language Models (LLMs) achieve impressive performance in a wide range of tasks, even though they are often trained with the sole objective of chatting fluently with users. Among other skills, LLMs show emergent abilities in mathematical reasoning benchmarks, which can be elicited with appropriate prompting methods. In this work, we systematically investigate the capabilities and limitations of popular open-source LLMs on different symbolic reasoning tasks. We evaluate three models of the Llama 2 family on two datasets that require solving mathematical formulas of varying degrees of difficulty. We test a generalist LLM (Llama 2 Chat) as well as two fine-tuned versions of Llama 2 (MAmmoTH and MetaMath) specifically designed to tackle mathematical problems. We observe that both increasing the scale of the model and fine-tuning it on relevant tasks lead to significant performance gains.…
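
As a rough illustration of the kind of evaluation the abstract describes, the sketch below generates arithmetic formulas of a chosen nesting depth and scores a solver by exact-match accuracy at each depth. Everything in it is an assumption made for illustration: the formula generator, the use of nesting depth as a proxy for difficulty, and the `solver` callback standing in for a model call are hypothetical and are not taken from the paper's datasets or code.

```python
import random
from typing import Callable, Dict, List


def make_formula(nesting: int, rng: random.Random) -> str:
    """Build a random arithmetic formula whose parenthesis depth equals `nesting`.

    Hypothetical generator: nesting depth is used here as a stand-in for difficulty.
    """
    if nesting == 0:
        return str(rng.randint(0, 9))
    # The left operand always carries the full remaining depth,
    # so the overall depth is exactly `nesting`.
    left = make_formula(nesting - 1, rng)
    right = make_formula(rng.randint(0, nesting - 1), rng)
    op = rng.choice(["+", "-", "*"])
    return f"({left}{op}{right})"


def ground_truth(formula: str) -> int:
    """Evaluate a locally generated formula (eval is tolerable only because we built the string)."""
    return eval(formula, {"__builtins__": {}})


def accuracy_by_nesting(
    solver: Callable[[str], str],
    depths: List[int],
    samples: int,
    seed: int = 0,
) -> Dict[int, float]:
    """Exact-match accuracy of `solver` at each nesting depth.

    `solver` is a placeholder for a model call that maps a formula to an answer string.
    """
    rng = random.Random(seed)
    results: Dict[int, float] = {}
    for depth in depths:
        correct = 0
        for _ in range(samples):
            formula = make_formula(depth, rng)
            if solver(formula).strip() == str(ground_truth(formula)):
                correct += 1
        results[depth] = correct / samples
    return results
```

Passing a function that wraps a real model in place of `solver` would yield a per-depth accuracy table, which is one simple way to see whether performance degrades as formulas become more deeply nested.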


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: LLaMA