Controlling Equational Reasoning in Large Language Models with Prompt Interventions
Jordan Meadows, Marco Valentino, Andre Freitas

TL;DR
This paper explores controlling hallucination rates in large language models through prompt interventions, using symbolic data generation to analyze and improve mathematical derivation accuracy.
Contribution
It introduces a symbolic framework for data generation and prompt interventions to systematically study and control mathematical errors in LLMs.
Findings
Fine-tuned T5-Large can outperform the few-shot performance of GPT-4 on various evaluation sets generated via the framework.
Prompt interventions influence derivation quality and error distribution.
Human evaluation reveals weaknesses not captured by reference-based metrics.
Abstract
This paper investigates how hallucination rates in Large Language Models (LLMs) may be controlled via a symbolic data generation framework, exploring a fundamental relationship between the rate of certain mathematical errors and types of input intervention. Specifically, we systematically generate data for a derivation generation task using a symbolic engine, applying targeted interventions to prompts to perturb features of mathematical derivations such as the surface forms of symbols, equational tree structures, and mathematical context. We then evaluate the effect of prompt interventions across a range of LLMs including fine-tuned T5 models, GPT, and LLaMa-based models. Our experiments suggest that T5-Large can outperform the few-shot performance of GPT-4 on various evaluation sets generated via the framework. However, an extensive evaluation based on human analysis, template-based…
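As a rough illustration of one intervention type named in the abstract (perturbing the surface forms of symbols), the sketch below consistently renames symbols across every step of a derivation. It is a minimal sketch under stated assumptions, not the authors' implementation: the paper uses a symbolic engine, whereas this example uses plain regular-expression substitution on equation strings, and the function name `rename_symbols`, the string format of the derivation, and the example mapping are all hypothetical.

```python
import re


def rename_symbols(derivation, mapping):
    """Rename symbols consistently in every step of a derivation.

    `derivation` is a list of equation strings and `mapping` maps old
    symbol names to new ones. Both formats are assumptions made for
    this sketch; they are not taken from the paper.
    """
    # Match whole identifiers only, so the 'x' inside 'exp' is untouched.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in mapping) + r")\b"
    )
    # A single pass per step performs all renamings simultaneously,
    # so swaps such as {'x': 'y', 'y': 'x'} behave as expected.
    return [
        pattern.sub(lambda match: mapping[match.group(1)], step)
        for step in derivation
    ]


# Hypothetical two-step derivation: a function definition and its derivative.
original = [
    "f(x) = x**2 + a*x",
    "Derivative(f(x), x) = 2*x + a",
]

# The perturbed derivation keeps the same structure with different symbols.
perturbed = rename_symbols(original, {"x": "t", "a": "k", "f": "g"})

for step in perturbed:
    print(step)
```

Because the renaming leaves the equational structure unchanged, comparing a model's output on `original` and `perturbed` prompts would isolate sensitivity to symbol choice from sensitivity to the mathematics itself.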
Taxonomy
Topics: Topic Modeling · Mathematics, Computing, and Information Processing
Methods: Gated Linear Unit · Multi-Head Attention · Attention Is All You Need · Byte Pair Encoding · Weight Decay · Discriminative Fine-Tuning · Residual Connection · Adam · Layer Normalization
