Small Language Models are Equation Reasoners
Bumjun Kim, Kunha Lee, Juyeon Kim, Sangam Lee

TL;DR
This paper shows that small language models struggle with arithmetic reasoning due to natural language variability, but using a unified equation-only format significantly improves their reasoning abilities, especially in very small models.
Contribution
The paper introduces the equation-only reasoning format as a simple yet effective way to enhance small language models' arithmetic reasoning capabilities.
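To make the idea of an equation-only format concrete, the sketch below reduces a natural-language rationale to just its equations. It is only an illustration: this summary does not describe how the paper builds its equation-only data, so the helper name `to_equation_only`, the regular expression, and the sample rationale are all assumptions made up for the example.

```python
import re

# Hypothetical pattern (an assumption, not the paper's method): matches simple
# arithmetic of the form "a op b [op c ...] = result" with optional decimals.
EQUATION = re.compile(
    r"\d+(?:\.\d+)?(?:\s*[-+*/]\s*\d+(?:\.\d+)?)+\s*=\s*\d+(?:\.\d+)?"
)


def to_equation_only(rationale: str) -> str:
    """Keep only the equations found in a rationale, dropping the prose."""
    equations = [
        re.sub(r"\s+", "", match.group(0))  # normalize "48 / 2 = 24" to "48/2=24"
        for match in EQUATION.finditer(rationale)
    ]
    return ", ".join(equations)


# Made-up rationale for illustration; not taken from the paper.
rationale = (
    "Natalia sold 48 / 2 = 24 clips in May. "
    "Altogether she sold 48 + 24 = 72 clips in April and May."
)
print(to_equation_only(rationale))  # prints: 48/2=24, 48+24=72
```

Two rationales that phrase the same steps differently would collapse to the same equation string under this kind of normalization, which is the sort of variability reduction the equation-only format targets.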
Findings
Equation-only format improves small LLMs' arithmetic reasoning.
Very small models like T5-Tiny benefit significantly from this approach.
Natural language variability causes high ambiguity in small models.
Abstract
Chain-of-Thought (CoT) reasoning has enabled Large Language Models (LLMs) to achieve remarkable performance in various NLP tasks, including arithmetic problem-solving. However, this success does not generalize to small language models (sLMs) like T5, due to their limited capacity and the absence of emergent abilities associated with larger models. Recent work on enhancing sLMs through knowledge distillation has yielded some improvements but still faces significant limitations, particularly high ambiguity from the variability in natural language expressions and substantial computational costs. In this paper, we investigate why sLMs perform poorly on arithmetic reasoning tasks and hypothesize that natural language format variability introduces high ambiguity for these smaller models. Based on this hypothesis, we conduct experiments with the equation-only format, a reasoning format that unifies…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Attention Is All You Need · Byte Pair Encoding · Gated Linear Unit · SentencePiece · Softmax · Layer Normalization · Adafactor · Inverse Square Root Schedule · Dropout
