Overcoming Barriers to Skill Injection in Language Modeling: Case Study in Arithmetic
Mandar Sharma, Nikhil Muralidhar, Naren Ramakrishnan

TL;DR
This paper presents a novel framework with information-theoretic interventions to enable large language models to acquire mathematical reasoning skills without losing their linguistic capabilities.
Contribution
It introduces a new method to inject non-linguistic skills into language models while preventing catastrophic forgetting of linguistic knowledge.
Findings
Successful injection of arithmetic reasoning into language models
Retention of linguistic skills after skill injection
Enhanced performance on mathematical reasoning tasks
Abstract
Through their transfer learning abilities, highly parameterized large pre-trained language models have dominated the NLP landscape for a multitude of downstream language tasks. Though linguistically proficient, the inability of these models to incorporate the learning of non-linguistic entities (numerals and arithmetic reasoning) limits their usage for tasks that require numeric comprehension or strict mathematical reasoning. However, as we illustrate in this paper, building a general-purpose language model that also happens to be proficient in mathematical reasoning is not as straightforward as training it on a numeric dataset. In this work, we develop a novel framework that enables language models to be mathematically proficient while retaining their linguistic prowess. Specifically, we offer information-theoretic interventions to overcome the catastrophic forgetting of linguistic…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Educational Assessment and Pedagogy
