Overcoming Barriers to Skill Injection in Language Modeling: Case Study in Arithmetic

Mandar Sharma; Nikhil Muralidhar; Naren Ramakrishnan

arXiv:2211.02098·cs.CL·November 7, 2022·1 cites

TL;DR

This paper presents a novel framework with information-theoretic interventions to enable large language models to acquire mathematical reasoning skills without losing their linguistic capabilities.

Contribution

It introduces a new method to inject non-linguistic skills into language models while preventing catastrophic forgetting of linguistic knowledge.

Findings

01. Successful injection of arithmetic reasoning into language models
02. Retention of linguistic skills after skill injection
03. Enhanced performance on mathematical reasoning tasks

Abstract

Through their transfer learning abilities, highly parameterized large pre-trained language models have dominated the NLP landscape for a multitude of downstream language tasks. Though linguistically proficient, the inability of these models to incorporate the learning of non-linguistic entities (numerals and arithmetic reasoning) limits their usage for tasks that require numeric comprehension or strict mathematical reasoning. However, as we illustrate in this paper, building a general-purpose language model that also happens to be proficient in mathematical reasoning is not as straightforward as training it on a numeric dataset. In this work, we develop a novel framework that enables language models to be mathematically proficient while retaining their linguistic prowess. Specifically, we offer information-theoretic interventions to overcome the catastrophic forgetting of linguistic…

Code & Models

Repositories

mandar-sharma/overcomingbarriers
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Educational Assessment and Pedagogy