Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks

Minki Kang; Seanie Lee; Jinheon Baek; Kenji Kawaguchi; Sung Ju Hwang

arXiv:2305.18395 · cs.CL · October 31, 2023 · 21 cites

TL;DR

This paper introduces KARD, a method that improves the reasoning of small language models by fine-tuning them to generate rationales distilled from large language models, augmented with knowledge retrieved from an external knowledge base. It significantly improves performance on knowledge-intensive tasks.

Contribution

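The paper proposes KARD, a knowledge-augmented distillation approach that fine-tunes small LMs to generate LLM-produced rationales using knowledge retrieved from an external knowledge base, compensating for their limited capacity to memorize the knowledge that reasoning tasks require.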

Findings

01

KARD improves small T5 and GPT models on reasoning datasets.

02

With KARD, 250M T5 models outperform fine-tuned 3B models.

03

KARD yields significant performance gains on MedQA-USMLE, StrategyQA, and OpenbookQA.

Abstract

Large Language Models (LLMs) have shown promising performance in knowledge-intensive reasoning tasks that require a compound understanding of knowledge. However, deployment of the LLMs in real-world applications can be challenging due to their high computational requirements and concerns on data privacy. Previous studies have focused on building task-specific small Language Models (LMs) by fine-tuning them with labeled data or distilling LLMs. However, these approaches are ill-suited for knowledge-intensive reasoning tasks due to the limited capacity of small LMs in memorizing the knowledge required. Motivated by our theoretical analysis on memorization, we propose Knowledge-Augmented Reasoning Distillation (KARD), a novel method that fine-tunes small LMs to generate rationales obtained from LLMs with augmented knowledge retrieved from an external knowledge base. Moreover, we further…
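The core idea in the abstract, training a small LM to produce a teacher rationale while conditioning on retrieved knowledge, can be illustrated with a minimal sketch of how one training example might be assembled. Everything here is an assumption made for illustration and is not taken from the paper or its repository: the function names, the prompt layout, the toy token-overlap retriever (a stand-in for a real retriever), and the choice to use the question as the retrieval query.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query: str, passage: str) -> int:
    """Toy relevance score: count of token occurrences shared by both texts."""
    q, p = Counter(tokenize(query)), Counter(tokenize(passage))
    return sum((q & p).values())


def retrieve(query: str, knowledge_base: list[str], k: int = 2) -> list[str]:
    """Return the k passages with the highest overlap score (hypothetical retriever)."""
    ranked = sorted(knowledge_base, key=lambda p: overlap_score(query, p), reverse=True)
    return ranked[:k]


def build_training_example(question: str, rationale: str,
                           knowledge_base: list[str], k: int = 2) -> dict:
    """Pair a knowledge-augmented input with the teacher rationale as the target.

    Assumption: the question is used as the retrieval query, and the prompt
    layout below is illustrative only.
    """
    passages = retrieve(question, knowledge_base, k)
    knowledge = "\n".join(f"- {p}" for p in passages)
    return {
        "input": f"Knowledge:\n{knowledge}\nQuestion: {question}",
        "target": rationale,  # rationale previously generated by a large LM
    }


kb = [
    "Aspirin inhibits platelet aggregation.",
    "The Eiffel Tower is in Paris.",
    "Platelet aggregation contributes to blood clots.",
]
question = "Why does aspirin reduce blood clots?"
rationale = "Aspirin inhibits platelet aggregation, which contributes to clots."
example = build_training_example(question, rationale, kb)
print(example["input"])
```

A small seq2seq or causal LM would then be fine-tuned on such `input`/`target` pairs with a standard language-modeling loss; the sketch stops before that step because the training details are not given in the text above.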

Code & Models

Repositories

nardien/kard
PyTorch · Official

Videos

Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Cosine Annealing · Adafactor · Adam · Inverse Square Root Schedule · Discriminative Fine-Tuning · Weight Decay