LLMR: Knowledge Distillation with a Large Language Model-Induced Reward

Dongheng Li; Yongchang Hao; Lili Mou

arXiv:2409.12500·cs.CL·September 20, 2024

Open Access

TL;DR

This paper introduces LLMR, a knowledge distillation (KD) method that trains a model using a reward function induced from a large language model. It outperforms traditional KD methods on dialogue generation and summarization tasks.

Contribution

The paper presents a novel knowledge distillation (KD) approach that uses a reward function induced from a large language model, aimed at models that must be deployed in resource-constrained environments.

Findings

01 LLMR outperforms traditional KD methods in multiple NLP tasks.

02 The approach improves model efficiency without sacrificing accuracy.

03 Empirical results show consistent gains across datasets.

Abstract

Large language models have become increasingly popular and have demonstrated remarkable performance in various natural language processing (NLP) tasks. However, these models are typically computationally expensive and difficult to deploy in resource-constrained environments. In this paper, we propose LLMR, a novel knowledge distillation (KD) method based on a reward function induced from large language models. We conducted experiments on multiple datasets for the dialogue generation and summarization tasks. Empirical results demonstrate that our LLMR approach consistently outperforms traditional KD methods across different tasks and datasets.
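The abstract does not say how the reward is induced or how the student is trained on it, so the following is only a rough sketch of one plausible construction, not the authors' confirmed method. It assumes the teacher's per-step logits are read as Q-values, that the per-step reward is a Bellman-style difference between the chosen token's logit and the best next-step logit, and that the student is updated with a REINFORCE-style loss. The function names `induced_rewards` and `reinforce_loss`, and the choice of a zero bootstrap term at the last step, are all hypothetical.

```python
import math


def induced_rewards(teacher_logits, actions):
    """Per-step rewards r_t = q_t(a_t) - max_a q_{t+1}(a).

    Assumption: teacher logits are treated as Q-values.
    teacher_logits: list of T rows; row t holds the teacher's logits
                    over the vocabulary at step t.
    actions:        list of T token ids sampled by the student.
    The last step has no successor, so its bootstrap term is set to 0
    (an illustrative choice).
    """
    rewards = []
    for t, action in enumerate(actions):
        q_value = teacher_logits[t][action]
        if t + 1 < len(actions):
            next_value = max(teacher_logits[t + 1])
        else:
            next_value = 0.0
        rewards.append(q_value - next_value)
    return rewards


def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    peak = max(row)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - log_sum_exp for x in row]


def reinforce_loss(student_logits, actions, rewards):
    """REINFORCE surrogate loss: -sum_t R_t * log pi(a_t | s_t),
    where R_t is the undiscounted reward-to-go from step t."""
    loss = 0.0
    reward_to_go = 0.0
    for t in reversed(range(len(actions))):
        reward_to_go += rewards[t]
        log_prob = log_softmax(student_logits[t])[actions[t]]
        loss -= reward_to_go * log_prob
    return loss


# Toy example: vocabulary of 2 tokens, sequence of 2 steps.
teacher_logits = [[2.0, 0.0], [1.0, 3.0]]
student_logits = [[0.0, 0.0], [0.0, 0.0]]
actions = [0, 1]

rewards = induced_rewards(teacher_logits, actions)
loss = reinforce_loss(student_logits, actions, rewards)
```

In a real setup the student logits would come from a trainable model and the loss would be minimized with an autodiff framework; plain Python lists are used here only to keep the arithmetic visible.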

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Knowledge Distillation