Parameter Efficient Diverse Paraphrase Generation Using Sequence-Level Knowledge Distillation

Lasal Jayawardena; Prasan Yapa

arXiv:2404.12596·cs.CL·April 22, 2024

TL;DR

This paper introduces smaller, efficient paraphrasing models distilled from large language models that maintain high quality, diversity, and syntactic variation while significantly reducing inference time and computational costs.

Contribution

The study presents a sequence-level knowledge distillation approach to create compact paraphrasing models that match LLM quality with faster inference and enhanced diversity.
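As a rough illustration of the general idea behind sequence-level knowledge distillation, the student is trained on output sequences generated by the teacher rather than on the teacher's token-level probabilities. The sketch below shows only the data-collection step and is not the authors' pipeline: `build_distillation_pairs`, `toy_teacher`, and the filtering rules are hypothetical names and choices made for this example, and a real setup would replace `toy_teacher` with calls to an actual LLM.

```python
from typing import Callable, Dict, List


def build_distillation_pairs(
    sources: List[str],
    teacher_generate: Callable[[str, int], List[str]],
    num_candidates: int = 3,
) -> List[Dict[str, str]]:
    """Collect (source, teacher paraphrase) pairs for student fine-tuning.

    Assumption for this sketch: the teacher is any callable that returns up
    to `num_candidates` paraphrases of a sentence.
    """
    pairs: List[Dict[str, str]] = []
    for src in sources:
        seen = set()
        for cand in teacher_generate(src, num_candidates):
            key = cand.strip().lower()
            # Skip empty outputs, verbatim copies of the source, and repeats.
            if not key or key == src.strip().lower() or key in seen:
                continue
            seen.add(key)
            pairs.append({"source": src, "target": cand.strip()})
    return pairs


def toy_teacher(sentence: str, n: int) -> List[str]:
    """Hypothetical stand-in for an LLM call; returns canned rewrites."""
    canned = [
        sentence,                        # verbatim copy, filtered out
        "In other words: " + sentence,
        "In other words: " + sentence,   # duplicate, filtered out
        "Put differently, " + sentence,
    ]
    return canned[:n]


pairs = build_distillation_pairs(["The cat sat."], toy_teacher, num_candidates=4)
for pair in pairs:
    print(pair["source"], "->", pair["target"])
```

The resulting pairs would then serve as ordinary supervised targets for fine-tuning a smaller sequence-to-sequence student. Keeping several distinct teacher outputs per source is one simple way to expose the student to varied paraphrases.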

Findings

01

Distilled models show only a 4% performance drop compared to LLMs.

02

Models generate diverse paraphrases with syntactic and lexical variation.

03

Inference speed is significantly increased, reducing costs.

Abstract

Over the past year, the field of Natural Language Generation (NLG) has grown rapidly, largely due to the introduction of Large Language Models (LLMs). These models have delivered the strongest performance across a range of areas within Natural Language Processing and Generation. However, their application in domain-specific tasks, such as paraphrasing, presents significant challenges. The extensive number of parameters makes them difficult to operate on commercial hardware, and they require substantial time for inference, leading to high costs in a production setting. In this study, we tackle these obstacles by employing LLMs to develop three distinct models for the paraphrasing field, applying a method referred to as sequence-level knowledge distillation. These distilled models are capable of maintaining the quality of paraphrases generated by the LLM.…
