PromptIntern: Saving Inference Costs by Internalizing Recurrent Prompt during Large Language Model Fine-tuning

Jiaru Zou; Mengyu Zhou; Tao Li; Shi Han; Dongmei Zhang

arXiv:2407.02211·cs.CL·October 17, 2024

TL;DR

PromptIntern introduces a novel fine-tuning approach that internalizes recurrent prompts into model parameters, significantly reducing inference costs and token usage while maintaining performance on NL2Code tasks.

Contribution

The paper presents a method that embeds prompt knowledge directly into model parameters, reducing reliance on lengthy prompts during inference.

Findings

01. Reduces input tokens by over 90%
02. Speeds up inference by 4.2 times
03. Cuts inference costs by 88.3%

Abstract

Recent advances in fine-tuning large language models (LLMs) have greatly enhanced their usage in domain-specific tasks. Despite the success, fine-tuning continues to rely on repeated and lengthy prompts, which escalate computational expenses, require more resources, and lead to slower inference. In this paper, we present a novel approach, PromptIntern, which internalizes prompt knowledge during model fine-tuning to achieve efficient inference and save costs. Instead of compressing the prompts for a vanilla model, PromptIntern aims to embed the recurrent prompt directly into the model parameters. We design a fine-tuning pipeline that includes instruction template compression, few-shot example absorption, and a progressive internalization strategy, effectively diminishing the need for intricate prompts during inference. Comprehensive experiments on challenging NL2Code tasks demonstrate…
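
The abstract names three pipeline components (instruction template compression, few-shot example absorption, and progressive internalization) but does not give their details here. The following is a minimal sketch of how a progressive schedule of that kind might look, not the authors' implementation. It assumes a linear decay across stages, and it stands in for real template compression with naive word truncation. The names `Stage`, `progressive_schedule`, and `build_prompt` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    index: int
    template_keep_ratio: float  # fraction of instruction-template words kept
    num_shots: int              # few-shot examples still included in the prompt


def progressive_schedule(num_stages: int, initial_shots: int) -> List[Stage]:
    """Linearly shrink the prompt from full (stage 0) to query-only (last stage).

    The linear decay is an assumption made for illustration only.
    """
    if num_stages < 2:
        raise ValueError("need at least two stages to shrink the prompt")
    last = num_stages - 1
    stages = []
    for i in range(num_stages):
        remaining = last - i
        stages.append(
            Stage(
                index=i,
                template_keep_ratio=remaining / last,
                num_shots=initial_shots * remaining // last,
            )
        )
    return stages


def build_prompt(template: str, shots: List[str], query: str, stage: Stage) -> str:
    """Assemble the training prompt for one stage.

    Truncating the template to its first words is a placeholder for a real
    compressor; it only shows where compression would plug in.
    """
    words = template.split()
    keep = int(len(words) * stage.template_keep_ratio)
    parts = []
    if keep > 0:
        parts.append(" ".join(words[:keep]))
    parts.extend(shots[: stage.num_shots])
    parts.append(query)
    return "\n".join(parts)


if __name__ == "__main__":
    schedule = progressive_schedule(num_stages=4, initial_shots=6)
    shots = [f"example {k}" for k in range(6)]
    for stage in schedule:
        prompt = build_prompt("Translate the request into code", shots, "list files", stage)
        print(stage.index, stage.num_shots, len(prompt.split("\n")))
```

Under these assumptions, the model would be fine-tuned on the prompts from each stage in order, so that by the final stage it sees only the query, which is also the form the prompt would take at inference.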

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques