CopySpec: Accelerating LLMs with Speculative Copy-and-Paste Without Compromising Quality

Razvan-Gabriel Dumitru; Minglai Yang; Vikas Yadav; Mihai Surdeanu

arXiv:2502.08923 · cs.CL · May 26, 2025

TL;DR

CopySpec is a technique that accelerates large language model inference by identifying repeated sequences and copying them, significantly improving speed without sacrificing output quality across various datasets and models.

Contribution

We introduce CopySpec, a method that speeds up LLM inference by copying repeated sequences. It is compatible with speculative decoding and is evaluated on multiple datasets.

Findings

01. Achieves up to 3.08x speed-up on specific tasks
02. Seamlessly integrates with speculative decoding for 49% additional speed-up
03. Leverages larger contexts to further accelerate inference

Abstract

We introduce CopySpec, a simple yet effective technique to tackle the inefficiencies LLMs face when generating responses that closely resemble previous outputs or that can be extracted verbatim from context. CopySpec identifies repeated sequences in the model's chat history or context and speculates that the same tokens will follow, enabling seamless copying without compromising output quality and without requiring additional GPU memory. To evaluate the effectiveness of our approach, we conducted experiments using seven LLMs and five datasets: MT-Bench, CNN/DM, GSM8K, HumanEval, and our newly created dataset, MT-Redundant. MT-Redundant, introduced in this paper, transforms the second turn of MT-Bench into a request for variations of the first turn's answer, simulating real-world scenarios where users request modifications to prior responses. Our results demonstrate significant…
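The copy-and-verify idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the parameters `gamma` (length of the suffix to match) and `max_copy` (cap on copied tokens), and the linear backward scan are all assumptions made for illustration. Verification is also written as a token-by-token loop against a hypothetical greedy `next_token` callback, whereas a real system would presumably score the whole draft in one forward pass.

```python
from typing import Callable, List, Sequence


def propose_copy(tokens: Sequence[int], gamma: int = 3, max_copy: int = 10) -> List[int]:
    """Draft the tokens that followed the most recent earlier occurrence
    of the last `gamma` tokens (hypothetical helper, not the paper's API)."""
    n = len(tokens)
    if n <= gamma:
        return []
    suffix = list(tokens[n - gamma:])
    # Scan earlier start positions from most recent to oldest.
    for start in range(n - gamma - 1, -1, -1):
        if list(tokens[start:start + gamma]) == suffix:
            follow = start + gamma
            return list(tokens[follow:follow + max_copy])
    return []


def accept_prefix(
    context: Sequence[int],
    draft: Sequence[int],
    next_token: Callable[[Sequence[int]], int],
) -> List[int]:
    """Keep draft tokens while they equal the model's greedy choice.
    On the first mismatch, emit the model's own token and stop, so the
    result is identical to plain greedy decoding."""
    accepted: List[int] = []
    for tok in draft:
        expected = next_token(list(context) + accepted)
        if tok != expected:
            accepted.append(expected)
            return accepted
        accepted.append(tok)
    return accepted


# Toy usage: the "model" always predicts last token + 1 (an assumption for the demo).
history = [1, 2, 3, 4, 5, 6, 1, 2, 3]
draft = propose_copy(history, gamma=3, max_copy=4)
result = accept_prefix(history, draft, lambda ctx: ctx[-1] + 1)
```

In the toy run the draft copies four tokens from the earlier occurrence of `1, 2, 3`; the first three agree with the toy model and the fourth is replaced by the model's own prediction. Because every emitted token is one the model would have chosen anyway, the output matches ordinary greedy decoding, which is the sense in which copying does not change quality in this sketch.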

Code & Models

Repositories

razvandu/copyspec (PyTorch, official)

Taxonomy

Topics: Digital Rights Management and Security · Semantic Web and Ontologies