Protecting Language Models Against Unauthorized Distillation through Trace Rewriting

Xinhang Ma; William Yeoh; Ning Zhang; Yevgeniy Vorobeychik

arXiv:2602.15143·cs.AI·April 20, 2026

TL;DR

This paper proposes methods to modify language model reasoning traces to prevent unauthorized knowledge distillation and embed verifiable watermarks, balancing security with answer quality.

Contribution

The paper introduces techniques that dynamically rewrite a teacher model's reasoning traces to degrade their usefulness for distillation and to embed watermarks, without compromising answer correctness.

Findings

01 Instruction-based rewriting effectively deters distillation.

02 Rewritten traces can embed watermarks with minimal false alarms.

03 Rewriting can improve teacher model performance.

Abstract

Knowledge distillation is a widely adopted technique for transferring capabilities from LLMs to smaller, more efficient student models. However, unauthorized use of knowledge distillation takes unfair advantage of the considerable effort and cost put into developing frontier models. We investigate methods for modifying teacher-generated reasoning traces to achieve two objectives that deter unauthorized distillation: (1) anti-distillation, or degrading the training usefulness of query responses, and (2) API watermarking, which embeds verifiable signatures in student models. We introduce several approaches for dynamically rewriting a teacher's reasoning outputs while preserving answer correctness and semantic coherence. Two of these leverage the rewriting capabilities of LLMs, while others use gradient-based techniques. Our experiments show that a simple instruction-based…

Code & Models

Repositories

xhOwenMa/trace-rewriting (GitHub)
