SCOPE: Intrinsic Semantic Space Control for Mitigating Copyright Infringement in LLMs
Zhenliang Zhang, Xinyu Hu, Xiaojun Wan

TL;DR
SCOPE is an inference-time method for large language models that mitigates copyright infringement by controlling the model's internal semantic-space representations, with no parameter updates or external filters.
Contribution
It introduces a novel semantic-space control approach using sparse autoencoders to identify and clamp copyright-sensitive subspaces during inference.
Findings
Mitigates copyright infringement effectively
Maintains model utility and performance
Provides interpretability of semantic subspaces
Abstract
Large language models sometimes inadvertently reproduce copyrighted passages, exposing downstream applications to legal risk. Most existing inference-time defences focus on surface-level token matching and rely on external blocklists or filters, which add deployment complexity and may overlook semantically paraphrased leakage. In this work, we reframe copyright infringement mitigation as intrinsic semantic-space control and introduce SCOPE, an inference-time method that requires no parameter updates or auxiliary filters. Specifically, a sparse autoencoder (SAE) projects hidden states into a high-dimensional, near-monosemantic space; using this representation, we identify a copyright-sensitive subspace and clamp its activations during decoding. Experiments on widely recognized benchmarks show that SCOPE mitigates copyright infringement without degrading…
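The clamping step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the ReLU encoder, the random weights, the choice to add back the SAE reconstruction error, and every name below (`sae_encode`, `sae_decode`, `clamp_hidden_state`, `sensitive_idx`, `clamp_value`) are assumptions made for illustration. How SCOPE actually selects the copyright-sensitive subspace is not stated here, so the sensitive feature indices are simply taken as given.

```python
import numpy as np


def sae_encode(h, W_enc, b_enc):
    """Assumed ReLU encoder: hidden state -> sparse, non-negative latents."""
    return np.maximum(h @ W_enc + b_enc, 0.0)


def sae_decode(z, W_dec, b_dec):
    """Assumed linear decoder: latents -> reconstructed hidden state."""
    return z @ W_dec + b_dec


def clamp_hidden_state(h, W_enc, b_enc, W_dec, b_dec,
                       sensitive_idx, clamp_value=0.0):
    """Cap the latents listed in `sensitive_idx` at `clamp_value`,
    then map back to the model's hidden space.

    The part of `h` that the SAE fails to reconstruct is added back,
    so only the clamped features change (a design choice of this sketch).
    """
    idx = np.asarray(sensitive_idx, dtype=int)
    z = sae_encode(h, W_enc, b_enc)
    residual = h - sae_decode(z, W_dec, b_dec)

    z_clamped = z.copy()
    z_clamped[..., idx] = np.minimum(z_clamped[..., idx], clamp_value)
    return sae_decode(z_clamped, W_dec, b_dec) + residual


# Toy usage with random weights; real SAE weights would be trained.
rng = np.random.default_rng(0)
d_model, d_sae = 8, 32
W_enc = rng.normal(size=(d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(size=(d_sae, d_model))
b_dec = np.zeros(d_model)

h = rng.normal(size=(2, d_model))   # two token positions
sensitive_idx = [3, 17]             # hypothetical feature indices
h_new = clamp_hidden_state(h, W_enc, b_enc, W_dec, b_dec, sensitive_idx)
print(h_new.shape)
```

In an actual decoding loop, a function like this would be applied to the hidden states of a chosen layer at every generation step, for example through a forward hook. Which layer to intervene on and what clamp value to use are left open in this sketch.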
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Physical Unclonable Functions (PUFs) and Hardware Security · Generative Adversarial Networks and Image Synthesis
