Combating Data Laundering in LLM Training
Muxing Li, Zesheng Ye, Sharon Li, and Feng Liu

TL;DR
This paper introduces synthesis data reversion (SDR), a method that improves detection of data laundering in LLM training by inferring the laundering transformation and synthesizing queries that reveal the use of proprietary data.
Contribution
SDR counters data laundering by inferring the unknown transformation goal from black-box access to the target LLM and using an auxiliary LLM to synthesize queries that mimic the laundered data. This improves detection accuracy across various laundering practices and LLMs.
Findings
SDR consistently improves data misuse detection across multiple laundering techniques.
The method effectively infers laundering goals and refines queries to elicit stronger detection signals.
Evaluation on the MIMIR benchmark demonstrates SDR's practical effectiveness.
Abstract
Data rights owners can detect unauthorized data use in large language model (LLM) training by querying with proprietary samples. Often, superior performance (e.g., higher confidence or lower loss) on a sample relative to the untrained data implies it was part of the training corpus, as LLMs tend to perform better on data they have seen during training. However, this detection becomes fragile under data laundering, a practice of transforming the stylistic form of proprietary data, while preserving critical information to obfuscate data provenance. When an LLM is trained exclusively on such laundered variants, it no longer performs better on originals, erasing the signals that standard detections rely on. We counter this by inferring the unknown laundering transformation from black-box access to the target LLM and, via an auxiliary LLM, synthesizing queries that mimic the laundered data,…
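The loss-based signal described in the abstract can be illustrated with a minimal sketch. This is not the paper's SDR method. It only shows the baseline comparison that laundering is said to break: scoring how often candidate samples receive lower loss than known-unseen reference samples. The function name `detection_auc` and all loss values are hypothetical and made up for illustration; in practice the losses would come from querying the target LLM.

```python
from typing import Sequence


def detection_auc(candidate_losses: Sequence[float],
                  reference_losses: Sequence[float]) -> float:
    """Fraction of (candidate, reference) pairs in which the candidate
    has the lower loss; ties count as half a win.

    A value near 1.0 suggests the candidates were seen in training.
    A value near 0.5 means the losses carry no membership signal.
    """
    if not candidate_losses or not reference_losses:
        raise ValueError("both loss lists must be non-empty")
    wins = 0.0
    for c in candidate_losses:
        for r in reference_losses:
            if c < r:
                wins += 1.0
            elif c == r:
                wins += 0.5
    return wins / (len(candidate_losses) * len(reference_losses))


# Hypothetical per-sample losses (illustrative numbers, not measurements).
unseen_reference = [2.0, 1.8, 2.2, 1.9]        # data known to be outside training
originals_trained_on = [1.1, 1.3, 0.9, 1.2]    # model trained on the originals
originals_after_laundering = [1.9, 2.1, 1.8, 2.0]  # model trained only on laundered variants

clear_signal = detection_auc(originals_trained_on, unseen_reference)
erased_signal = detection_auc(originals_after_laundering, unseen_reference)
```

With these made-up numbers, `clear_signal` is 1.0 while `erased_signal` sits close to 0.5, which mirrors the failure mode the abstract describes: once the model has only seen laundered variants, querying with the originals no longer separates them from unseen data.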
