When Bad Data Leads to Good Models
Kenneth Li, Yida Chen, Fernanda Viégas, Martin Wattenberg

TL;DR
This paper challenges the traditional view that data quality alone determines language model quality. It shows that including toxic data in pretraining can make toxicity easier to control in post-training and can improve model robustness.
Contribution
It demonstrates that pretraining on toxic data can yield models that are more controllable and easier to detoxify, revealing a nuanced relationship between the toxicity of the training data and model behavior.
Findings
Adding toxic data to pretraining yields a less entangled linear representation of toxicity.
Models pretrained on toxic data are easier to detoxify in post-training.
Toxic data can improve the trade-off between toxicity reduction and preservation of general capabilities.
Abstract
In large language model (LLM) pretraining, data quality is believed to determine model quality. In this paper, we re-examine the notion of "quality" from the perspective of pre- and post-training co-design. Specifically, we explore the possibility that pre-training on more toxic data can lead to better control in post-training, ultimately decreasing a model's output toxicity. First, we use a toy experiment to study how data composition affects the geometry of features in the representation space. Next, through controlled experiments with Olmo-1B models trained on varying ratios of clean and toxic data, we find that the concept of toxicity enjoys a less entangled linear representation as the proportion of toxic data increases. Furthermore, we show that although toxic data increases the generational toxicity of the base model, it also makes the toxicity easier to remove. Evaluations on…
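To make "less entangled linear representation" and "easier to remove" concrete, here is a minimal sketch under stated assumptions. It assumes toxicity is captured by a single difference-of-means direction in hidden-state space, measures entanglement as the maximum absolute cosine similarity between that direction and other concept directions, and detoxifies by subtracting the projection onto the direction. The entanglement metric and the projection-removal step are generic illustrations chosen here, not the paper's stated method, and every function name below is hypothetical.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def normalize(v):
    """Scale v to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def concept_direction(pos, neg):
    """Hypothetical: unit difference-of-means direction between activations
    with the concept (pos) and without it (neg)."""
    mu_pos = mean_vector(pos)
    mu_neg = mean_vector(neg)
    return normalize([p - q for p, q in zip(mu_pos, mu_neg)])


def entanglement(direction, other_directions):
    """Hypothetical metric: largest |cosine| between a unit concept direction
    and other unit concept directions (0 means fully disentangled)."""
    return max(abs(dot(direction, o)) for o in other_directions)


def steer_away(h, direction, alpha=1.0):
    """Subtract alpha times the projection of hidden state h onto the unit
    direction; alpha=1.0 removes that component entirely."""
    proj = dot(h, direction)
    return [x - alpha * proj * d for x, d in zip(h, direction)]


if __name__ == "__main__":
    # Toy 3-d activations: "toxic" samples are shifted along the first axis.
    toxic = [[2.0, 0.0, 0.0], [2.0, 0.2, 0.0]]
    clean = [[0.0, 0.0, 0.0], [0.0, 0.2, 0.0]]
    d = concept_direction(toxic, clean)
    others = [[0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]  # hypothetical other concepts
    print("direction:", d)
    print("entanglement:", entanglement(d, others))
    print("steered:", steer_away([3.0, 1.0, 2.0], d))
```

Under these assumptions, when the toxicity direction is nearly orthogonal to the other concept directions, subtracting it leaves those concepts largely untouched. That is one intuitive reading of why a less entangled representation could improve the trade-off between toxicity reduction and capability preservation.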
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection · Artificial Intelligence in Healthcare and Education
Methods: Balanced Selection
