Inference-Time Toxicity Mitigation in Protein Language Models
Manuel Fernández Burda, Santiago Aranguri, Iván Arcuschin Moreno, Enzo Ferrante

TL;DR
This paper introduces Logit Diff Amplification (LDA), an inference-time control method for protein language models that reduces toxicity in generated proteins without retraining, maintaining biological plausibility and structural viability.
Contribution
The paper adapts LDA as an inference-time technique for mitigating toxicity in protein language models, improving safety without any retraining.
Findings
LDA reduces the predicted toxicity rate (measured via ToxDL2) below the taxon-finetuned baseline across four taxonomic groups.
LDA maintains distributional similarity to natural proteins, preserving generative quality.
LDA outperforms activation-based steering methods in preserving structural properties.
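Per the abstract, the distributional-similarity finding is measured with a Fréchet ESM Distance. The sketch below shows a Fréchet distance between two sets of embeddings, assuming the metric follows the familiar FID-style formulation (Gaussians fitted to each set). That formulation is an assumption, as are the function name `frechet_distance` and the random arrays, which stand in for real ESM embeddings.

```python
import numpy as np

def frechet_distance(x, y):
    """Frechet distance between Gaussians fitted to two embedding sets.

    x, y: arrays of shape (n_samples, dim). Hypothetical helper; the paper's
    exact metric definition is not given in this summary.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.cov(x, rowvar=False)
    cov_y = np.cov(y, rowvar=False)
    # Tr(sqrt(cov_x @ cov_y)) equals the sum of square roots of the
    # eigenvalues of the product, which are real and non-negative for
    # positive semi-definite inputs (clipped here for numerical safety).
    eigvals = np.linalg.eigvals(cov_x @ cov_y)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_x - mu_y
    return float(diff @ diff + np.trace(cov_x) + np.trace(cov_y) - 2.0 * tr_sqrt)

# Placeholder data: random vectors instead of real protein embeddings.
rng = np.random.default_rng(0)
natural = rng.normal(size=(500, 8))
same = frechet_distance(natural, natural)
shifted = frechet_distance(natural, natural + 1.0)
```

Here `same` is numerically zero, while `shifted` equals the squared length of the mean shift (8 for a unit shift in 8 dimensions), since the covariances are unchanged. Lower values indicate generated proteins whose embeddings sit closer to the natural distribution.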
Abstract
Protein language models (PLMs) are becoming practical tools for de novo protein design, yet their dual-use potential raises safety concerns. We show that domain adaptation to specific taxonomic groups can elicit toxic protein generation, even when toxicity is not the training objective. To address this, we adapt Logit Diff Amplification (LDA) as an inference-time control mechanism for PLMs. LDA modifies token probabilities by amplifying the logit difference between a baseline model and a toxicity-finetuned model, requiring no retraining. Across four taxonomic groups, LDA consistently reduces predicted toxicity rate (measured via ToxDL2) below the taxon-finetuned baseline while preserving biological plausibility. We evaluate quality using Fréchet ESM Distance and predicted foldability (pLDDT), finding that LDA maintains distributional similarity to natural proteins and structural…
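The logit adjustment described in the abstract can be sketched for a single decoding step as follows. This is a minimal illustration, not the authors' implementation: the sign convention (moving away from the toxicity-finetuned model), the strength parameter `alpha`, and the names `lda_logits` and `softmax` are all assumptions, and the three-token logit vectors are toy values.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def lda_logits(base_logits, toxic_logits, alpha=1.0):
    """Amplify the logit difference between two models for one step.

    Assumed form: base + alpha * (base - toxic), so tokens the
    toxicity-finetuned model favors more than the baseline are pushed down.
    alpha = 0 recovers the baseline logits unchanged.
    """
    base = np.asarray(base_logits, dtype=float)
    toxic = np.asarray(toxic_logits, dtype=float)
    return base + alpha * (base - toxic)

# Toy next-token logits over a 3-token vocabulary (illustrative values).
base = np.array([2.0, 1.0, 0.5])
toxic = np.array([2.0, 3.0, 0.5])  # toxic model strongly prefers token 1

p_base = softmax(base)
p_lda = softmax(lda_logits(base, toxic, alpha=1.0))
```

In this toy case the probability of token 1 is lower under `p_lda` than under `p_base`, because only that token's logit differs between the two models. Both models are only run forward at inference time, which is consistent with the abstract's point that no retraining is required.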
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Genomics and Rare Diseases · Vaccines and Immunoinformatics Approaches
