Inference-Time Toxicity Mitigation in Protein Language Models

Manuel Fernández Burda; Santiago Aranguri; Iván Arcuschin Moreno; Enzo Ferrante

arXiv:2603.04045 · cs.LG · March 5, 2026

TL;DR

This paper adapts Logit Diff Amplification (LDA) as an inference-time control method for protein language models. LDA reduces predicted toxicity in generated proteins without retraining, while preserving biological plausibility and structural viability.

Contribution

The paper adapts LDA as an inference-time technique for mitigating toxicity in protein language models. The method acts only on token probabilities at generation time, so it requires no retraining.

Findings

01

LDA reduces the predicted toxicity rate (measured via ToxDL2) below the taxon-finetuned baseline across four taxonomic groups.

02

LDA maintains distributional similarity to natural proteins (evaluated with Fréchet ESM Distance), preserving generative quality.

03

LDA outperforms activation-based steering methods in preserving structural properties.

Abstract

Protein language models (PLMs) are becoming practical tools for de novo protein design, yet their dual-use potential raises safety concerns. We show that domain adaptation to specific taxonomic groups can elicit toxic protein generation, even when toxicity is not the training objective. To address this, we adapt Logit Diff Amplification (LDA) as an inference-time control mechanism for PLMs. LDA modifies token probabilities by amplifying the logit difference between a baseline model and a toxicity-finetuned model, requiring no retraining. Across four taxonomic groups, LDA consistently reduces predicted toxicity rate (measured via ToxDL2) below the taxon-finetuned baseline while preserving biological plausibility. We evaluate quality using Fréchet ESM Distance and predicted foldability (pLDDT), finding that LDA maintains distributional similarity to natural proteins and structural…
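
The logit adjustment described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function names `lda_logits` and `softmax`, the strength parameter `alpha`, the toy three-token vocabulary, and the specific form `base + alpha * (base - toxic)`, which pushes the next-token distribution away from whatever the toxicity-finetuned model prefers more strongly than the baseline does.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def lda_logits(base_logits, toxic_logits, alpha=1.0):
    """Hypothetical LDA step: amplify the (base - toxic) logit difference.

    Assumed form: base + alpha * (base - toxic). With alpha = 0 this
    returns the baseline logits unchanged; larger alpha steers further
    away from tokens the toxicity-finetuned model favours.
    """
    return [b + alpha * (b - t) for b, t in zip(base_logits, toxic_logits)]


# Toy next-token logits over a 3-token vocabulary (made-up numbers).
base = [1.0, 0.5, 0.0]    # baseline model
toxic = [1.0, 2.0, 0.0]   # toxicity-finetuned model prefers token 1

steered = lda_logits(base, toxic, alpha=1.0)
p_base = softmax(base)
p_steered = softmax(steered)

print("steered logits:", steered)
print("P(token 1) baseline:", round(p_base[1], 3))
print("P(token 1) steered: ", round(p_steered[1], 3))
```

In this toy case only token 1 differs between the two models, so it is the only logit that moves, and its sampling probability drops. In an actual PLM the same adjustment would be applied to the full vocabulary logits at every decoding step, using two forward passes per step.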

Taxonomy

Topics: Biomedical Text Mining and Ontologies · Genomics and Rare Diseases · Vaccines and Immunoinformatics Approaches