Controlling Repetition in Protein Language Models
Jiahao Zhang, Zeqing Zhang, Di Wang, and Lijie Hu

TL;DR
This paper systematically studies repetition in protein language models, introduces metrics to quantify it, and proposes a dataset-guided steering method that reduces repetition without harming structural quality.
Contribution
It is the first to analyze repetition in PLMs, develop metrics for it, and introduce UCCS, a novel dataset-based steering technique to control repetition during protein generation.
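As a rough illustration of what dataset-based steering can look like, the sketch below builds a difference-of-means direction from two sets of hidden activations and adds it to a hidden state at inference time. This is the generic activation-steering recipe one reviewer alludes to, not the authors' exact UCCS procedure. The function names, the `alpha` strength, and the use of plain Python lists in place of model activations are all assumptions made for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must share one dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def steering_vector(low_rep: Sequence[Sequence[float]],
                    high_rep: Sequence[Sequence[float]]) -> Vector:
    """Direction pointing from high-repetition toward low-repetition activations.

    Both inputs are hypothetical: one activation vector per sequence,
    taken from the same layer of the model.
    """
    mu_low = mean_vector(low_rep)
    mu_high = mean_vector(high_rep)
    return [a - b for a, b in zip(mu_low, mu_high)]


def apply_steering(hidden: Sequence[float],
                   direction: Sequence[float],
                   alpha: float = 1.0) -> Vector:
    """Shift one hidden state along the steering direction by strength alpha."""
    if len(hidden) != len(direction):
        raise ValueError("hidden state and direction must share one dimension")
    return [h + alpha * d for h, d in zip(hidden, direction)]
```

In a real model the shift would be applied inside a forward hook on a chosen layer. The quality of the direction depends on how the two contrastive sets are built, which is where the abstract says the method controls for structural utility.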
Findings
UCCS effectively reduces repetition in protein sequences.
The method maintains high structural confidence scores.
It outperforms existing decoding penalties and baselines.
Abstract
Protein language models (PLMs) have enabled advances in structure prediction and de novo protein design, yet they frequently collapse into pathological repetition during generation. Unlike in text, where repetition merely reduces readability, in proteins it undermines structural confidence and functional viability. To unify this problem, we present the first systematic study of repetition in PLMs. We first propose quantitative metrics to characterize motif-level and homopolymer repetition and then demonstrate their negative impact on folding reliability. To address this challenge, we propose UCCS (Utility-Controlled Contrastive Steering), which steers protein generation with a constrained dataset. Instead of naively contrasting high- vs. low-repetition sequences, we construct contrastive sets that maximize differences in repetition while tightly controlling for structural utility. This…
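To make the two kinds of repetition named in the abstract concrete, here is a minimal sketch of how homopolymer and motif-level repetition might be scored on an amino-acid string. These are illustrative stand-ins, not the paper's metric definitions. The function names, the `min_run` threshold, and the k-mer size `k` are assumptions.

```python
from collections import Counter
from itertools import groupby


def run_lengths(seq: str) -> list:
    """Lengths of maximal runs of identical residues, in order."""
    return [len(list(group)) for _, group in groupby(seq)]


def longest_homopolymer_run(seq: str) -> int:
    """Length of the longest stretch of one repeated residue (0 if empty)."""
    return max(run_lengths(seq), default=0)


def homopolymer_fraction(seq: str, min_run: int = 4) -> float:
    """Fraction of residues that sit inside runs of at least min_run."""
    if not seq:
        return 0.0
    covered = sum(n for n in run_lengths(seq) if n >= min_run)
    return covered / len(seq)


def repeated_kmer_fraction(seq: str, k: int = 3) -> float:
    """Fraction of k-mer occurrences whose k-mer appears more than once."""
    total = len(seq) - k + 1
    if total <= 0:
        return 0.0
    counts = Counter(seq[i:i + k] for i in range(total))
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / total
```

A sequence such as `MKAAAAAGLV` has a longest run of 5 and, with `min_run=4`, a homopolymer fraction of 0.5. A tandem repeat such as `ACDACDACD` has a repeated 3-mer fraction of 1.0.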
Peer Reviews
Decision·ICLR 2026 Poster
a. This paper is the first to clearly define and systematically address the problem of repetition in protein language models, and it does so with attention to the biological consequences. Rather than borrowing loosely from NLP, the authors develop a domain-aware framework that captures why repetition is especially damaging in proteins. This makes the framework truly domain-specific. b. The proposed method, UCCS, stands out for its simplicity. It doesn’t require retraining the model or altering its arch
a. While UCCS performs well on ESM-3 and ProtGPT2, it’s still an open question how well the approach would transfer to larger or more diverse protein models like ProGen2 or ProteinMPNN. It would be valuable to see whether the same steering technique holds up as models scale or shift in architecture.
1. This paper is the first to systematically identify and formalize pathological repetition in PLMs, with quantitative metrics. The metrics are well-motivated and interpretable. 2. The UCCS approach is simple, training-free, and model-agnostic. It greatly reduces repetition while maintaining foldability. 3. The experiments cover both MLM and autoregressive models, and both unconditional and conditional generation settings, with multiple datasets.
1. Limited novelty. I'm not familiar with inference-time steering methods for LLMs, but according to the cited papers, deriving a difference vector and adding it during inference appears to have been extensively explored in LLMs. The adaptation to PLMs is incremental rather than fundamentally new. 2. Comparison with learning-based methods. Similarly, I do not know whether people reduce repetition with learning-based methods in LLMs. If so, is it possible to compare UCCS with these learning b
- The problem studied is relevant and interesting. The repetition problem in PLMs deserves more attention from related research efforts.
- The proposed method, UCCS, is model-agnostic and plug-and-play, making it easy to implement and adapt for existing PLMs.
- The experimental results support the claim that the proposed UCCS “mitigates degeneracy while preserving foldability”, yet I believe it can be further improved (refer to cons/questions).
- This study still does not explicitly answer (or explore further) why common PLMs (or some of them) intrinsically exhibit repetition in their generated samples, which I think is more important for guiding PLM design in the community. Furthermore, AlphaFold-like confidence scores (aka foldability) only serve as a probe and can hardly be counted as strong indicators of good design; broader metrics should be considered in the main results for this paper to become a comprehensive study. I do no
Taxonomy
Topics: Machine Learning in Bioinformatics · Genomics and Rare Diseases · RNA and protein synthesis mechanisms
