A Comprehensive Review of Protein Language Models
Lei Wang, Xudong Li, Han Zhang, Jinyi Wang, Dingkang Jiang, Zhidong Xue, and Yan Wang

TL;DR
This paper offers a comprehensive review of protein language models, covering their development, architectures, evaluation metrics, benchmarks, applications, tools, and challenges to guide future research in the field.
Contribution
It provides the first broad macro-level overview of PLMs, integrating historical milestones, current trends, evaluation methods, and critical challenges in a systematic manner.
Findings
Highlights key historical milestones in PLMs
Analyzes model architectures and evaluation metrics
Discusses benchmarks, applications, and challenges
Abstract
At the intersection of the rapidly growing biological data landscape and advancements in Natural Language Processing (NLP), protein language models (PLMs) have emerged as a transformative force in modern research. These models have achieved remarkable progress, highlighting the need for timely and comprehensive overviews. However, much of the existing literature focuses narrowly on specific domains, often missing a broader analysis of PLMs. This study provides a systematic review of PLMs from a macro perspective, covering key historical milestones and current mainstream trends. We focus on the models themselves and their evaluation metrics, exploring aspects such as model architectures, positional encoding, scaling laws, and datasets. In the evaluation section, we discuss benchmarks and downstream applications. To further support ongoing research, we introduce relevant mainstream tools.…
Taxonomy
Topics: Machine Learning in Bioinformatics · Topic Modeling · Genetics, Bioinformatics, and Biomedical Research
