A Comprehensive Review of Protein Language Models

Lei Wang; Xudong Li; Han Zhang; Jinyi Wang; Dingkang Jiang; Zhidong Xue; and Yan Wang

arXiv:2502.06881 · q-bio.BM · February 12, 2025 · 6 cites

TL;DR

This paper offers a comprehensive review of protein language models, covering their development, architectures, evaluation metrics, benchmarks, applications, tools, and challenges to guide future research in the field.

Contribution

The paper presents a broad, macro-level overview of PLMs that systematically integrates historical milestones, current trends, evaluation methods, and open challenges.

Findings

01 Highlights key historical milestones in PLMs

02 Analyzes model architectures and evaluation metrics

03 Discusses benchmarks, applications, and challenges

Abstract

At the intersection of the rapidly growing biological data landscape and advancements in Natural Language Processing (NLP), protein language models (PLMs) have emerged as a transformative force in modern research. These models have achieved remarkable progress, highlighting the need for timely and comprehensive overviews. However, much of the existing literature focuses narrowly on specific domains, often missing a broader analysis of PLMs. This study provides a systematic review of PLMs from a macro perspective, covering key historical milestones and current mainstream trends. We focus on the models themselves and their evaluation metrics, exploring aspects such as model architectures, positional encoding, scaling laws, and datasets. In the evaluation section, we discuss benchmarks and downstream applications. To further support ongoing research, we introduce relevant mainstream tools.…

Code & Models

Repositories

ISYSLAB-HUST/Protein-Language-Models
none · Official

Taxonomy

Topics: Machine Learning in Bioinformatics · Topic Modeling · Genetics, Bioinformatics, and Biomedical Research

Methods: Focus