A Comprehensive Review of Transformer-based language models for Protein Sequence Analysis and Design
Nimisha Ghosh, Daniele Santoni, Debaleena Nawn, Eleonora Ottaviani, Giovanni Felici

TL;DR
This review summarizes recent advances in Transformer-based language models applied to protein sequence analysis and design, highlighting their strengths, weaknesses, and future research directions.
Contribution
It provides a comprehensive analysis of current Transformer-based methods in bioinformatics, emphasizing their applications and identifying gaps for future exploration.
Findings
Transformers have significantly impacted protein analysis tasks.
Current models excel in gene ontology and protein structure prediction.
Identified limitations include data scarcity and model interpretability.
Abstract
The impact of Transformer-based language models has been unprecedented in Natural Language Processing (NLP). The success of such models has also led to their adoption in other fields, including bioinformatics. Taking this into account, this paper discusses recent advances in Transformer-based models for protein sequence analysis and design. In this review, we have discussed and analysed a significant number of works pertaining to such applications. These applications encompass gene ontology, functional and structural protein identification, generation of de novo proteins and binding of proteins. We attempt to shed light on the strengths and weaknesses of the discussed works to provide comprehensive insight to readers. Finally, we highlight shortcomings in existing research and explore potential avenues for future developments. We believe that this review will help researchers working in…
Taxonomy
Topics: Machine Learning in Bioinformatics · Biomedical Text Mining and Ontologies · Bioinformatics and Genomic Networks
