Biological Sequence Clustering: A Survey

Simeng Zhang; Xinying Liu; Jun Lou; Mudi Jiang; Quan Zou; Zengyou He

arXiv:2601.14624·q-bio.GN·January 22, 2026

TL;DR

This survey comprehensively reviews biological sequence clustering methods, discussing their strategies, paradigms, objectives, and challenges to guide future research in large-scale bioinformatics analysis.

Contribution

It provides a detailed overview of existing algorithms, categorizes them by similarity modeling, clustering paradigm, and optimization objective, and highlights trade-offs and future directions.

Findings

1. Summarizes main strategies for modeling sequence similarity.
2. Classifies clustering paradigms and discusses their trade-offs.
3. Identifies current limitations and future challenges.

Abstract

The rapid development of high-throughput sequencing technologies has led to an explosive increase in biological sequence data, making sequence clustering a fundamental task in large-scale bioinformatics analyses. Unlike traditional clustering problems, biological sequence clustering faces unique challenges due to the lack of direct similarity measures, strict biological constraints, and demanding requirements for both scalability and accuracy. Over the past decades, a wide variety of methods have been developed, differing in how they model sequence similarity, construct clusters, and prioritize optimization objectives. In this review, we provide a comprehensive methodological overview of biological sequence clustering algorithms. We begin by summarizing the main strategies for modeling sequence similarity, which can be divided into three stages: sequence encoding, feature generation,…
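To make the abstract's pipeline concrete (model sequence similarity, then construct clusters), here is a minimal illustrative sketch in Python. It is not the survey's method. It assumes one common alignment-free choice: split each DNA sequence into overlapping k-mers, which roughly corresponds to sequence encoding, and count those k-mers into a sparse vector, which roughly corresponds to feature generation. The abstract is cut off before naming its third stage, so the sketch does not claim to reproduce it. Vectors are compared with cosine similarity, and sequences are grouped by greedy incremental (representative-based) clustering. The helper names `kmer_profile`, `cosine_similarity`, and `greedy_cluster`, along with k = 3 and the 0.8 threshold, are hypothetical choices made only for this example.

```python
import math
from collections import Counter


def kmer_profile(seq: str, k: int = 3) -> Counter:
    """Encode a sequence as counts of its overlapping k-mers (hypothetical helper)."""
    seq = seq.upper()
    # Sequences shorter than k yield an empty profile.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse k-mer count vectors."""
    # Only k-mers present in both profiles contribute to the dot product.
    dot = sum(a[kmer] * b[kmer] for kmer in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def greedy_cluster(seqs, threshold: float = 0.8, k: int = 3) -> dict:
    """Greedy incremental clustering (illustrative, not a specific tool).

    Sequences are visited longest first. Each one joins the first existing
    representative whose similarity is >= threshold. Otherwise it becomes a
    new representative. Returns {representative index: [member indices]}.
    """
    order = sorted(range(len(seqs)), key=lambda i: len(seqs[i]), reverse=True)
    reps = []        # (sequence index, k-mer profile) of each representative
    clusters = {}    # representative index -> member indices
    for i in order:
        prof = kmer_profile(seqs[i], k)
        for rep_idx, rep_prof in reps:
            if cosine_similarity(prof, rep_prof) >= threshold:
                clusters[rep_idx].append(i)
                break
        else:
            # No representative was similar enough, so start a new cluster.
            reps.append((i, prof))
            clusters[i] = [i]
    return clusters


if __name__ == "__main__":
    seqs = ["ACGTACGTACGT", "ACGTACGTACGA", "TTTTGGGGCCCC", "TTTTGGGGCCCA"]
    print(greedy_cluster(seqs, threshold=0.8, k=3))
```

On the four toy sequences, the call returns `{0: [0, 1], 2: [2, 3]}`. Visiting the longest sequences first lets them serve as representatives. Each new sequence is compared only against those representatives, not against every other sequence. This trades some accuracy for scalability, which is one of the tensions the abstract highlights.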

Taxonomy

Topics: Bioinformatics and Genomic Networks · Gene expression and cancer classification · Genomics and Phylogenetic Studies