A statistical physics perspective on alignment-independent protein sequence comparison
Amit K Chattopadhyay, Diar Nasiev, Darren R Flower

TL;DR
This paper introduces a novel alignment-free method for protein sequence comparison based on statistical physics, specifically using first passage probability distributions to analyze amino acid propensities, offering an alternative to traditional alignment-based methods.
Contribution
The paper presents a new statistical physics-based approach for alignment-independent protein sequence comparison, expanding the toolkit beyond traditional alignment algorithms.
Findings
Provides a new statistical framework for protein comparison
Demonstrates potential advantages in structure-function analysis
Offers a complementary method to existing alignment techniques
Abstract
Within bioinformatics, the textual alignment of amino acid sequences has long dominated the determination of similarity between proteins, with all that implies for shared structure, function and evolutionary descent. Despite the relative success of modern-day sequence alignment algorithms, so-called alignment-free approaches offer a complementary means of determining and expressing similarity, with potential benefits in certain key applications, such as regression analysis of protein structure-function studies, where alignment-based similarity has performed poorly. Here, we offer a fresh, statistical physics-based perspective focusing on the question of alignment-free comparison, in the process adapting results from 'first passage probability distribution' to summarize statistics of ensemble-averaged amino acid propensity values. In this article, we introduce and elaborate this approach.
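To make the idea of an alignment-free, first-passage-style comparison concrete, here is a minimal Python sketch. It is not the authors' method: the abstract does not specify the propensity scale, the definition of a passage event, or the distance measure, so all of these are assumptions made for illustration. The sketch maps each residue to its Kyte-Doolittle hydropathy value, records for every start position how many steps it takes for the value to first cross a threshold (assumed to be 0.0), and compares two sequences by the total variation distance between the resulting empirical distributions. The function names `first_passage_distribution` and `total_variation` are hypothetical.

```python
from collections import Counter

# Kyte-Doolittle hydropathy scale. Using this particular propensity scale
# is an assumption for illustration; the abstract does not name one.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def first_passage_distribution(seq, scale=KD, threshold=0.0):
    """Empirical distribution of first-crossing step counts (hypothetical helper).

    For each start position, count the steps until the propensity value
    first lands on the opposite side of `threshold`. Start positions that
    never cross before the sequence ends are ignored.
    """
    values = [scale[aa] for aa in seq if aa in scale]  # skip unknown residues
    counts = Counter()
    for i, start in enumerate(values):
        side = start > threshold
        for k in range(1, len(values) - i):
            if (values[i + k] > threshold) != side:
                counts[k] += 1
                break
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()} if total else {}


def total_variation(p, q):
    """Total variation distance between two discrete distributions (dicts)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Two toy sequences of equal composition but different ordering.
blocky = first_passage_distribution("AAKK")
alternating = first_passage_distribution("AKAK")
distance = total_variation(blocky, alternating)
```

Because the comparison works on distributions rather than aligned positions, the two sequences need not have the same length, which is the general appeal of alignment-free measures described in the abstract.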
Taxonomy
Topics: Machine Learning in Bioinformatics · Genomics and Phylogenetic Studies · Protein Structure and Dynamics
