The Protein Family Classification in Protein Databases via Entropy Measures
R.P. Mondaini, S.C. de Albuquerque Neto

TL;DR
This paper reviews entropy-based methods for classifying protein families in databases, analyzing amino acid distributions to gain insights into protein structure and evolution.
Contribution
It introduces a mathematical approach using entropy measures to classify protein families and encourages further statistical analysis of amino acid distributions.
Findings
Analysis of the Pfam database reveals patterns in amino acid distributions.
Entropy measures effectively differentiate protein families.
Provides a framework for future research in protein classification.
Abstract
In the present work, we review the fundamental methods which have been developed in the last few years for classifying into families and clans the distribution of amino acids in protein databases. This is done through functions of random variables, the Entropy Measures of probabilities of occurrence of the amino acids. An intensive study of the Pfam databases is presented with restrictions to families which could be represented by rectangular arrays of amino acids with m rows (protein domains) and n columns (amino acids). This work is also an invitation to scientific research groups worldwide to undertake the statistical analysis with different numbers of rows and columns since we believe in the mathematical characterization of the distribution of amino acids as a fundamental insight on the determination of protein structure and evolution.
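As an illustration of the kind of computation the abstract describes, the sketch below treats a family as an m x n rectangular array (one row per protein domain) and computes an entropy value for each column from the observed probabilities of occurrence of the amino acids. It is a minimal sketch under stated assumptions, not the authors' method: the abstract does not say which entropy measures are used, so Shannon entropy in bits is assumed here, and the function name `column_entropies` and the toy array are hypothetical.

```python
import math
from collections import Counter


def column_entropies(block):
    """Return the Shannon entropy (in bits) of each column of an m x n block.

    `block` is a list of m equal-length strings, one per protein domain.
    Shannon entropy is an assumption; the source names no specific measure.
    """
    m = len(block)
    if m == 0:
        raise ValueError("block must contain at least one row")
    n = len(block[0])
    if any(len(row) != n for row in block):
        raise ValueError("all rows must have the same length (rectangular array)")

    entropies = []
    for j in range(n):
        # Count how often each amino acid occurs in column j.
        counts = Counter(row[j] for row in block)
        # Probability of occurrence p = count / m; entropy = sum p * log2(1/p).
        h = sum((c / m) * math.log2(m / c) for c in counts.values())
        entropies.append(h)
    return entropies


# Hypothetical toy family: m = 4 domains, n = 4 columns.
toy_block = ["ACDA", "ACEC", "AGDD", "AGEE"]
toy_entropies = column_entropies(toy_block)
print(toy_entropies)
```

A fully conserved column gives an entropy of zero, while a column in which all m rows differ gives the maximum value for that m, log2(m). Comparing such per-column profiles across families is one simple way entropy values could be used to tell families apart.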
Taxonomy
Topics: Machine Learning in Bioinformatics · Fractal and DNA sequence analysis · Computational Drug Discovery Methods
