Identification of Protein Coding Regions in Genomic DNA Using Unsupervised FMACA Based Pattern Classifier
Pokkuluri Kiran Sree, Inampudi Ramesh Babu

TL;DR
This paper introduces an unsupervised FMACA-based pattern classifier that accurately identifies protein-coding regions in DNA sequences, demonstrating scalability and improved accuracy over previous methods.
Contribution
The paper proposes an unsupervised FMACA classifier, designed with a distinct K-Means algorithm, to identify protein-coding regions in DNA more accurately.
Findings
High classification accuracy achieved
Scalable to large datasets
Outperforms previous classifiers
Abstract
Genes carry the instructions for making the proteins found in a cell, encoded as a specific sequence of nucleotides in DNA molecules. However, the regions of these genes that code for proteins may occupy only a small part of the sequence. Identifying the coding regions plays a vital role in understanding these genes. In this paper we propose an unsupervised Fuzzy Multiple Attractor Cellular Automata (FMACA) based pattern classifier to identify the coding region of a DNA sequence. We propose a distinct K-Means algorithm for designing the FMACA classifier that is simple and efficient and produces a more accurate classifier than has previously been obtained for a range of different sequence lengths. Experimental results confirm the scalability of the proposed unsupervised FMACA-based classifier to handle large volumes of data irrespective of the number of classes, tuples and…
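The abstract does not spell out the paper's distinct K-Means variant or how it feeds the FMACA design, so the following is only a minimal sketch of ordinary (Lloyd's) K-Means applied to DNA windows, to illustrate the kind of unsupervised grouping involved. The feature choice (per-window A/C/G/T frequencies), the function names, and the toy sequences are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def base_frequencies(window):
    """Map a DNA window to a 4-dimensional vector of A, C, G, T frequencies.

    This feature is a hypothetical stand-in; the paper's features are not
    described in the abstract.
    """
    counts = Counter(window.upper())
    n = max(len(window), 1)
    return [counts[b] / n for b in "ACGT"]


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's K-Means; centroids start at the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy, made-up windows: two GC-rich and two AT-rich.
windows = ["GCGCGGCCGC", "ATATTAATAT", "GGCCGCGGCG", "TTATAATATA"]
points = [base_frequencies(w) for w in windows]
labels, centroids = kmeans(points, k=2)
```

With these toy windows the two GC-rich sequences fall into one cluster and the two AT-rich sequences into the other. Real coding-region detection would need richer features than base composition, and the cluster labels carry no biological meaning on their own.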
Taxonomy
Topics: Cellular Automata and Applications · Fractal and DNA sequence analysis · DNA and Biological Computing
