On the Informativeness of the DNA Promoter Sequences Domain Theory
J. Ortega

TL;DR
This paper reinterprets the DNA promoter sequences domain theory in terms of M-of-N concepts, reaching 93.4% accuracy on the 106-item database with no learning and thereby demonstrating the theory's informativeness independently of any learning algorithm.
Contribution
It introduces a simple change and reinterpretation of the domain theory in terms of M-of-N concepts, shows that the theory by itself supports accurate classification, and helps characterize the difficulty of learning with it.
Findings
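The reinterpreted theory achieves 93.4% accuracy on the 106 items of the database, with no learning
A randomly chosen M-of-N interpretation has an expected accuracy of 76.5%
An exhaustive search of the interpretation space finds a maximum accuracy of 97.2%, achieved in 12 cases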
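As a quick arithmetic check on the percentages above, the sketch below assumes each accuracy is a count of correctly classified items divided by the 106 items of the database. The helper name `correct_counts` is hypothetical and is not taken from the paper.

```python
# Hypothetical helper: list every integer count of correct items whose
# percentage of `total` rounds to the reported figure.
def correct_counts(reported_pct, total=106, decimals=1):
    return [
        k for k in range(total + 1)
        if round(100 * k / total, decimals) == reported_pct
    ]

# Accuracy of the reinterpreted theory and the maximum found by the search.
print(correct_counts(93.4))
print(correct_counts(97.2))

# The 76.5% figure is an expectation over interpretations, so it need not
# correspond to any whole number of correct items.
print(correct_counts(76.5))
```

Under that assumption, 93.4% and 97.2% each match exactly one count (99 and 103 of 106), while 76.5% matches none, which is consistent with it being an average rather than the accuracy of a single interpretation.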
Abstract
The DNA promoter sequences domain theory and database have become popular for testing systems that integrate empirical and analytical learning. This note reports a simple change and reinterpretation of the domain theory in terms of M-of-N concepts, involving no learning, that results in an accuracy of 93.4% on the 106 items of the database. Moreover, an exhaustive search of the space of M-of-N domain theory interpretations indicates that the expected accuracy of a randomly chosen interpretation is 76.5%, and that a maximum accuracy of 97.2% is achieved in 12 cases. This demonstrates the informativeness of the domain theory, without the complications of understanding the interactions between various learning algorithms and the theory. In addition, our results help characterize the difficulty of learning using the DNA promoters theory.
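To make the idea concrete, here is a minimal sketch of an M-of-N concept and of an exhaustive search over threshold choices. It is an illustration under stated assumptions, not the paper's procedure: the two-group rule, the toy examples, and every name below are hypothetical, and the real promoter theory and database are not reproduced here.

```python
from itertools import product

def m_of_n(m, conditions):
    """Return True when at least m of the boolean conditions hold."""
    return sum(bool(c) for c in conditions) >= m

# Toy stand-in for a domain theory (hypothetical): two groups of
# antecedents. An "interpretation" picks a threshold M for each group.
def classify(example, thresholds):
    group_a, group_b = example
    m_a, m_b = thresholds
    return m_of_n(m_a, group_a) and m_of_n(m_b, group_b)

# Made-up labelled examples: ((group_a, group_b), label).
data = [
    (((1, 1, 0), (1, 1)), True),
    (((1, 0, 0), (1, 0)), True),
    (((0, 0, 0), (1, 1)), False),
    (((1, 1, 1), (0, 0)), False),
]

def accuracy(thresholds):
    hits = sum(classify(x, thresholds) == y for x, y in data)
    return hits / len(data)

# Exhaustive search: every threshold pair (1..3 for group A, 1..2 for B).
space = list(product(range(1, 4), range(1, 3)))
scores = {t: accuracy(t) for t in space}

expected = sum(scores.values()) / len(scores)  # mean over interpretations
best = max(scores.values())                    # best accuracy found
winners = [t for t, s in scores.items() if s == best]

print(f"expected accuracy: {expected:.3f}")
print(f"best accuracy: {best:.3f} with thresholds {winners}")
```

The same three quantities (mean accuracy over all interpretations, best accuracy, and the number of interpretations reaching it) are the ones the abstract reports for the promoter theory; only the toy rule and data differ.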
Taxonomy
Topics: Machine Learning and Algorithms · Advanced biosensing and bioanalysis techniques · Machine Learning and Data Classification
