Automated DNA Motif Discovery
W. B. Langdon, Olivia Sanchez Graillet, A. P. Harrison

TL;DR
This paper presents an automated method for discovering DNA motifs using genetic programming with a BNF grammar to generate valid regular expressions, aiding in identifying gene types and functions.
Contribution
It introduces a novel approach combining BNF grammar with genetic programming for DNA motif discovery, ensuring syntactically valid motifs and improving biological pattern detection.
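The idea of using a grammar to keep generated motifs syntactically valid can be illustrated with a minimal sketch. The toy grammar, the `derive` function, and the `max_depth` limit below are all assumptions made for illustration; they are not the grammar or the genetic programming system used in the paper, and the sketch covers only random derivation from a grammar, not the evolutionary search itself.

```python
import random
import re

# Hypothetical toy BNF for DNA regular expressions (NOT the paper's grammar).
# Each nonterminal maps to a list of productions; the first production of
# every nonterminal is non-recursive so that derivation can always terminate.
GRAMMAR = {
    "<re>": [["<elem>"], ["<elem>", "<re>"]],
    "<elem>": [["<base>"], ["<base>", "<quant>"], ["[", "<set>", "]"]],
    "<set>": [["<base>"], ["<base>", "<set>"]],
    "<base>": [["A"], ["C"], ["G"], ["T"]],
    "<quant>": [["+"], ["*"], ["?"]],
}


def derive(symbol, rng, depth=0, max_depth=8):
    """Expand `symbol` into a string by choosing productions at random."""
    if symbol not in GRAMMAR:
        # Terminal symbol: emit it unchanged.
        return symbol
    productions = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth limit, allow only the first (non-recursive) production.
        productions = productions[:1]
    chosen = rng.choice(productions)
    return "".join(derive(s, rng, depth + 1, max_depth) for s in chosen)
```

Because every string is built by expanding grammar rules, each result is a legal regular expression by construction; for example, `re.compile(derive("<re>", random.Random(1)))` should never raise a syntax error under this toy grammar.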
Findings
Successfully identified motifs indicating non-protein coding genes
Demonstrated the method's ability to generate biologically relevant motifs
Enhanced motif discovery accuracy over traditional methods
Abstract
Ensembl's human non-coding and protein-coding genes are used to automatically find DNA pattern motifs. The Backus-Naur form (BNF) grammar for regular expressions (RE) is used by genetic programming to ensure the generated strings are legal. The evolved motif suggests that the presence of Thymine followed by one or more Adenines etc. early in transcripts indicates a non-protein coding gene.
Keywords: pseudogene, short and microRNAs, non-coding transcripts, systems biology, machine learning, Bioinformatics, motif, regular expression, strongly typed genetic programming, context-free grammar.
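As a rough illustration of how a motif of this kind could be applied, the sketch below tests whether a transcript contains Thymine followed by one or more Adenines near its start. The pattern `TA+` is a simplified stand-in (the abstract's "etc." indicates the evolved motif has further parts), and the function name `has_early_motif` and the 60-base `window` are assumptions, not values from the paper.

```python
import re

# Simplified stand-in for the evolved motif: T followed by one or more A.
# The full motif reported in the paper is not reproduced here.
MOTIF = re.compile(r"TA+")


def has_early_motif(transcript, window=60):
    """Return True if the motif occurs within the first `window` bases.

    `window` is a hypothetical cut-off for "early in the transcript".
    """
    prefix = transcript.upper()[:window]
    return MOTIF.search(prefix) is not None
```

In a classifier built this way, a match would be read as weak evidence that the transcript is non-protein coding; how well that holds depends on the actual evolved motif and the data it was trained on.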
Taxonomy
Topics: Evolutionary Algorithms and Applications · Genomics and Phylogenetic Studies · RNA and protein synthesis mechanisms
