Machine learning applications to DNA subsequence and restriction site analysis
Ethan J. Moyer (1), Anup Das (PhD) (2) ((1) School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, Pennsylvania, USA, (2) College of Engineering, Drexel University, Philadelphia, Pennsylvania, USA)

TL;DR
This study applies machine learning techniques to classify DNA subsequences for restriction site analysis, improving DNA synthesis accuracy by identifying applicable sequences with high sensitivity.
Contribution
It introduces a pipeline combining feature selection and three ML models to classify DNA subsequences based on restriction site features, enhancing synthesis methods.
Findings
SVMs achieved 94.9% sensitivity
Random forest achieved 92.7% sensitivity
CNNs achieved 91.4% sensitivity
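For reference, sensitivity is the standard true-positive rate, TP / (TP + FN): the fraction of truly applicable subsequences that a model labels as applicable. The sketch below is a minimal illustration of that metric only; the function name and the label vectors are made up for the example and are not data from the paper.

```python
def sensitivity(y_true, y_pred, positive=1):
    """True-positive rate: TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("no positive examples in y_true")
    return tp / (tp + fn)

# Hypothetical labels: 1 = applicable, 0 = inapplicable.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
print(sensitivity(y_true, y_pred))  # 3 of the 4 applicable subsequences are recovered
```

Note that the false positive in the last position does not affect sensitivity; it would show up in specificity or precision instead.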
Abstract
Based on the BioBricks standard, restriction synthesis is a novel catabolic iterative DNA synthesis method that utilizes endonucleases to synthesize a query sequence from a reference sequence. In this work, the reference sequence is built from shorter subsequences by classifying them as applicable or inapplicable for the synthesis method using three different machine learning methods: Support Vector Machines (SVMs), random forest, and Convolutional Neural Networks (CNNs). Before applying these methods to the data, a series of feature selection, curation, and reduction steps are applied to create an accurate and representative feature space. Following these preprocessing steps, three different pipelines are proposed to classify subsequences based on their nucleotide sequence and other relevant features corresponding to the restriction sites of over 200 endonucleases. The sensitivity using…
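To make the idea of a feature space built from nucleotide sequence plus restriction-site features more concrete, here is a rough sketch of what one such feature vector could look like. It is an assumption-laden illustration, not the authors' pipeline: the three-enzyme panel, the one-hot encoding, the site-count features, and all function names are chosen for the example (the paper refers to over 200 endonucleases and does not specify its encoding in this abstract).

```python
# Hypothetical three-enzyme panel with their well-known recognition sites.
# The paper uses over 200 endonucleases; this subset is for illustration only.
RESTRICTION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}

ONE_HOT = {
    "A": [1, 0, 0, 0],
    "C": [0, 1, 0, 0],
    "G": [0, 0, 1, 0],
    "T": [0, 0, 0, 1],
}

def count_sites(seq, site):
    """Count occurrences of a recognition site, allowing overlaps."""
    width = len(site)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == site)

def one_hot(seq):
    """Flatten a sequence into 4 values per base; unknown bases become all zeros."""
    flat = []
    for base in seq:
        flat.extend(ONE_HOT.get(base, [0, 0, 0, 0]))
    return flat

def featurize(seq):
    """Assumed feature layout: one-hot nucleotides followed by per-enzyme site counts."""
    seq = seq.upper()
    counts = [count_sites(seq, site) for site in RESTRICTION_SITES.values()]
    return one_hot(seq) + counts

# A made-up 16-nt subsequence containing one EcoRI and one BamHI site.
features = featurize("ATGAATTCGGATCCAA")
print(len(features), features[-3:])
```

Vectors of this kind could then be passed to any of the three classifiers named in the abstract; the feature selection and reduction steps the authors describe would operate on a much wider version of this representation.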
Taxonomy
Methods: Convolution
