Transcription Factor-DNA Binding Via Machine Learning Ensembles
Yue Fan, Mark Kon, Charles DeLisi

TL;DR
This paper introduces an ensemble machine learning approach that combines multiple motif discovery algorithms to improve transcription factor target gene prediction, binding site identification, and motif discovery across species.
Contribution
It presents a novel ensemble framework that integrates diverse PWM-based subspaces and machine learning classifiers for enhanced TF-DNA binding analysis.
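Here is one way to picture "integrating" several per-subspace classifiers. The Python sketch below uses soft voting: it averages each classifier's predicted probability that a gene is a TF target and then thresholds the mean. The summary does not say which combination rule the authors use. The averaging scheme, the 0.5 threshold, and the names `ensemble_predict` and `subspace_scores` are therefore assumptions for illustration only.

```python
from statistics import mean


def ensemble_predict(subspace_scores, threshold=0.5):
    """Soft-vote across per-subspace classifiers (illustrative assumption).

    subspace_scores maps a hypothetical subspace name to a list of predicted
    probabilities that each gene is a TF target; every list must use the
    same gene order. Returns (target calls, averaged probabilities).
    """
    if not subspace_scores:
        raise ValueError("need at least one subspace classifier")
    lengths = {len(scores) for scores in subspace_scores.values()}
    if len(lengths) != 1:
        raise ValueError("all subspaces must score the same genes")
    # Regroup scores so each tuple holds one gene's probabilities.
    per_gene = zip(*subspace_scores.values())
    averaged = [mean(gene_scores) for gene_scores in per_gene]
    # Unweighted average, then a fixed cutoff (both are assumptions).
    calls = [p >= threshold for p in averaged]
    return calls, averaged


# Usage with made-up probabilities for three genes from two subspaces.
calls, averaged = ensemble_predict(
    {"subspace_1": [0.9, 0.2, 0.6], "subspace_2": [0.7, 0.4, 0.3]}
)
print(calls)  # [True, False, False]
```

Soft voting lets a confident subspace outweigh an uncertain one. A weighted average or a stacked meta-classifier would be natural alternatives if some subspaces are known to be more reliable.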
Findings
Improved gene target prediction accuracy by about 10 percentage points.
Matched top motif discovery performance with minimal human intervention.
Enhanced binding site identification on cross-species and mammalian datasets.
Abstract
We present ensemble methods in a machine learning (ML) framework that combine predictions from five known motif/binding site exploration algorithms. For a given transcription factor (TF), the ensemble starts with the position weight matrices (PWMs) for its motif, collected from the component algorithms. Using dimension reduction, we identify significant PWM-based subspaces for analysis. Within each subspace, a machine learning classifier is built to identify the TF's gene (promoter) targets (Problem 1). Together, these PWM-based subspaces form an ML-based sequence analysis tool. Problem 2 (finding binding motifs) is solved by agglomerating the k-mer (string) feature PWM-based subspaces that stand out in identifying gene targets. We approach Problem 3 (finding binding sites) with a novel machine learning approach that uses promoter string features and ML importance scores in a classification algorithm that locates binding sites across the genome.…
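Every subspace described above starts from PWMs, so it helps to see how a PWM becomes a per-promoter feature. The sketch below is a generic log-odds scan and not the authors' pipeline. It assumes the following:

- a probability PWM stored as a list of per-position `{base: probability}` dictionaries
- a uniform background and a small pseudocount
- scanning of the forward strand only

`log_odds_matrix` and `best_site` are hypothetical helper names.

```python
import math

BASES = "ACGT"


def log_odds_matrix(pwm, background=None, pseudocount=0.01):
    """Convert a probability PWM into per-position log2-odds scores.

    pwm is a list with one {base: probability} dict per motif position.
    The uniform background and pseudocount are illustrative assumptions.
    """
    if background is None:
        background = {base: 0.25 for base in BASES}
    matrix = []
    for column in pwm:
        # Renormalize after adding the pseudocount so no probability is zero.
        total = sum(column[base] + pseudocount for base in BASES)
        matrix.append({
            base: math.log2((column[base] + pseudocount) / total / background[base])
            for base in BASES
        })
    return matrix


def best_site(sequence, matrix):
    """Return (best score, start index) over forward-strand windows.

    Windows containing non-ACGT characters (e.g. N) are skipped; the start
    index is -1 if no valid window exists.
    """
    sequence = sequence.upper()
    width = len(matrix)
    best_score, best_start = -math.inf, -1
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue
        # Sum the log-odds contribution of each base at its motif position.
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start


# Usage: a hypothetical PWM that strongly prefers the 4-mer TATA.
tata_pwm = [{base: (0.97 if base == c else 0.01) for base in BASES} for c in "TATA"]
score, start = best_site("GGGTATAGG", log_odds_matrix(tata_pwm))
print(start)  # 3
```

The best-window score (or a vector of such scores, one per PWM) is the kind of feature a downstream classifier could consume. A realistic scan would usually also score the reverse complement and estimate the background from the genome rather than assuming it is uniform.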
Taxonomy
Topics: Genomics and Chromatin Dynamics · Gene expression and cancer classification · RNA and protein synthesis mechanisms
