On Weight Matrix and Free Energy Models for Sequence Motif Detection
Qing Zhou

TL;DR
This paper compares weight matrix and free energy models for sequence motif detection, providing theoretical error rate analysis and empirical validation, showing free energy models often outperform weight matrix models in predictive power.
Contribution
It offers the first theoretical asymptotic error rate analysis of WM and FE models, clarifying their efficiency in motif detection tasks.
Findings
FE models achieve accuracy higher than or comparable to that of WM models in most scenarios.
Theoretical analysis supports empirical results on ChIP-seq and microarray data.
Performance depends on the number of observed binding sites used for prediction.
Abstract
The problem of motif detection can be formulated as the construction of a discriminant function to separate sequences of a specific pattern from background. In computational biology, motif detection is used to predict DNA binding sites of a transcription factor (TF), mostly based on the weight matrix (WM) model or the Gibbs free energy (FE) model. However, despite the wide applications, theoretical analysis of these two models and their predictions is still lacking. We derive asymptotic error rates of prediction procedures based on these models under different data generation assumptions. This allows a theoretical comparison between the WM-based and the FE-based predictions in terms of asymptotic efficiency. Applications of the theoretical results are demonstrated with empirical studies on ChIP-seq data and protein binding microarray data. We find that, irrespective of underlying data…
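To make the discriminant-function view concrete, here is a minimal sketch of a WM-style predictor, assuming a log-odds score against a uniform background and a fixed threshold. The function names, the pseudocount, the background frequencies, and the threshold are illustrative assumptions, not the estimators or decision rules analyzed in the paper; the FE model is not sketched because this summary does not specify its form.

```python
import math

BASES = "ACGT"

def build_weight_matrix(sites, pseudocount=1.0):
    """Estimate per-position base probabilities from aligned, equal-length sites.

    The pseudocount is an assumption added to avoid zero probabilities.
    """
    width = len(sites[0])
    matrix = []
    for j in range(width):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[j]] += 1
        total = sum(counts.values())
        matrix.append({b: counts[b] / total for b in BASES})
    return matrix

def log_odds_score(matrix, window, background=None):
    """Discriminant function: log-likelihood ratio of motif vs. background."""
    # Uniform background is an illustrative assumption.
    background = background or {b: 0.25 for b in BASES}
    return sum(math.log(matrix[j][c] / background[c])
               for j, c in enumerate(window))

def predict_sites(matrix, sequence, threshold=0.0):
    """Return start positions whose window scores above the threshold."""
    width = len(matrix)
    return [i for i in range(len(sequence) - width + 1)
            if log_odds_score(matrix, sequence[i:i + width]) > threshold]

# Toy usage with made-up binding sites.
sites = ["ACGT", "ACGT", "ACGA"]
matrix = build_weight_matrix(sites)
hits = predict_sites(matrix, "TTACGTTT")
```

In this toy setup the matrix is estimated from only three sites, which illustrates why the number of observed binding sites matters: with few sites the estimated probabilities, and hence the scores, are noisy.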
