Identifying short motifs by means of extreme value analysis

Daniela Bianchi; Brunello Tirozzi

arXiv:0803.4277 · q-bio.GN · November 13, 2009

TL;DR

This paper introduces a statistical method based on extreme value analysis to accurately detect short DNA binding sites, reducing false positives compared to existing methods.

Contribution

It presents a novel self-consistent statistical procedure that accounts for large deviations in matching probabilities for short motifs.

Findings

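01. Identified transcription factor binding sites in two settings: three factors on a set of 134 estrogen-regulated genes, and, among 138 candidate factors, those binding a set of nine genes

02. Reproduced experimental findings where available

03. Significantly reduced false positives compared to other methods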

Abstract

The problem of detecting a binding site (a substring of DNA where transcription factors attach) on a long DNA sequence requires the recognition of a small pattern in a large background. For short binding sites, the matching probability can display large fluctuations from one putative binding site to another. Here we use a self-consistent statistical procedure that accounts correctly for the large deviations of the matching probability to predict the location of short binding sites. We apply it in two distinct situations: (a) the detection of the binding sites for three specific transcription factors on a set of 134 estrogen-regulated genes; (b) the identification, in a set of 138 possible transcription factors, of the ones binding a specific set of nine genes. In both instances, experimental findings are reproduced (when available) and the number of false positives is significantly…
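The abstract does not spell out the self-consistent procedure, so the sketch below only illustrates the generic extreme-value idea it builds on, under stated assumptions. Each window of a sequence is scored with a log-odds position weight matrix, and the best match is kept. The null distribution of that maximum is built from random background sequences and fitted with a Gumbel distribution by the method of moments, which yields a p-value. The motif probabilities, the uniform background, the sequence lengths, and all function names (`motif_probs`, `log_odds_matrix`, `best_match_score`, `fit_gumbel`, `gumbel_pvalue`, `background_maxima`) are hypothetical illustrations, not the authors' method or data.

```python
import math
import random
import statistics

BASES = "ACGT"
BACKGROUND = {b: 0.25 for b in BASES}  # assumption: uniform i.i.d. background


def motif_probs(consensus, p_match=0.7):
    """Hypothetical motif: the consensus base gets p_match, the other three share the rest."""
    other = (1.0 - p_match) / 3
    return [{b: (p_match if b == c else other) for b in BASES} for c in consensus]


def log_odds_matrix(probs, background):
    """Convert per-position base probabilities into natural-log odds weights."""
    return [{b: math.log(col[b] / background[b]) for b in BASES} for col in probs]


def best_match_score(seq, pwm):
    """Maximum log-odds score over all windows of length len(pwm) in seq."""
    w = len(pwm)
    if len(seq) < w:
        raise ValueError("sequence shorter than motif")
    return max(
        sum(pwm[j][seq[i + j]] for j in range(w))
        for i in range(len(seq) - w + 1)
    )


def fit_gumbel(samples):
    """Method-of-moments Gumbel (maximum) fit; needs >= 2 samples. Returns (mu, beta)."""
    euler_gamma = 0.5772156649015329
    beta = statistics.stdev(samples) * math.sqrt(6) / math.pi
    mu = statistics.mean(samples) - euler_gamma * beta
    return mu, beta


def gumbel_pvalue(x, mu, beta):
    """P(max score >= x) under the fitted Gumbel distribution."""
    return 1.0 - math.exp(-math.exp(-(x - mu) / beta))


def background_maxima(pwm, seq_len, n_seqs, rng):
    """Best-match scores on n_seqs random sequences drawn from the uniform background."""
    return [
        best_match_score("".join(rng.choice(BASES) for _ in range(seq_len)), pwm)
        for _ in range(n_seqs)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    pwm = log_odds_matrix(motif_probs("AGCTTA"), BACKGROUND)
    seq_len = 200
    mu, beta = fit_gumbel(background_maxima(pwm, seq_len, 500, rng))
    # Toy query: a random background sequence with the consensus planted at position 100.
    query = [rng.choice(BASES) for _ in range(seq_len)]
    query[100:106] = "AGCTTA"
    score = best_match_score("".join(query), pwm)
    print(f"best score {score:.3f}, Gumbel p-value {gumbel_pvalue(score, mu, beta):.3g}")
```

For very short motifs the maximum score takes only a few discrete values and exact matches occur often by chance, so a plain Gumbel fit like this one can be crude. That sensitivity is one reason short motifs call for more careful treatment of the score statistics, as the abstract argues.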
