Bayesian Centroid Estimation for Motif Discovery

Luis E. Carvalho

arXiv:1204.1571 · stat.AP · June 4, 2015


TL;DR

This paper introduces a Bayesian centroid estimator for motif discovery in biological sequences. The estimator is derived from a refined loss function for binding site inference, is computationally convenient, and offers further insight into the posterior distribution of binding site configurations.

Contribution

It presents an extended version of the Bayesian model adopted by the Gibbs motif sampler and proposes a new centroid estimator, derived from a refined loss function, for inferring both the motif and its binding sites.
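For intuition about what a centroid estimator is, here is a sketch of the simplest case, assuming a plain Hamming loss rather than the paper's refined loss (whose exact form is not reproduced on this page). The notation is introduced here for illustration: $A_{ij} \in \{0, 1\}$ marks a binding site starting at position $j$ of sequence $i$, and $\mathbf{S}$ denotes the observed sequences.

```latex
\begin{aligned}
L(\hat{A}, A) &= \sum_{i,j} \mathbb{1}\{\hat{A}_{ij} \neq A_{ij}\}, \\
\mathbb{E}\big[L(\hat{A}, A) \mid \mathbf{S}\big]
  &= \sum_{i,j} \Big[\hat{A}_{ij}\,(1 - p_{ij}) + (1 - \hat{A}_{ij})\,p_{ij}\Big],
  \qquad p_{ij} = P(A_{ij} = 1 \mid \mathbf{S}), \\
\hat{A}^{\mathrm{centroid}}_{ij} &= \mathbb{1}\{p_{ij} > 1/2\}.
\end{aligned}
```

Each term of the expected loss depends only on its own $\hat{A}_{ij}$, so the minimization splits position by position and needs only the posterior marginals $p_{ij}$, not a search over whole configurations.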

Findings

01

The centroid estimator can differ from the maximum a posteriori (MAP) estimator in motif discovery.

02

The proposed method offers computational advantages.

03

The method is validated on simulated and real datasets, where it shows improved inference.
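To illustrate how the centroid and MAP estimates can disagree, here is a minimal sketch assuming a plain Hamming loss on 0/1 binding site indicators (not the paper's refined loss). Under that assumption the centroid sets each indicator to 1 exactly when its posterior marginal exceeds 1/2. The function names and the toy posterior draws are hypothetical; in practice the draws would come from a sampler such as the Gibbs motif sampler.

```python
from collections import Counter

def centroid_from_samples(samples):
    """Centroid under a plain Hamming loss (assumed here): set each
    indicator to 1 when its estimated posterior marginal exceeds 1/2."""
    n = len(samples)
    width = len(samples[0])
    # Per-position posterior marginals, estimated by averaging the draws.
    marginals = [sum(s[k] for s in samples) / n for k in range(width)]
    return tuple(1 if p > 0.5 else 0 for p in marginals), marginals

def map_from_samples(samples):
    """Crude MAP estimate: the most frequently drawn whole configuration."""
    return Counter(samples).most_common(1)[0][0]

# Hypothetical toy posterior draws over three candidate site positions,
# with configuration frequencies 0.4, 0.3 and 0.3.
samples = [(1, 0, 0)] * 4 + [(0, 1, 0)] * 3 + [(0, 1, 1)] * 3

centroid, marginals = centroid_from_samples(samples)
map_config = map_from_samples(samples)
print("marginals:", marginals)
print("MAP:", map_config)
print("centroid:", centroid)
```

Here the most frequent configuration, (1, 0, 0), is the MAP estimate, but the marginals are 0.4, 0.6 and 0.3, so the centroid is (0, 1, 0). The centroid only needs per-position averages over the draws, which hints at why centroid estimation can be computationally convenient.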

Abstract

Biological sequences may contain patterns that signal important biomolecular functions; a classical example is the regulation of gene expression by transcription factors that bind to specific patterns in genomic promoter regions. In motif discovery we are given a set of sequences that share a common motif and aim to identify not only the motif composition, but also the binding sites in each sequence of the set. We present a Bayesian model that is an extended version of the model adopted by the Gibbs motif sampler, and propose a new centroid estimator that arises from a refined and meaningful loss function for binding site inference. We discuss the main advantages of centroid estimation for motif discovery, including computational convenience, and how its principled derivation offers further insights about the posterior distribution of binding site configurations. We also illustrate,…
