Convergence rate of Markov chain methods for genomic motif discovery
Dawn B. Woodard, Jeffrey S. Rosenthal

TL;DR
This paper analyzes the convergence rate of a simplified Gibbs sampler for genomic motif discovery. It shows that multimodality of the posterior distribution often makes convergence slow down exponentially in the length of the DNA sequence, and it provides some of the first bounds on such convergence rates.
Contribution
It offers some of the first meaningful bounds on the convergence rate of a Markov chain method for sampling from a multimodal posterior distribution in genomic data analysis.
Findings
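The convergence rate often decreases exponentially with DNA sequence length; this happens whenever the data contain more than one true repeating pattern, which makes the posterior multimodal.
The result matches empirical experience: the motif-discovery Gibbs sampler converges so poorly that it is used to find posterior modes (candidate motifs), not to sample from the posterior.
The theoretical bounds link the convergence rate to properties of the data, such as sequence length and the number of repeating patterns.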
Abstract
We analyze the convergence rate of a simplified version of a popular Gibbs sampling method used for statistical discovery of gene regulatory binding motifs in DNA sequences. This sampler satisfies a very strong form of ergodicity (uniform). However, we show that, due to multimodality of the posterior distribution, the rate of convergence often decreases exponentially as a function of the length of the DNA sequence. Specifically, we show that this occurs whenever there is more than one true repeating pattern in the data. In practice there are typically multiple such patterns in biological data, the goal being to detect the most well-conserved and frequently-occurring of these. Our findings match empirical results, in which the motif-discovery Gibbs sampler has exhibited such poor convergence that it is used only for finding modes of the posterior distribution (candidate motifs) rather…
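To make the link between multimodality and exponential slowdown concrete, here is a minimal toy sketch. It is not the motif-discovery sampler analyzed in the paper. It assumes a hypothetical target on two binary variables with two equally weighted modes, (0, 0) and (1, 1), separated by "valley" states whose weight is `exp(-c * seq_len)`; the names `seq_len`, `c`, `mode_switch_probability`, and `sweeps_until_switch` are illustrative only. Under that assumption, the chance that one systematic-scan Gibbs sweep moves from one mode to the other shrinks exponentially as `seq_len` grows.

```python
import math
import random


def mode_switch_probability(seq_len, c=1.0):
    """Exact chance that one Gibbs sweep started at mode (0, 0) ends at mode (1, 1)."""
    eps = math.exp(-c * seq_len)  # assumed weight of the valley states (0, 1), (1, 0)
    flip_x = eps / (1.0 + eps)    # P(x = 1 | y = 0): step into the valley
    flip_y = 1.0 / (1.0 + eps)    # P(y = 1 | x = 1): step out to the other mode
    return flip_x * flip_y


def sweeps_until_switch(seq_len, c=1.0, seed=0, max_sweeps=10**6):
    """Simulate the toy sampler and count sweeps until it first reaches (1, 1)."""
    rng = random.Random(seed)
    eps = math.exp(-c * seq_len)

    def weight(x, y):
        # Unnormalized toy target: modes have weight 1, valley states weight eps.
        return 1.0 if x == y else eps

    x, y = 0, 0
    for sweep in range(1, max_sweeps + 1):
        # Update x from its conditional given y, then y given the new x.
        x = 1 if rng.random() < weight(1, y) / (weight(0, y) + weight(1, y)) else 0
        y = 1 if rng.random() < weight(x, 1) / (weight(x, 0) + weight(x, 1)) else 0
        if (x, y) == (1, 1):
            return sweep
    return None  # no switch within the sweep budget


for seq_len in (2, 4, 8, 16):
    print(seq_len, mode_switch_probability(seq_len))
```

In this toy model the expected number of sweeps needed to hop between modes is roughly the reciprocal of the printed probability, so it grows exponentially in `seq_len`. A sampler in this regime can still locate a mode quickly, but it rarely visits the others.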
