Statistical mechanics of transcription-factor binding site discovery using Hidden Markov Models

Pankaj Mehta; David Schwab; Anirvan M. Sengupta

arXiv:1008.3151·cond-mat.stat-mech·May 19, 2015

TL;DR

This paper studies Hidden Markov Models for identifying transcription factor binding sites, using statistical mechanics to quantify confidence in learned parameters and the amount of training data that learning requires.

Contribution

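It exploits the mathematical equivalence between HMMs for TF binding and the "inverse" statistical mechanics of hard rods in a one-dimensional disordered potential, yielding analytic expressions for the Fisher information and a scaling law for the minimum amount of training data needed to learn a TF.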

Findings

01

Derived analytic expressions for the Fisher information in the limit of low binding-site density

02

Established a scaling principle relating TF specificity (binding energy) to the minimum amount of training data needed to learn it

03

Used the Fisher information as a measure of confidence in learned HMM parameters

Abstract

Hidden Markov Models (HMMs) are a commonly used tool for inference of transcription factor (TF) binding sites from DNA sequence data. We exploit the mathematical equivalence between HMMs for TF binding and the "inverse" statistical mechanics of hard rods in a one-dimensional disordered potential to investigate learning in HMMs. We derive analytic expressions for the Fisher information, a commonly employed measure of confidence in learned parameters, in the biologically relevant limit where the density of binding sites is low. We then use techniques from statistical mechanics to derive a scaling principle relating the specificity (binding energy) of a TF to the minimum amount of training data necessary to learn it.
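
One way to read the hard-rod picture in the abstract is as a sum over all placements of non-overlapping binding sites along a sequence, each site weighted by a Boltzmann factor. The Python sketch below computes that sum with a forward recursion of the kind used for HMM likelihoods. It is an illustration under stated assumptions, not the authors' code: the function name `rod_partition_function`, the `energy_matrix` layout (one dict of per-base energies per site position, in units of kT), and the chemical potential `mu` are all hypothetical choices made here.

```python
import math


def rod_partition_function(seq, energy_matrix, mu):
    """Partition function of hard rods (binding sites) on a sequence.

    Assumptions (hypothetical, for illustration only):
      - energy_matrix[j][base] is the energy contribution of `base` at
        position j of a site, relative to background, in units of kT.
      - mu is a chemical potential controlling the site density.
      - Sites have fixed length L = len(energy_matrix) and cannot overlap.
    """
    L = len(energy_matrix)
    n = len(seq)
    # Z[i] is the partition function of the first i bases; Z[0] = 1 (empty).
    Z = [1.0] * (n + 1)
    for i in range(1, n + 1):
        # Case 1: base i-1 is background (not covered by a site).
        Z[i] = Z[i - 1]
        # Case 2: a site ends at base i-1 and covers bases i-L .. i-1.
        if i >= L:
            energy = sum(energy_matrix[j][seq[i - L + j]] for j in range(L))
            Z[i] += Z[i - L] * math.exp(mu - energy)
    return Z[n]


# Toy check: with zero energies and mu = 0 every rod has weight 1, so Z
# just counts placements of non-overlapping length-2 rods on 4 bases.
flat = [{b: 0.0 for b in "ACGT"} for _ in range(2)]
print(rod_partition_function("ACGT", flat, mu=0.0))  # 5.0
```

Making `mu` strongly negative suppresses every configuration that contains a site, so the sum approaches 1; this corresponds to the low-density regime the abstract calls biologically relevant.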
