Statistical mechanics of transcription-factor binding site discovery using Hidden Markov Models
Pankaj Mehta, David Schwab, Anirvan M. Sengupta

TL;DR
This paper explores the use of Hidden Markov Models for identifying transcription factor binding sites, applying statistical mechanics to derive insights into learning efficiency and parameter confidence.
Contribution
It links HMMs for TF binding to the "inverse" statistical mechanics of hard rods in a one-dimensional disordered potential, yielding analytic formulas for the Fisher information and a scaling law for the amount of training data needed to learn a TF's binding energy.
Findings
Derived analytic expressions for the Fisher information in the limit of low binding-site density
Established a scaling principle relating TF specificity (binding energy) to the minimum amount of training data needed to learn it
Provided insights into confidence measures for learned HMM parameters
Abstract
Hidden Markov Models (HMMs) are a commonly used tool for inference of transcription factor (TF) binding sites from DNA sequence data. We exploit the mathematical equivalence between HMMs for TF binding and the "inverse" statistical mechanics of hard rods in a one-dimensional disordered potential to investigate learning in HMMs. We derive analytic expressions for the Fisher information, a commonly employed measure of confidence in learned parameters, in the biologically relevant limit where the density of binding sites is low. We then use techniques from statistical mechanics to derive a scaling principle relating the specificity (binding energy) of a TF to the minimum amount of training data necessary to learn it.
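To make the hard-rod picture in the abstract concrete, here is a minimal sketch of a transfer recursion for the partition function of non-overlapping rods (bound TFs) of length L on a DNA sequence, together with the resulting site-occupancy probabilities. It is an illustration under stated assumptions, not the authors' derivation: the names `energy` (an L x 4 position-specific energy matrix), `log_fugacity` (a chemical-potential-like parameter setting binding-site density), and all function names are hypothetical, and background base frequencies are assumed to be absorbed into the energies.

```python
import numpy as np

def site_weights(seq, energy, log_fugacity):
    """Boltzmann weight of a rod whose left end sits at each start position.

    seq: sequence of base indices in {0, 1, 2, 3} (assumed encoding).
    energy: array of shape (L, 4), hypothetical per-position binding energies.
    """
    L = energy.shape[0]
    n_starts = max(len(seq) - L + 1, 0)
    w = np.zeros(n_starts)
    for i in range(n_starts):
        e = sum(energy[j, seq[i + j]] for j in range(L))
        w[i] = np.exp(log_fugacity - e)
    return w

def forward_partition(seq, energy, log_fugacity):
    """Zf[k] = partition function of the first k bases (Zf[0] = 1)."""
    L, N = energy.shape[0], len(seq)
    w = site_weights(seq, energy, log_fugacity)
    Zf = np.ones(N + 1)
    for k in range(1, N + 1):
        Zf[k] = Zf[k - 1]                  # base k-1 is unbound
        if k >= L:
            Zf[k] += w[k - L] * Zf[k - L]  # a rod ends at base k-1
    return Zf

def occupancy(seq, energy, log_fugacity):
    """Probability that a rod starts at each position (forward-backward)."""
    L, N = energy.shape[0], len(seq)
    w = site_weights(seq, energy, log_fugacity)
    Zf = forward_partition(seq, energy, log_fugacity)
    Zb = np.ones(N + 1)                    # Zb[k] = partition function of bases k..N-1
    for k in range(N - 1, -1, -1):
        Zb[k] = Zb[k + 1]
        if k + L <= N:
            Zb[k] += w[k] * Zb[k + L]
    return np.array([Zf[i] * w[i] * Zb[i + L] for i in range(len(w))]) / Zf[N]
```

In the low-density regime the abstract focuses on, `log_fugacity` would be strongly negative, so most weights are small and the occupancies reduce approximately to the individual site weights.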
