Learning quantitative sequence-function relationships from massively parallel experiments
Gurinder S. Atwal, Justin B. Kinney

TL;DR
This paper discusses how to infer models of biological sequence-function relationships from massively parallel experiments, emphasizing mutual information over likelihood and exploring the concept of diffeomorphic modes in parameter space.
Contribution
The paper extends the theoretical understanding of how to infer parametric models of sequence-function relationships from massively parallel experimental data, highlighting the importance of mutual information and introducing the concept of diffeomorphic modes.
Findings
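Inference based on mutual information, rather than the standard likelihood-based approach, is often necessary for learning model parameters accurately.
Diffeomorphic modes are directions in parameter space that are far less constrained by data than likelihood-based inference would suggest.
An analytically tractable model of a massively parallel experiment demonstrates these inference principles explicitly.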
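To make the first two findings concrete, here is a minimal Python sketch of a mutual information objective. It is not the authors' estimator. The function name `mutual_information_bits`, the rank-based binning, and the simulated data are all assumptions chosen for illustration. The sketch scores model predictions against binned measurements and shows that a strictly increasing transformation of the predictions leaves the score unchanged, which is the kind of direction that mutual information alone cannot constrain.

```python
import numpy as np


def mutual_information_bits(predictions, measurements, n_pred_bins=10):
    """Plug-in estimate of I(prediction; measurement) in bits.

    predictions:  continuous model outputs, one per sequence.
    measurements: non-negative integer bin labels, one per sequence.
    (Hypothetical helper for illustration, not taken from the paper.)
    """
    predictions = np.asarray(predictions, dtype=float)
    measurements = np.asarray(measurements, dtype=int)

    # Rank-based quantile binning: only the ordering of predictions matters.
    order = np.argsort(predictions, kind="stable")
    ranks = np.argsort(order, kind="stable")
    pred_bins = (ranks * n_pred_bins) // len(predictions)

    # Empirical joint distribution over (prediction bin, measurement bin).
    joint = np.zeros((n_pred_bins, measurements.max() + 1))
    np.add.at(joint, (pred_bins, measurements), 1.0)
    joint /= joint.sum()

    # Marginals and their outer product.
    p_pred = joint.sum(axis=1, keepdims=True)
    p_meas = joint.sum(axis=0, keepdims=True)
    independent = p_pred @ p_meas

    nonzero = joint > 0
    return float(np.sum(joint[nonzero] * np.log2(joint[nonzero] / independent[nonzero])))


# Simulated toy data (an assumption): a latent activity measured as a noisy 2-bin readout.
rng = np.random.default_rng(0)
activity = rng.normal(size=5000)
measured_bin = (activity + rng.normal(scale=0.5, size=5000) > 0).astype(int)

mi_original = mutual_information_bits(activity, measured_bin)
# A strictly increasing transformation of the predictions preserves their ordering.
mi_rescaled = mutual_information_bits(np.exp(3.0 * activity + 1.0), measured_bin)
# Shuffling the measurements destroys the relationship.
mi_shuffled = mutual_information_bits(activity, rng.permutation(measured_bin))

print(mi_original, mi_rescaled, mi_shuffled)
```

Because the binning depends only on rank order, the original and rescaled predictions receive the same score, while the shuffled control scores far lower. A likelihood-based fit would instead have to commit to a specific noise model linking predictions to measurements.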
Abstract
A fundamental aspect of biological information processing is the ubiquity of sequence-function relationships -- functions that map the sequence of DNA, RNA, or protein to a biochemically relevant activity. Most sequence-function relationships in biology are quantitative, but only recently have experimental techniques for effectively measuring these relationships been developed. The advent of such "massively parallel" experiments presents an exciting opportunity for the concepts and methods of statistical physics to inform the study of biological systems. After reviewing these recent experimental advances, we focus on the problem of how to infer parametric models of sequence-function relationships from the data produced by these experiments. Specifically, we retrace and extend recent theoretical work showing that inference based on mutual information, not the standard likelihood-based…
