Empirical Null Estimation using Discrete Mixture Distributions and its Application to Protein Domain Data
Iris Ivy Gauran, Junyong Park, Johan Lim, DoHwan Park, John Zylstra, Thomas Peterson, Maricel Kann, John Spouge

TL;DR
This paper introduces data-driven methods for estimating the empirical null distribution in mutation studies using discrete mixture models, enabling more accurate identification of significant mutations in protein domain data.
Contribution
It proposes novel procedures for empirical null estimation with discrete mixtures and applies them to large-scale mutation inference in protein domains.
Findings
Data-dependent methods for determining the cut-off below which counts are treated as null.
Improved mutation significance detection in protein data.
Validation on simulated and real datasets.
Abstract
In recent mutation studies, analyses based on protein domain positions are gaining popularity over gene-centric approaches, since the latter are limited in how they account for the functional context that the position of a mutation provides. This presents a large-scale simultaneous inference problem, with hundreds of hypothesis tests to consider at the same time. This paper aims to select significant mutation counts while controlling a given level of Type I error via False Discovery Rate (FDR) procedures. One main assumption is that there exists a cut-off value such that counts smaller than this value are generated from the null distribution. We present several data-dependent methods to determine the cut-off value. We also consider a two-stage procedure based on a screening process, so that mutation counts exceeding a certain value are considered significant.…
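The general idea in the abstract (fit a null distribution to the counts below a cut-off, then flag large counts under FDR control) can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract is truncated here, so the null family and the cut-off selection rule are not visible. The sketch assumes a Poisson null, a cut-off fixed by hand, and the Benjamini-Hochberg step-up rule; all function names and the toy counts are hypothetical.

```python
import math

def truncated_mean(lam, cutoff):
    """Mean of a Poisson(lam) variable conditioned on being <= cutoff."""
    # The exp(-lam) factor cancels in the ratio, so work with unnormalised log-weights.
    logs = [k * math.log(lam) - math.lgamma(k + 1) for k in range(cutoff + 1)]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]
    return sum(k * w for k, w in enumerate(weights)) / sum(weights)

def fit_null_rate(counts, cutoff):
    """Assumed null fit: right-truncated Poisson MLE using only counts <= cutoff.

    For this family the MLE matches the truncated mean to the sample mean,
    and the truncated mean is increasing in lam, so bisection suffices.
    """
    kept = [c for c in counts if c <= cutoff]
    target = sum(kept) / len(kept)
    if not 0 < target < cutoff:
        raise ValueError("sample mean below the cut-off must lie in (0, cutoff)")
    lo, hi = 1e-9, 1.0
    while truncated_mean(hi, cutoff) < target:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if truncated_mean(mid, cutoff) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def upper_tail(x, lam):
    """P(X >= x) for X ~ Poisson(lam), used as a one-sided p-value."""
    below = sum(math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
                for k in range(x))
    return max(0.0, 1.0 - below)

def benjamini_hochberg(pvalues, alpha):
    """Indices rejected by the Benjamini-Hochberg step-up rule at level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    last = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            last = rank
    return sorted(order[:last])

# Toy data (made up for illustration): mostly small counts plus two large ones.
counts = [0] * 50 + [1] * 30 + [2] * 15 + [3] * 5 + [25, 30]
lam = fit_null_rate(counts, cutoff=3)
pvals = [upper_tail(c, lam) for c in counts]
significant = [counts[i] for i in benjamini_hochberg(pvals, alpha=0.05)]
```

In this sketch the cut-off is an input; choosing it from the data, which is the focus of the paper, is deliberately left out.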
Taxonomy
Topics: Gene expression and cancer classification · Molecular Biology Techniques and Applications · Genetic Associations and Epidemiology
