Parameter identifiability for a profile mixture model of protein evolution
Samaneh Yourdkhani, Elizabeth S. Allman, John A. Rhodes

TL;DR
This paper proves that the tree topology and numerical parameters of a profile mixture model of protein evolution are generically identifiable in the settings where the model is typically used for empirical analyses. This is a prerequisite for model-based inference of evolutionary relationships.
Contribution
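It establishes conditions under which the tree topology and all numerical parameters of a profile mixture model are generically identifiable, meaning identifiable for all parameter values outside a set of measure zero. The conditions are a tree relating 9 or more taxa and fewer than 74 profiles.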
Findings
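Tree topology and all numerical parameters are generically identifiable for trees relating 9 or more taxa.
This holds when the number of profiles is less than 74.
Because identifiability is a necessary condition for justifying model-based inference, the result supports use of the model in empirical studies.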
Abstract
A Profile Mixture Model is a model of protein evolution, describing sequence data in which sites are assumed to follow many related substitution processes on a single evolutionary tree. The processes depend in part on different amino acid distributions, or profiles, varying over sites in aligned sequences. A fundamental question for any stochastic model, which must be answered positively to justify model-based inference, is whether the parameters are identifiable from the probability distribution they determine. Here we show that a Profile Mixture Model has identifiable parameters under circumstances in which it is likely to be used for empirical analyses. In particular, for a tree relating 9 or more taxa, both the tree topology and all numerical parameters are generically identifiable when the number of profiles is less than 74.
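To make the mixture structure described in the abstract concrete, here is a minimal sketch of how such a model assigns a probability to one site pattern. It assumes a common formulation of profile mixture models, which may differ in detail from the paper's parameterization (for example in rate scaling or rate variation across sites). Each class k has weight w_k and profile π_k. All classes share symmetric exchangeabilities S, so class k uses the rate matrix with off-diagonal entries Q_k[i][j] = S[i][j]·π_k[j]. The root distribution is π_k. The pattern probability is the weighted sum over classes of per-class likelihoods, each computed by Felsenstein pruning. All function names and the tree encoding are hypothetical, and the code is pure Python so that it stays self-contained.

```python
from typing import Dict, List, Sequence, Union

Matrix = List[List[float]]
# Hypothetical tree encoding: a leaf is a taxon name (str); an internal node
# is a list of (child, branch_length) pairs.
Tree = Union[str, list]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def expm(q: Matrix, t: float, squarings: int = 10, terms: int = 12) -> Matrix:
    """exp(Q t) via scaling and squaring with a truncated Taylor series."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, a)]  # A^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def rate_matrix(s: Matrix, profile: Sequence[float]) -> Matrix:
    """Assumed class rate matrix: Q[i][j] = S[i][j] * profile[j] off the diagonal,
    with rows summing to zero. With S symmetric, `profile` is stationary."""
    n = len(profile)
    q = [[s[i][j] * profile[j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        q[i][i] = -sum(q[i])
    return q


def partial_likelihood(node: Tree, q: Matrix, pattern: Dict[str, int], n: int) -> List[float]:
    """Felsenstein pruning: likelihood of the leaf states below `node`, given each state at `node`."""
    if isinstance(node, str):
        return [1.0 if i == pattern[node] else 0.0 for i in range(n)]
    lik = [1.0] * n
    for child, t in node:
        p = expm(q, t)
        child_lik = partial_likelihood(child, q, pattern, n)
        for i in range(n):
            lik[i] *= sum(p[i][j] * child_lik[j] for j in range(n))
    return lik


def site_pattern_prob(tree: Tree, s: Matrix, profiles: Sequence[Sequence[float]],
                      weights: Sequence[float], pattern: Dict[str, int]) -> float:
    """Mixture probability: sum over classes k of w_k * P_k(pattern)."""
    n = len(s)
    total = 0.0
    for w, prof in zip(weights, profiles):
        q = rate_matrix(s, prof)
        lik = partial_likelihood(tree, q, pattern, n)
        total += w * sum(prof[i] * lik[i] for i in range(n))  # root drawn from the profile
    return total
```

A usage example with a toy 3-state alphabet in place of the 20 amino acids, using made-up parameter values:

```python
s = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
profiles = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
weights = [0.6, 0.4]
tree = [("a", 0.1), ("b", 0.2), ([("c", 0.3), ("d", 0.1)], 0.05)]
print(site_pattern_prob(tree, s, profiles, weights, {"a": 0, "b": 0, "c": 2, "d": 2}))
```

In this framing, identifiability asks whether the map from (tree, branch lengths, S, profiles, weights) to the distribution over all site patterns can be inverted for generic parameter values. Any such inversion can only hold up to relabeling of the mixture classes.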
