Parameter Estimation in multiple-hidden i.i.d. models from biological multiple alignment
Ana Arribas-Gil

TL;DR
This paper develops a formal framework for parameter estimation in multiple-hidden i.i.d. models derived from biological multiple sequence alignments, incorporating phylogenetic relationships and indel evolution.
Contribution
It introduces a rigorous formalism for homology structures in multiple alignments, identifies conditions (a divergence property) under which parameter consistency would follow, and extends the model and results to arbitrary phylogenetic trees.
Findings
Existence of two different information divergence rates, with a divergence property shown under additional assumptions
Formalism for homology structure in multiple alignments
Simulation results illustrating model cases
Abstract
In this work we deal with parameter estimation in a latent variable model, namely the multiple-hidden i.i.d. model, which is derived from multiple alignment algorithms. We first provide a rigorous formalism for the homology structure of k sequences related by a star-shaped phylogenetic tree in the context of multiple alignment based on indel evolution models. We discuss possible definitions of likelihoods and compare them to the criterion used in multiple alignment algorithms. Existence of two different Information divergence rates is established and a divergence property is shown under additional assumptions. This would yield consistency for the parameter in parametrization schemes for which the divergence property holds. We finally extend the definition of the multiple-hidden i.i.d. model and the results obtained to the case in which the sequences are related by an arbitrary…
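To make the model class more concrete, the following is a minimal simulation sketch under stated assumptions, not the paper's parametrization. It assumes one common reading of a multiple-hidden i.i.d. model for k sequences: the hidden alignment is a sequence of i.i.d. steps, each a nonzero vector in {0,1}^k marking which sequences receive a character in that alignment column. The function name `simulate_multiple_hidden_iid`, the parameters `step_probs` and `mu`, and the toy emission scheme (a shared ancestral character copied with substitution probability `mu`) are all hypothetical and chosen only for illustration.

```python
import itertools
import random


def simulate_multiple_hidden_iid(k, n_steps, step_probs=None, mu=0.1,
                                 alphabet="ACGT", seed=0):
    """Simulate k sequences from a toy multiple-hidden i.i.d. model.

    Returns (path, sequences): the hidden alignment path and the observed
    sequences. Only the sequences would be observed in an estimation setting.
    """
    rng = random.Random(seed)
    # All nonzero binary vectors of length k: the possible column patterns.
    steps = [s for s in itertools.product((0, 1), repeat=k) if any(s)]
    if step_probs is None:
        # Assumption: uniform step distribution when none is supplied.
        step_probs = [1.0 / len(steps)] * len(steps)

    path = []
    sequences = [[] for _ in range(k)]
    for _ in range(n_steps):
        # Hidden step, drawn independently of all previous steps.
        step = rng.choices(steps, weights=step_probs, k=1)[0]
        path.append(step)
        # Toy emission (assumption): one ancestral character per column,
        # copied to each present sequence, resampled with probability mu.
        ancestor = rng.choice(alphabet)
        for j, present in enumerate(step):
            if present:
                char = rng.choice(alphabet) if rng.random() < mu else ancestor
                sequences[j].append(char)
    return path, ["".join(s) for s in sequences]


if __name__ == "__main__":
    path, seqs = simulate_multiple_hidden_iid(k=3, n_steps=10)
    for s in seqs:
        print(s)
```

In this sketch, parameter estimation would mean recovering `step_probs` and `mu` from the sequences alone, with the path unobserved; the sequences generally have different lengths because each one only grows on steps where its coordinate is 1.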
Taxonomy
Topics: Genomics and Phylogenetic Studies · Genetic diversity and population structure · Morphological variations and asymmetry
