Sequential learning theory for Markov genealogy processes

David J Pascall

arXiv:2603.09033·q-bio.QM·April 7, 2026

TL;DR

This paper develops a filtration-based framework for understanding how adding taxa influences phylodynamic inference, revealing fundamental limits of sequence data in uncovering latent genealogies.

Contribution

It introduces a theoretical framework that decomposes the expected variance reduction from adding taxa into learning, mismatch, and covariance components, and classifies estimands by the pathwise behaviour of the mismatch.

Findings

01

Decomposition of variance reduction into learning, mismatch, and covariance components.

02

Classification of estimands into learning classes based on mismatch behavior.

03

Demonstration of fundamental limits in sequence data's ability to reveal latent genealogies.

Abstract

We introduce a filtration-based framework for studying when and why adding taxa improves phylodynamic inference, by constructing a natural ordering of observed tips and applying sequential Bayesian analysis to the resulting filtration. We decompose the expected variance reduction on taxa addition into learning, mismatch, and covariance components, classify estimands into learning classes based on the pathwise behaviour of the mismatch, and show that for absorbing estimands an oracle who knows the latent absorption status obtains event-wise learning guarantees unavailable to the analyst. The gap between oracle and analyst is irreducible under assumptions that are likely to hold for many real phylodynamic estimands, establishing a fundamental limit on what sequence data alone can reveal about the latent genealogy.
