Sparse approaches for the exact distribution of patterns in long state sequences generated by a Markov source
Grégory Nuel (MAP5), Jean-Guillaume Dumas (LJK)

TL;DR
This paper introduces two sparse methods for computing the exact distribution of a pattern in long sequences generated by a Markov source, and applies them to two biological data sets to show that they extend the range of feasible exact computations.
Contribution
The paper presents two novel sparse algorithms for computing the exact distribution of a pattern in long sequences, extending the range of problems for which exact computation is feasible.
Findings
Methods successfully applied to biological data sets
Algorithms demonstrate complementary strengths
Extended the domain of feasible exact computations
Abstract
We present two novel approaches for the computation of the exact distribution of a pattern in a long sequence. Both approaches take into account the sparse structure of the problem and are two-part algorithms. The first approach relies on a partial recursion after a fast computation of the second largest eigenvalue of the transition matrix of a Markov chain embedding. The second approach uses fast Taylor expansions of an exact bivariate rational reconstruction of the distribution. We illustrate the interest of both approaches on a simple toy example and two biological applications: the transcription factors of the Human Chromosome 5 and the PROSITE signatures of functional motifs in proteins. On these examples our methods demonstrate their complementarity and their ability to extend the domain of feasibility for exact computations in pattern problems to a new level.
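To make the notion of a Markov chain embedding concrete, here is a minimal Python sketch, not the authors' algorithm. It assumes a first-order Markov source and a single fixed word as the pattern, tracks a pattern-matching automaton alongside the sequence, and propagates a sparse probability vector stored as a dictionary. It uses a plain full recursion over all n positions; the eigenvalue-based partial recursion and the rational reconstruction described in the abstract are not implemented. The names `build_automaton` and `pattern_count_distribution`, and the toy parameters, are hypothetical.

```python
def build_automaton(pattern, alphabet):
    """KMP-style automaton: state q means the last q letters read
    match the first q letters of the (non-empty) pattern."""
    m = len(pattern)
    fail = [0] * (m + 1)  # fail[q] = longest proper border of pattern[:q]
    k = 0
    for i in range(1, m):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i + 1] = k
    delta = [dict() for _ in range(m + 1)]
    for q in range(m + 1):
        for a in alphabet:
            k = fail[q] if q == m else q  # allow overlapping occurrences
            while k > 0 and pattern[k] != a:
                k = fail[k]
            if pattern[k] == a:
                k += 1
            delta[q][a] = k
    return delta


def pattern_count_distribution(pattern, alphabet, init, trans, n):
    """Exact distribution of the number of (overlapping) occurrences of
    `pattern` in a length-n sequence from a first-order Markov source.
    init[a] is P(X_1 = a); trans[a][b] is P(X_{i+1} = b | X_i = a)."""
    m = len(pattern)
    delta = build_automaton(pattern, alphabet)
    # Sparse vector: only reachable (automaton state, last letter, count)
    # triples with non-zero probability are stored.
    current = {}
    for a in alphabet:
        if init[a] > 0:
            q = delta[0][a]
            key = (q, a, 1 if q == m else 0)
            current[key] = current.get(key, 0.0) + init[a]
    for _ in range(n - 1):
        nxt = {}
        for (q, a, c), p in current.items():
            for b in alphabet:
                w = trans[a][b]
                if w == 0:
                    continue
                q2 = delta[q][b]
                key = (q2, b, c + 1 if q2 == m else c)
                nxt[key] = nxt.get(key, 0.0) + p * w
        current = nxt
    dist = {}
    for (q, a, c), p in current.items():
        dist[c] = dist.get(c, 0.0) + p
    return dist


# Toy usage with made-up parameters (illustrative only).
init = {"a": 0.5, "b": 0.5}
trans = {"a": {"a": 0.3, "b": 0.7}, "b": {"a": 0.6, "b": 0.4}}
dist = pattern_count_distribution("aba", "ab", init, trans, 8)
for count in sorted(dist):
    print(count, dist[count])
```

The cost of this naive recursion grows linearly with the sequence length and with the number of distinct counts reached, which is the kind of limitation that motivates the two approaches of the paper for long sequences.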
