On distribution of runs and patterns in four state trials
Jungtaek Oh

TL;DR
This paper derives exact probability distributions for runs and patterns in sequences of four-state trials, with applications to DNA sequence analysis and statistical modeling of run lengths.
Contribution
It provides exact formulas for run-length distributions, longest and shortest run statistics, and waiting times in sequences of four-state i.i.d. trials.
Findings
Exact distribution formulas for runs of B's
Distribution of longest and shortest runs
Waiting time distribution for pattern occurrences
Abstract
From a mathematical and statistical point of view, a segment of a DNA strand can be viewed as a sequence of four-state (A, C, G, T) trials. We consider distributions of runs and patterns related to run lengths of multi-state sequences, especially for four states (A, B, C, D). Let $X_1, X_2, \ldots$ be a sequence of four-state i.i.d. trials taking values in the set $\{A, B, C, D\}$ of four symbols with probabilities $p_A$, $p_B$, $p_C$ and $p_D$, respectively. In this paper, we obtain exact formulae for the probability distribution function for runs of B's, the discrete distribution of order $k$, longest run statistics, shortest run statistics, the waiting time distribution and the distribution of run lengths.
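To make the longest-run statistic concrete, the following is a minimal sketch that computes the exact distribution of the longest run of B's in n i.i.d. trials. It is not the paper's closed-form result: it uses a standard dynamic program over the length of the current trailing run, and the function names and the parameter `p_b` (the probability of symbol B) are assumptions made for illustration. Because only the B versus not-B distinction matters for this statistic, the other three symbols are lumped together with probability `1 - p_b`.

```python
from fractions import Fraction


def longest_run_cdf(n, p_b, k):
    """Return P(longest run of B's in n i.i.d. trials <= k), computed exactly.

    p_b is the (assumed) probability of symbol B on each trial; the other
    three symbols are lumped into a single "not B" outcome.
    """
    q = 1 - p_b
    # state[j] = probability that the trailing run of B's has length j
    # and no run longer than k has appeared so far.
    state = [Fraction(0)] * (k + 1)
    state[0] = Fraction(1)
    for _ in range(n):
        new = [Fraction(0)] * (k + 1)
        new[0] = sum(state) * q          # any non-B symbol resets the run
        for j in range(k):
            new[j + 1] = state[j] * p_b  # a B extends the run, still <= k
        state = new
    return sum(state)


def longest_run_pmf(n, p_b):
    """Return [P(longest run == k) for k = 0..n] by differencing the CDF."""
    cdf = [longest_run_cdf(n, p_b, k) for k in range(n + 1)]
    return [cdf[0]] + [cdf[k] - cdf[k - 1] for k in range(1, n + 1)]


# Example: n = 3 trials with four equally likely symbols (p_b = 1/4).
pmf = longest_run_pmf(3, Fraction(1, 4))
print(pmf)  # probabilities 27/64, 15/32, 3/32, 1/64 for k = 0, 1, 2, 3
```

Using `Fraction` keeps the arithmetic exact, so the output can be checked against a closed-form expression without floating-point error. For long sequences, floats are a cheaper alternative.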
Taxonomy
Topics: Advanced biosensing and bioanalysis techniques · DNA and Biological Computing · DNA and Nucleic Acid Chemistry
