Estimate the Occurrence Rate of the DNA Palindromes
I-Ping Tu, Yuan-Fu Huang, Shao-Hsuan Wang

TL;DR
This paper introduces a new analytic method to accurately estimate the occurrence rate of DNA palindromes, accounting for hot spots, and improves robustness over traditional average rate estimators.
Contribution
It presents a novel formula for estimating DNA palindrome occurrence rates under a Markov model, enhancing accuracy and robustness against hot spots.
Findings
The new estimator outperforms the average rate in simulations.
The method effectively accounts for hot spot regions.
Analytical formulas enable precise p-value calculations.
Abstract
A DNA palindrome is a segment of double-stranded DNA sequence with inversion symmetry, which may form secondary structures conferring significant biological functions ranging from RNA transcription to DNA replication. Testing whether clusters of DNA palindromes are distributed randomly is an interesting bioinformatic problem, where the occurrence rate of the DNA palindromes is a key estimator for setting up a test. The most commonly used statistic for estimating the occurrence rate for scan statistics is the average rate. However, in our simulation, the average rate may double the null occurrence rate of DNA palindromes due to hot spot regions of 3000 bp in a herpes virus genome. Here, we propose a formula to estimate the occurrence rate through an analytic derivation under a Markov assumption on the DNA sequence. Our simulation study shows that the performance of this method has improved…
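To make the abstract's terms concrete, the following Python sketch illustrates "inversion symmetry" as a segment that equals its own reverse complement, and the "average rate" as the fraction of windows that are palindromes. It is only an illustration under stated assumptions: the function names, the fixed window length, and the restriction to the alphabet A, C, G, T are hypothetical choices, and the code does not implement the paper's analytic estimator, which is not given in this summary.

```python
# Illustrative sketch only: names and the fixed-length window are assumptions,
# not the paper's definitions or its proposed estimator.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindrome(segment: str) -> bool:
    """A DNA palindrome reads the same as its reverse complement.

    Such a segment must have even length, because no base is its own complement.
    """
    return len(segment) % 2 == 0 and segment == reverse_complement(segment)


def palindrome_starts(seq: str, length: int) -> list[int]:
    """Start positions of all windows of the given length that are palindromes."""
    return [
        i
        for i in range(len(seq) - length + 1)
        if is_palindrome(seq[i:i + length])
    ]


def average_rate(seq: str, length: int) -> float:
    """Naive average rate: palindromic windows divided by all windows."""
    n_windows = len(seq) - length + 1
    if n_windows <= 0:
        return 0.0
    return len(palindrome_starts(seq, length)) / n_windows


if __name__ == "__main__":
    toy = "AAGAATTCAA"  # contains the palindrome GAATTC at position 2
    print(palindrome_starts(toy, 6))
    print(average_rate(toy, 6))
```

Because this average pools every window in the sequence, a region dense in palindromes raises the estimate for the whole genome, which is the sensitivity to hot spots that the abstract describes.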
Taxonomy
Topics: Molecular Biology Techniques and Applications · Gene expression and cancer classification · Genetic factors in colorectal cancer
