Estimating seed sensitivity on homogeneous alignments

Gregory Kucherov (LIFL); Laurent Noe (LIFL); Yann Ponty (LRI)

arXiv:cs/0603106·cs.OH·January 18, 2011

TL;DR

This paper studies how to estimate the sensitivity of seed-based similarity search algorithms on homogeneous alignments. It gives algorithms for counting and randomly generating these alignments and for computing seed sensitivity exactly. It also shows the bias introduced when the homogeneousness condition is ignored.

Contribution

It presents algorithms for counting and randomly generating homogeneous alignments and for computing the exact sensitivity of a broad class of seed strategies on them. This is offered as an alternative to approaches based on Markov models.
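As an illustration of the counting problem, here is a minimal Python sketch built on assumptions that this page does not state. An alignment is taken to be a word over {1 = match, 0 = mismatch}, scored +1 per match and -3 per mismatch (hypothetical values). An alignment is taken to be homogeneous when no sub-alignment scores strictly more than the whole alignment. The score of a sub-alignment is the difference of two prefix scores, so under that definition an alignment of total score S is homogeneous exactly when every prefix score lies in [0, S]. Counting therefore reduces to a dynamic program over prefix scores. The function name `count_homogeneous` is hypothetical, and the sketch is not claimed to be the paper's algorithm.

```python
def count_homogeneous(n: int, score: int, match: int = 1, mismatch: int = -3) -> int:
    """Count binary alignments (1 = match, 0 = mismatch) of length n and total
    score `score` that are homogeneous. Assumption: homogeneous means that no
    sub-alignment scores strictly more than the whole alignment, which is
    equivalent to every prefix score lying in [0, score]."""
    if score < 0:
        return 0  # the empty sub-alignment (score 0) would beat a negative total
    # counts[p] = number of valid prefixes of the current length with score p
    counts = [0] * (score + 1)
    counts[0] = 1  # the empty prefix
    for _ in range(n):
        nxt = [0] * (score + 1)
        for p, c in enumerate(counts):
            if c == 0:
                continue
            for step in (match, mismatch):
                q = p + step
                if 0 <= q <= score:  # stay inside the strip [0, score]
                    nxt[q] += c
        counts = nxt
    return counts[score]  # a full alignment must end exactly at `score`


print(count_homogeneous(8, 4))  # 2
```

With length 8 and score 4 (seven matches, one mismatch), only 11101111 and 11110111 keep every prefix score between 0 and 4. Both scoring values are parameters, so other match/mismatch schemes can be tried without changing the recurrence.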

Findings

1. Requiring alignments to be homogeneous significantly affects sensitivity estimates.

2. The proposed algorithms compute seed sensitivity exactly.

3. Ignoring the homogeneousness condition biases sensitivity estimates.
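A toy exact computation can illustrate how the homogeneousness condition shifts a sensitivity estimate. The sketch below keeps the hypothetical +1/-3 binary scoring and the assumed characterization of homogeneity by prefix scores in [0, S]. It also makes three further assumptions. Sensitivity is taken to be the fraction of alignments of length n and score S hit by a spaced seed, under a uniform distribution. A seed is written with `#` for a must-match position and `-` for a joker. The dynamic program runs over (prefix score, last span-1 letters) and drops a prefix as soon as the seed hits. The sketch handles a single spaced seed only, whereas the paper covers a broader class of seed strategies. `seed_sensitivity` is a hypothetical name.

```python
from collections import defaultdict
from fractions import Fraction


def seed_sensitivity(n, score, seed, homogeneous=True, match=1, mismatch=-3):
    """Exact fraction of binary alignments of length n and total score `score`
    hit by a spaced `seed` ('#' = must match, '-' = joker), assuming a uniform
    distribution over those alignments. With homogeneous=True, only alignments
    whose prefix scores all lie in [0, score] are counted. Returns None if no
    alignment qualifies."""
    span = len(seed)
    # Bit j of a window holds the letter j positions back (1 = match).
    seed_mask = 0
    for t, ch in enumerate(seed):
        if ch == "#":
            seed_mask |= 1 << (span - 1 - t)
    keep = (1 << (span - 1)) - 1  # remember only the last span-1 letters

    def allowed(q):
        return not homogeneous or 0 <= q <= score

    all_prefixes = {0: 1}   # prefix score -> number of admissible prefixes
    not_hit = {(0, 0): 1}   # (prefix score, recent letters) -> prefixes not yet hit
    for i in range(n):
        nxt_all = defaultdict(int)
        for p, c in all_prefixes.items():
            for step in (match, mismatch):
                if allowed(p + step):
                    nxt_all[p + step] += c
        nxt = defaultdict(int)
        for (p, window), c in not_hit.items():
            for bit, step in ((1, match), (0, mismatch)):
                q = p + step
                if not allowed(q):
                    continue
                full = (window << 1) | bit  # last `span` letters, newest at bit 0
                if i + 1 >= span and (full & seed_mask) == seed_mask:
                    continue  # the seed hits at a window ending at position i
                nxt[(q, full & keep)] += c
        all_prefixes, not_hit = nxt_all, nxt

    total = all_prefixes.get(score, 0)
    missed = sum(c for (p, _), c in not_hit.items() if p == score)
    return Fraction(total - missed, total) if total else None


print(seed_sensitivity(8, 4, "#####", homogeneous=False))  # 3/4
print(seed_sensitivity(8, 4, "#####", homogeneous=True))   # 0
```

In this toy setting, 6 of the 8 alignments with seven matches and one mismatch contain five consecutive matches. The only two homogeneous ones, 11101111 and 11110111, do not. The two estimates therefore differ sharply. These are illustrative numbers only and are not results reported in the paper.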

Abstract

We address the problem of estimating the sensitivity of seed-based similarity search algorithms. In contrast to approaches based on Markov models [18, 6, 3, 4, 10], we study the estimation based on homogeneous alignments. We describe an algorithm for counting and random generation of those alignments and an algorithm for exact computation of the sensitivity for a broad class of seed strategies. We provide experimental results demonstrating a bias introduced by ignoring the homogeneousness condition.
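Random generation of homogeneous alignments can be sketched with the classical recursive method. First precompute, for every position and prefix score, the number of ways to complete a valid prefix. Then draw letters one at a time with probabilities proportional to those counts. This sketch keeps the hypothetical +1/-3 scoring and the assumed characterization of homogeneity by prefix scores in [0, S]. The function names are hypothetical, and the paper's own generation procedure may differ.

```python
import random


def completion_table(n, score, match=1, mismatch=-3):
    """ways[i][p] = number of ways to extend a valid prefix of length i and
    score p into a homogeneous alignment of length n and score `score`,
    keeping every prefix score in [0, score] (assumed characterization)."""
    ways = [[0] * (score + 1) for _ in range(n + 1)]
    ways[n][score] = 1
    for i in range(n - 1, -1, -1):
        for p in range(score + 1):
            for step in (match, mismatch):
                q = p + step
                if 0 <= q <= score:
                    ways[i][p] += ways[i + 1][q]
    return ways


def random_homogeneous(n, score, rng=random, match=1, mismatch=-3):
    """Draw one homogeneous alignment ('1' = match, '0' = mismatch) uniformly
    at random among those of length n and total score `score`."""
    if score < 0:
        raise ValueError("homogeneous alignments have non-negative score")
    ways = completion_table(n, score, match, mismatch)
    if ways[0][0] == 0:
        raise ValueError("no homogeneous alignment with this length and score")
    letters, p = [], 0
    for i in range(n):
        # Choose a match with probability (completions after a match) / ways[i][p].
        q = p + match
        w_match = ways[i + 1][q] if 0 <= q <= score else 0
        if rng.randrange(ways[i][p]) < w_match:
            letters.append("1")
            p = q
        else:
            letters.append("0")
            p += mismatch
    return "".join(letters)


rng = random.Random(0)
print([random_homogeneous(8, 4, rng) for _ in range(3)])
```

Each letter is weighted by the number of completions it leaves. As a result, every homogeneous alignment of length n and score S is returned with probability 1/N, where N = `ways[0][0]` is their total number. Passing a seeded `random.Random` instance as `rng` makes runs reproducible.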
