The Capacity of Associated Subsequence Retrieval

Behrooz Tahmasebi; Mohammad Ali Maddah-Ali; Seyed Abolfazl Motahari

arXiv:1808.03708·cs.IT·October 15, 2020

TL;DR

This paper introduces an information-theoretic framework for associated subsequence retrieval in genomic data, establishing the capacity and thresholds for accurately identifying relevant subsequences linked to observable traits.

Contribution

It formulates the associated subsequence retrieval problem, derives its capacity, and provides achievable schemes and converses for zero-error and epsilon-error scenarios.

Findings

01. Threshold effect in the error probability versus rate curve.

02. Capacity characterized for zero-error and epsilon-error cases.

03. Achievable schemes and converses match, establishing optimality.

Abstract

The objective of a genome-wide association study (GWAS) is to associate subsequences of individuals' genomes with observable characteristics called phenotypes (e.g., high blood pressure). Motivated by the GWAS problem, in this paper we introduce the information-theoretic problem of associated subsequence retrieval, where a dataset of $N$ (possibly high-dimensional) sequences of length $G$ and their corresponding observable (binary) characteristics is given. The sequences are chosen independently and uniformly at random from $X^{G}$, where $X$ is a finite alphabet. The observable (binary) characteristic is related only to a specific unknown subsequence of length $L$ of the sequences, called the associated subsequence. For each sequence, if its associated subsequence belongs to a universal finite set, then it is more likely to display the observable…
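The sampling model described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' construction: the abstract is truncated before it specifies how the label depends on the associated subsequence, so the two probabilities `p_in` and `p_out`, the representation of the associated subsequence as a tuple of index `positions`, and the name `trait_set` for the universal finite set are all hypothetical choices made here for illustration.

```python
import random


def generate_dataset(n, g, alphabet, positions, trait_set, p_in, p_out, seed=0):
    """Draw n sequences uniformly from alphabet**g and attach binary labels.

    Assumptions (not taken from the paper):
      - positions: indices of the associated subsequence (its length plays the role of L)
      - trait_set: set of length-L tuples standing in for the universal finite set
      - p_in / p_out: probability of label 1 when the associated subsequence
        is / is not in trait_set
    """
    rng = random.Random(seed)
    sequences, labels = [], []
    for _ in range(n):
        # Each symbol is drawn independently and uniformly from the alphabet.
        seq = tuple(rng.choice(alphabet) for _ in range(g))
        # Extract the symbols at the (hypothetical) associated positions.
        sub = tuple(seq[i] for i in positions)
        p = p_in if sub in trait_set else p_out
        labels.append(1 if rng.random() < p else 0)
        sequences.append(seq)
    return sequences, labels


if __name__ == "__main__":
    seqs, labs = generate_dataset(
        n=8, g=10, alphabet="ACGT", positions=(2, 5, 7),
        trait_set={("A", "C", "G")}, p_in=0.9, p_out=0.1,
    )
    for s, y in zip(seqs, labs):
        print("".join(s), y)
```

A retrieval procedure in this setting would receive only the sequences and labels and would try to recover `positions`; the simulation above covers only the data-generation side.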
