Longest Alignment with Edits in Data Streams

Elena Grigorescu; Erfan Sadeqi Azer; Samson Zhou

arXiv:1711.04367·cs.DS·November 15, 2017

TL;DR

This paper develops algorithms for detecting the longest similar segments between two data streams under edit distance constraints, addressing challenges of limited storage and noisy data in real-time stream analysis.

Contribution

It introduces new algorithms, including an exact one-pass method, for identifying longest $d$-near-alignments in streaming data with sublinear space complexity.

Findings

01

Exact one-pass algorithm with $O(d^2 + d \log n)$ space

02

Lower bounds matching algorithm performance

03

Effective in noisy, resource-constrained streaming environments

Abstract

Analyzing patterns in data streams generated by network traffic, sensor networks, or satellite feeds is a challenge for systems in which the available storage is limited. In addition, real data is noisy, which makes designing data stream algorithms even more challenging. Motivated by such challenges, we study algorithms for detecting the similarity of two data streams that can be read in sync. Two strings $S, T \in \Sigma^{n}$ form a $d$-near-alignment if the distance between them in some given metric is at most $d$. We study the problem of identifying a longest substring of $S$ and $T$ that forms a $d$-near-alignment under the edit distance, in the simultaneous streaming model. In this model, symbols of strings $S$ and $T$ are streamed at the same time, and the amount of available processing space is sublinear in the length of the strings. We give several algorithms, including an…
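To make the problem statement concrete, here is a minimal offline sketch of what "longest $d$-near-alignment under the edit distance" asks for. It is a brute-force reference that stores both strings in full, so it is not the paper's streaming algorithm and does not achieve sublinear space. It assumes the aligned substrings occupy the same index range in $S$ and $T$ (one reading of streams that are "read in sync"); the function names `edit_distance` and `longest_near_alignment` are hypothetical and chosen only for illustration.

```python
from typing import Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insert, delete, substitute) via a rolling-row DP."""
    prev = list(range(len(b) + 1))  # distance from empty prefix of a to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def longest_near_alignment(s: str, t: str, d: int) -> Tuple[int, int]:
    """Return (start, length) of a longest window [start, start + length)
    such that s and t restricted to that window are within edit distance d.

    Assumption (not confirmed by the abstract): both substrings use the
    same index range. Returns (0, 0) if no non-empty window qualifies.
    """
    if len(s) != len(t):
        raise ValueError("both strings must have the same length n")
    n = len(s)
    # Try window lengths from longest to shortest; the first hit is optimal.
    for length in range(n, 0, -1):
        for start in range(n - length + 1):
            window_s = s[start:start + length]
            window_t = t[start:start + length]
            if edit_distance(window_s, window_t) <= d:
                return start, length
    return 0, 0


if __name__ == "__main__":
    s, t = "abcdefgh", "abcxefgh"
    print(longest_near_alignment(s, t, 0))
    print(longest_near_alignment(s, t, 1))
```

This reference checks every window with a full dynamic program, so its running time is polynomial but far from practical for long streams; its only purpose is to pin down the definition against which a space-efficient streaming algorithm could be validated on small inputs.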

Taxonomy

Topics: Algorithms and Data Compression · Advanced Data Storage Technologies · DNA and Biological Computing