Longest Alignment with Edits in Data Streams
Elena Grigorescu, Erfan Sadeqi Azer, Samson Zhou

TL;DR
This paper develops algorithms for detecting the longest similar segments between two data streams under edit distance constraints, addressing challenges of limited storage and noisy data in real-time stream analysis.
Contribution
It introduces new algorithms, including an exact one-pass method, for identifying longest $d$-near-alignments in streaming data with sublinear space complexity.
Findings
Exact one-pass algorithm with $O(d^2 + d \log n)$ space
Lower bounds matching algorithm performance
Effective in noisy, resource-constrained streaming environments
Abstract
Analyzing patterns in data streams generated by network traffic, sensor networks, or satellite feeds is a challenge for systems in which the available storage is limited. In addition, real data is noisy, which makes designing data stream algorithms even more challenging. Motivated by such challenges, we study algorithms for detecting the similarity of two data streams that can be read in sync. Two strings form a $d$-near-alignment if the distance between them in some given metric is at most $d$. We study the problem of identifying a longest substring of $S$ and $T$ that forms a $d$-near-alignment under the edit distance, in the simultaneous streaming model. In this model, symbols of strings $S$ and $T$ are streamed at the same time, and the amount of available processing space is sublinear in the length of the strings. We give several algorithms, including an…
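To make the problem statement concrete, here is a minimal brute-force reference in Python. It is not the paper's streaming algorithm: it stores both strings in full and runs in polynomial time, so it is only useful for checking small examples. It assumes the aligned reading of the definition, in which the two substrings cover the same index range of two equal-length strings, and the function names `edit_distance` and `longest_near_alignment` are illustrative rather than taken from the paper.

```python
from typing import Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insert, delete, substitute), row-by-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def longest_near_alignment(s: str, t: str, d: int) -> Tuple[int, int]:
    """Return (start, length) of a longest index range [start, start+length)
    with edit_distance(s[range], t[range]) <= d, or (0, 0) if none exists.

    Assumption: both substrings use the same index range (aligned streams).
    Brute force over all ranges, longest first; offline, not sublinear space.
    """
    if len(s) != len(t):
        raise ValueError("streams read in sync must have equal length")
    n = len(s)
    for length in range(n, 0, -1):
        for start in range(n - length + 1):
            end = start + length
            if edit_distance(s[start:end], t[start:end]) <= d:
                return start, length
    return 0, 0
```

For example, `longest_near_alignment("abcd", "abxd", 0)` returns `(0, 2)`, since only the prefix `ab` matches exactly, while allowing one edit (`d = 1`) extends the answer to the whole string, `(0, 4)`. A streaming algorithm has to reach the same answer without keeping `s` and `t` in memory.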
Taxonomy
Topics: Algorithms and Data Compression · Advanced Data Storage Technologies · DNA and Biological Computing
