Approximate Queries and Representations for Large Data Sequences

Hagit Shatkay; Stanley B. Zdonik

arXiv:1904.09262·cs.DB·April 22, 2019

TL;DR

This paper introduces a generalized notion of approximate queries over large data sequences, together with a divide-and-conquer approach that represents each sequence approximately by families of real-valued functions. The technique is applied to medical cardiology data.

Contribution

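The paper defines generalized approximate queries, which target application-dependent features of a sequence rather than its sampled points, and gives a divide-and-conquer algorithm that builds the function-based representation supporting them.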

Findings

01. Effective in reducing storage and search space

02. Supports application-dependent approximate queries

03. Applied successfully to medical cardiology data

Abstract

Many new database application domains, such as experimental sciences and medicine, are characterized by large sequences as their main form of data. Using an approximate representation can significantly reduce the required storage and search space. A good choice of representation can support a broad new class of approximate queries needed in these domains. These queries are concerned with application-dependent features of the data, as opposed to the actual sampled points. We introduce a new notion of generalized approximate queries and a general divide-and-conquer approach that supports them. This approach uses families of real-valued functions as an approximate representation. We present an algorithm for realizing our technique, and the results of applying it to medical cardiology data. (Extended version is available in Tech Report CS-95-03, Dept of Computer Science, Brown University.…
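
To make the idea of a function-based approximate representation concrete, here is a minimal sketch in Python. It is not the algorithm from the paper, whose details are not given in the abstract. It assumes one simple choice: straight-line segments through endpoints as the family of real-valued functions, a maximum-deviation tolerance `tol`, and a recursive split at the worst-fitting point. The names `approximate` and `count_peaks` are hypothetical and introduced only for illustration.

```python
from typing import List, Sequence, Tuple

# (start index, end index, slope, intercept) of one line segment.
# Assumption: lines are the function family; the paper may use others.
Segment = Tuple[int, int, float, float]


def approximate(ys: Sequence[float], lo: int, hi: int, tol: float) -> List[Segment]:
    """Divide-and-conquer line fitting over ys[lo..hi]; requires lo < hi."""
    # Line through the two endpoints of the current interval.
    slope = (ys[hi] - ys[lo]) / (hi - lo)
    intercept = ys[lo] - slope * lo

    # Find the interior sample that deviates most from that line.
    worst, worst_err = lo, 0.0
    for i in range(lo + 1, hi):
        err = abs(ys[i] - (slope * i + intercept))
        if err > worst_err:
            worst, worst_err = i, err

    # Good enough: one function represents the whole interval.
    if worst_err <= tol:
        return [(lo, hi, slope, intercept)]

    # Otherwise split at the worst point and recurse on both halves.
    return approximate(ys, lo, worst, tol) + approximate(ys, worst, hi, tol)


def count_peaks(segments: List[Segment]) -> int:
    """Feature query on the representation: rising segment followed by a falling one."""
    return sum(
        1
        for a, b in zip(segments, segments[1:])
        if a[2] > 0 and b[2] < 0
    )
```

A query such as `count_peaks` works on the handful of stored segments rather than on every sample, which is where the reduction in storage and search space would come from under these assumptions.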
