# Approximate Queries and Representations for Large Data Sequences

**Authors:** Hagit Shatkay, Stanley B. Zdonik

arXiv: 1904.09262 · 2019-04-22

## TL;DR

The paper represents large data sequences approximately with families of real-valued functions, built by a divide-and-conquer algorithm. This reduces storage and search space and supports approximate queries over application-dependent features of the data, in domains such as medicine.

## Contribution

The paper introduces a generalized notion of approximate queries and a general divide-and-conquer approach that supports them, using families of real-valued functions as the approximate representation of a sequence.
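To make the idea concrete, here is a minimal sketch of one way a divide-and-conquer segmentation with real-valued functions could look. It is not the authors' algorithm: the summary does not specify the function family, the error measure, or the split rule, so the choices below are assumptions made for illustration. The sketch assumes straight lines through segment endpoints as the function family, maximum vertical deviation as the error, a user-chosen tolerance `eps`, and a split at the worst-fitting sample. The names `segment`, `fit_chord`, and `count_peaks` are hypothetical.

```python
from typing import List, Sequence, Tuple

# (start index, end index inclusive, slope, intercept) of one fitted line
Segment = Tuple[int, int, float, float]


def fit_chord(ys: Sequence[float], lo: int, hi: int) -> Tuple[float, float]:
    """Line through the two endpoint samples (assumed function family)."""
    if hi == lo:
        return 0.0, float(ys[lo])
    slope = (ys[hi] - ys[lo]) / (hi - lo)
    return slope, ys[lo] - slope * lo


def segment(ys: Sequence[float], lo: int, hi: int, eps: float) -> List[Segment]:
    """Recursively split ys[lo..hi] until each piece fits a line within eps."""
    slope, intercept = fit_chord(ys, lo, hi)
    # Find the interior sample that deviates most from the line.
    worst, worst_err = lo, 0.0
    for i in range(lo + 1, hi):
        err = abs(ys[i] - (slope * i + intercept))
        if err > worst_err:
            worst, worst_err = i, err
    if worst_err <= eps:
        return [(lo, hi, slope, intercept)]
    # Divide at the worst sample, conquer both halves, concatenate.
    return segment(ys, lo, worst, eps) + segment(ys, worst, hi, eps)


def count_peaks(segments: List[Segment]) -> int:
    """Approximate feature query: a rising piece followed by a falling one."""
    slopes = [s[2] for s in segments]
    return sum(1 for a, b in zip(slopes, slopes[1:]) if a > 0 and b < 0)


# Toy sequence with two peaks (illustrative data, not from the paper).
ys = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3, 2, 1, 0]
segments = segment(ys, 0, len(ys) - 1, eps=0.1)
print(len(segments), count_peaks(segments))  # prints: 4 2
```

In this sketch the query runs on the four stored line segments rather than on the thirteen samples, which illustrates how a function-based representation can answer a question about a feature (here, peaks) without touching the sampled points. A larger `eps` yields fewer segments and a coarser representation.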

## Key findings

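- An approximate representation can significantly reduce the required storage and search space
- A well-chosen representation supports approximate queries about application-dependent features of the data rather than the actual sampled points
- The technique is implemented as an algorithm and applied to medical cardiology data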

## Abstract

Many new database application domains such as experimental sciences and medicine are characterized by large sequences as their main form of data. Using approximate representation can significantly reduce the required storage and search space. A good choice of representation can support a broad new class of approximate queries needed in these domains. These queries are concerned with application-dependent features of the data as opposed to the actual sampled points. We introduce a new notion of generalized approximate queries and a general divide-and-conquer approach that supports them. This approach uses families of real-valued functions as an approximate representation. We present an algorithm for realizing our technique, and the results of applying it to medical cardiology data. (Extended version is available in Tech Report CS-95-03, Dept. of Computer Science, Brown University. http://cs.brown.edu/research/pubs/techreports/reports/CS-95-03.html)

---
Source: https://tomesphere.com/paper/1904.09262