Estimating Range Queries using Aggregate Data with Integrity Constraints: a Probabilistic Approach
Francesco Buccafurri, Filippo Furfaro, Domenico Saccà

TL;DR
This paper presents a probabilistic method for estimating count and sum range queries over multidimensional data from aggregate summaries alone, without access to the original data and without assumptions about it.
Contribution
It introduces a novel probabilistic framework for accurately estimating range queries solely from aggregate data, avoiding assumptions about the original dataset.
Findings
Provides a theoretical model for query estimation
Achieves accurate estimates using only aggregate summaries
Ensures data integrity constraints are maintained
Abstract
The problem of recovering (count and sum) range queries over multidimensional data only on the basis of aggregate information on such data is addressed. This problem can be formalized as follows. Suppose that a transformation T producing a summary from a multidimensional data set is used. Now, given a data set D, a summary S = T(D) and a range query r on D, the problem consists of studying r by modelling it as a random variable defined over the sample space of all the data sets D' such that T(D') = S. The study of such a random variable, done by the definition of its probability distribution and the computation of its mean value and variance, represents a well-founded, theoretical probabilistic approach for estimating the query only on the basis of the available information (that is, the summary S) without assumptions on the original data.
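The formalization in the abstract can be illustrated with a small brute-force sketch. It is not the paper's method: it assumes a one-dimensional block of cells holding non-negative integers, a summary S that consists only of the block's total sum, and a uniform distribution over all data sets D' consistent with that summary. The function names (consistent_datasets, query_moments) are hypothetical, and the paper's integrity constraints are not modelled.

```python
from fractions import Fraction


def consistent_datasets(n_cells, total):
    """Yield every tuple of n_cells non-negative integers summing to total.

    Under the assumptions above, these are the data sets D' with T(D') = S.
    """
    if n_cells == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in consistent_datasets(n_cells - 1, total - first):
            yield (first,) + rest


def query_moments(n_cells, total, query_cells):
    """Mean and variance of a sum range query over query_cells.

    The query is treated as a random variable over all consistent data
    sets, each assumed equally likely (an assumption of this sketch).
    Returns (mean, variance, number_of_consistent_datasets).
    """
    values = [
        sum(dataset[i] for i in query_cells)
        for dataset in consistent_datasets(n_cells, total)
    ]
    count = len(values)
    mean = Fraction(sum(values), count)
    variance = Fraction(sum(v * v for v in values), count) - mean ** 2
    return mean, variance, count


if __name__ == "__main__":
    # A block of 4 cells with total sum 6; the query covers cells 0 and 1.
    mean, variance, count = query_moments(4, 6, [0, 1])
    print(mean, variance, count)
```

With these assumptions the mean equals the block total scaled by the fraction of cells the query covers, and the variance quantifies how much the true answer may deviate from that estimate. Enumeration is only feasible for tiny blocks; it is used here to make the sample space explicit.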
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Data Mining Algorithms and Applications
