Approximate Integration of streaming data

Michel de Rougemont; Guillaume Vimont

arXiv:1709.04290·cs.SI·September 14, 2017

TL;DR

This paper uses weighted reservoir sampling to approximate analytic queries on streaming data. The sample supports approximate OLAP queries, community detection, and the integration of two graph streams without storing every tuple or edge. The approach is illustrated on Twitter streams associated with TV programs.

Contribution

It presents a novel approach for approximate query processing and community detection in streaming data using reservoir sampling, applicable to social networks and data warehouses.

Findings

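01. Some OLAP queries on a stream of data warehouse tuples can be approximated from a weighted reservoir sample.

02. For random graphs with a power-law degree distribution, taking the large connected components of the reservoir edges is a good approximation of the communities.

03. The Community Correlation of two edge streams can be approximated without storing all edges.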

Abstract

We approximate analytic queries on streaming data using weighted reservoir sampling. For a stream of tuples from a data warehouse, we show how to approximate some OLAP queries. For a stream of graph edges from a social network, we approximate the communities as the large connected components of the edges in the reservoir. We show that, for a model of random graphs that follow a power-law degree distribution, the community detection algorithm is a good approximation. Given two streams of graph edges from two sources, we define the Community Correlation as the fraction of the nodes in communities in both streams. Although we do not store the edges of the streams, we can approximate the Community Correlation and define the Integration of two streams. We illustrate this approach with Twitter streams associated with TV programs.
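
The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (`weighted_reservoir`, `large_components`, `community_correlation`) are hypothetical, and so is the `min_size` threshold. The sampler uses the Efraimidis-Spirakis key scheme as one common way to do weighted reservoir sampling, which the paper may or may not use. The abstract does not state the denominator of the Community Correlation, so the sketch assumes the nodes that lie in a community of either stream.

```python
import heapq
import random
from collections import defaultdict


def weighted_reservoir(stream, k, rng):
    """Keep k items from a stream of (item, weight) pairs, weight > 0.

    Assumption: Efraimidis-Spirakis keys u ** (1 / weight); the k items
    with the largest keys are retained in a min-heap.
    """
    heap = []  # entries are (key, arrival_index, item)
    for i, (item, weight) in enumerate(stream):
        key = rng.random() ** (1.0 / weight)
        if len(heap) < k:
            heapq.heappush(heap, (key, i, item))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, i, item))
    return [item for _, _, item in heap]


def large_components(edges, min_size):
    """Return connected components with at least min_size nodes (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return [g for g in groups.values() if len(g) >= min_size]


def community_correlation(communities_a, communities_b):
    """Fraction of community nodes that appear in communities of both streams.

    Assumption: the denominator is the set of nodes that lie in a
    community of at least one stream.
    """
    nodes_a = set().union(*communities_a)
    nodes_b = set().union(*communities_b)
    either = nodes_a | nodes_b
    return len(nodes_a & nodes_b) / len(either) if either else 0.0
```

To use it, pass each edge stream through `weighted_reservoir` (for example with weight 1.0 per edge, which reduces to uniform sampling), extract communities from each reservoir with `large_components`, and compare the two results with `community_correlation`. Only the two reservoirs are kept in memory, never the full streams.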
