# The content correlation of multiple streaming edges

**Authors:** Michel de Rougemont, Guillaume Vimont

arXiv: 1812.09867 · 2018-12-27

## TL;DR

This paper introduces an efficient online method to approximate the content correlation between several streams of graph edges, with applications such as following Twitter streams and searching by correlation.

## Contribution

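It extends cluster detection on a graph given as a stream of edges, without storing the entire graph, to dynamic graphs defined by the most recent edges and to several streams. For dynamic random graphs that follow a power-law degree distribution, it guarantees a good approximation of the content correlation.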

## Key findings

- Effective online approximation of content correlation in streaming graphs
- Successful application to Twitter data streams
- Enables correlation-based search and explanation mechanisms

## Abstract

We study how to detect clusters in a graph defined by a stream of edges, without storing the entire graph. We extend the approach to dynamic graphs defined by the most recent edges of the stream and to several streams. The *content correlation* of two streams $\rho(t)$ is the Jaccard similarity of their clusters in the windows before time $t$. We propose a simple and efficient method to approximate this correlation online and show that for dynamic random graphs which follow a power-law degree distribution, we can guarantee a good approximation. As an application, we follow Twitter streams and compute their content correlations online. We then propose a *search by correlation* where answers to sets of keywords are entirely based on the small correlations of the streams. Answers are ordered by the correlations, and explanations can be traced with the stored clusters.
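To make the definition of $\rho(t)$ concrete, here is a minimal Python sketch of a Jaccard similarity between the clusters of two windows. The abstract does not say how clusters are extracted, so everything around the Jaccard step is an assumption made for illustration. The sketch keeps a uniform reservoir sample of each window's edges and treats connected components of the sample with at least `min_size` nodes as clusters. The names `reservoir_sample`, `large_components`, `content_correlation`, and the parameters `k`, `min_size`, and `seed` are all hypothetical.

```python
import random
from collections import defaultdict


def reservoir_sample(edges, k, rng):
    """Keep a uniform sample of at most k edges from an iterable (Algorithm R)."""
    sample = []
    for i, edge in enumerate(edges):
        if i < k:
            sample.append(edge)
        else:
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                sample[j] = edge
    return sample


def large_components(edges, min_size):
    """Nodes covered by connected components with at least min_size nodes.

    Assumption: "clusters" are approximated by large components of the sample.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)  # union the two endpoints

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return set().union(*(g for g in groups.values() if len(g) >= min_size))


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|; defined here as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def content_correlation(window_a, window_b, k=1000, min_size=3, seed=0):
    """Approximate correlation of two edge windows from sampled clusters."""
    rng = random.Random(seed)
    nodes_a = large_components(reservoir_sample(window_a, k, rng), min_size)
    nodes_b = large_components(reservoir_sample(window_b, k, rng), min_size)
    return jaccard(nodes_a, nodes_b)


# Two toy windows of edges, small enough that the reservoir keeps every edge.
window_a = [("a", "b"), ("b", "c"), ("c", "d"), ("x", "y")]
window_b = [("b", "c"), ("c", "d"), ("d", "e"), ("p", "q")]
print(content_correlation(window_a, window_b))  # 0.6
```

In the toy example the large components cover `{a, b, c, d}` and `{b, c, d, e}`, which share three of five nodes. Flattening each stream's clusters into a single node set before taking the Jaccard similarity is one possible reading of the definition, and the paper may compare clusters differently.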

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.09867/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/1812.09867/full.md

## References

13 references — full list in the complete paper: https://tomesphere.com/paper/1812.09867/full.md

---
Source: https://tomesphere.com/paper/1812.09867