# The Online Event-Detection Problem

**Authors:** Michael A. Bender, Jonathan W. Berry, Martin Farach-Colton, Rob, Johnson, Thomas M. Kroeger, Prashant Pandey, Cynthia A. Phillips, Shikha, Singh

arXiv: 1812.09824 · 2018-12-27

## TL;DR

This paper introduces the online event-detection problem (OEDP), focusing on real-time identification of specific frequency-based events in data streams with strict accuracy and timeliness requirements, and proposes cache-efficient algorithms for it.

## Contribution

The paper formulates the OEDP, proves its space complexity, and develops near-optimal cache-efficient algorithms with tunable parameters for false positives and reporting delay.

## Key findings

- Algorithms are within a log factor of optimal in external memory.
- The algorithms can be tuned for bounded false positives and delays.
-  Improved performance when input follows a power-law distribution.

## Abstract

Given a stream $S = (s_1, s_2, ..., s_N)$, a $\phi$-heavy hitter is an item $s_i$ that occurs at least $\phi N$ times in $S$. The problem of finding heavy-hitters has been extensively studied in the database literature. In this paper, we study a related problem. We say that there is a $\phi$-event at time $t$ if $s_t$ occurs exactly $\phi N$ times in $(s_1, s_2, ..., s_t)$. Thus, for each $\phi$-heavy hitter there is a single $\phi$-event which occurs when its count reaches the reporting threshold $\phi N$. We define the online event-detection problem (OEDP) as: given $\phi$ and a stream $S$, report all $\phi$-events as soon as they occur.   Many real-world monitoring systems demand event detection where all events must be reported (no false negatives), in a timely manner, with no non-events reported (no false positives), and a low reporting threshold. As a result, the OEDP requires a large amount of space (Omega(N) words) and is not solvable in the streaming model or via standard sampling-based approaches.   Since OEDP requires large space, we focus on cache-efficient algorithms in the external-memory model.   We provide algorithms for the OEDP that are within a log factor of optimal. Our algorithms are tunable: its parameters can be set to allow for a bounded false-positives and a bounded delay in reporting. None of our relaxations allow false negatives since reporting all events is a strict requirement of our applications. Finally, we show improved results when the count of items in the input stream follows a power-law distribution.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.09824/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1812.09824/full.md

## References

51 references — full list in the complete paper: https://tomesphere.com/paper/1812.09824/full.md

---
Source: https://tomesphere.com/paper/1812.09824