Finding Frequent Entities in Continuous Data

Ferran Alet; Rohan Chitnis; Leslie P. Kaelbling; Tomas Lozano-Perez

arXiv:1805.02874 · cs.AI · May 9, 2018

Open Access

TL;DR

This paper treats the identification of frequent entities in high-dimensional continuous data as a heavy hitters problem rather than a clustering problem, and reports that this formulation yields more accurate solutions. It also introduces HAC, an online heavy hitters algorithm for continuous space, and demonstrates it on real video and household data.

Contribution

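The paper formalizes frequent-entity detection in continuous data as an instance of the heavy hitters (frequent items) problem, shows that this formulation is more accurate than a clustering formulation, and presents HAC, a new online heavy hitters algorithm for continuous space.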

Findings

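01 The heavy hitters formulation yields more accurate solutions than the clustering formulation.

02 HAC is effective on real video and household data.

03 Heavy hitters targets only tightly clustered, high-density groups, so frequent entities are identified without forcing every detection into a category.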

Abstract

In many applications that involve processing high-dimensional data, it is important to identify a small set of entities that account for a significant fraction of detections. Rather than formalize this as a clustering problem, in which all detections must be grouped into hard or soft categories, we formalize it as an instance of the frequent items or heavy hitters problem, which finds groups of tightly clustered objects that have a high density in the feature space. We show that the heavy hitters formulation generates solutions that are more accurate and effective than the clustering formulation. In addition, we present a novel online algorithm for heavy hitters, called HAC, which addresses problems in continuous space, and demonstrate its effectiveness on real video and household domains.
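
To make the contrast with clustering concrete, the following is a minimal offline sketch of the density idea described in the abstract. It is not the paper's HAC algorithm, which is online; it is a brute-force illustration. The function name dense_points and the parameters radius and min_fraction are assumptions introduced here for illustration, as is the rule that a point counts as frequent when at least min_fraction of all points lie within radius of it.

```python
import math


def dense_points(points, radius, min_fraction):
    """Return indices of points whose neighborhood is dense.

    Hypothetical helper (not from the paper): a point is kept when at
    least `min_fraction` of all points lie within `radius` of it.
    """
    n = len(points)
    result = []
    for i, p in enumerate(points):
        # Count points (including p itself) within `radius` of p.
        count = sum(1 for q in points if math.dist(p, q) <= radius)
        if count >= min_fraction * n:
            result.append(i)
    return result


# Three detections close together and three isolated ones (made-up data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (9.0, 1.0), (2.0, 8.0)]
frequent = dense_points(points, radius=0.5, min_fraction=0.5)
```

In this sketch only the three tightly grouped detections are returned. The isolated detections are simply left out rather than being assigned to some category, which is the behavior that distinguishes the heavy hitters view from a clustering view.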

Taxonomy

Topics: Data Management and Algorithms · Advanced Clustering Algorithms Research · Video Analysis and Summarization