Fast Counting in Machine Learning Applications

Subhadeep Karan; Matthew Eichhorn; Blake Hurlburt; Grant Iraci; Jaroslaw Zola

arXiv:1804.04640 · stat.ML · January 9, 2019

TL;DR

This paper introduces scalable, memory-efficient methods for executing counting queries in machine learning applications. The methods outperform the commonly used ADtrees and hash tables and are practical for large-scale data.

Contribution

The paper abstracts counting queries and their context so that counts can be aggregated as a stream, which improves memory and computational efficiency in machine learning tasks.

Findings

01 Outperforms ADtrees and hash tables in speed and memory usage.

02 Effective in Bayesian network learning and association rule mining.

03 Demonstrates scalability on large datasets.

Abstract

We propose scalable methods to execute counting queries in machine learning applications. To achieve memory and computational efficiency, we abstract counting queries and their context such that the counts can be aggregated as a stream. We demonstrate performance and scalability of the resulting approach on random queries, and through extensive experimentation using Bayesian networks learning and association rule mining. Our methods significantly outperform commonly used ADtrees and hash tables, and are practical alternatives for processing large-scale data.
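To make the idea of a counting query concrete, here is a minimal Python sketch. It is not the authors' method: it uses a plain hash table (`collections.Counter`), which is the kind of baseline the abstract compares against, and it only illustrates the interface of handing counts to a consumer one at a time instead of returning a full table. The function names `count_query` and `log_likelihood`, the row layout (tuples of categorical values), and the choice of a Bayesian network log-likelihood term as the consumer are all assumptions made for illustration.

```python
from collections import Counter
from math import log


def count_query(rows, child, parents):
    """Yield (parent_config, child_state, count) for one counting query.

    rows    : iterable of tuples of categorical values (hypothetical layout)
    child   : column index of the variable being scored
    parents : list of column indices forming the query context
    """
    counts = Counter()
    for row in rows:
        key = (tuple(row[p] for p in parents), row[child])
        counts[key] += 1
    # Hand counts to the caller one at a time, so the consumer can
    # aggregate them without needing the whole table.
    for (pa_cfg, state), n in counts.items():
        yield pa_cfg, state, n


def log_likelihood(rows, child, parents):
    """Example consumer: maximum-likelihood log-likelihood term for one
    variable given its parents, computed from the yielded counts."""
    triples = list(count_query(rows, child, parents))
    pa_totals = Counter()
    for pa_cfg, _, n in triples:
        pa_totals[pa_cfg] += n
    return sum(n * log(n / pa_totals[pa_cfg]) for pa_cfg, _, n in triples)


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 1), (1, 1)]
    for triple in sorted(count_query(data, child=1, parents=[0])):
        print(triple)
    print(log_likelihood(data, child=1, parents=[0]))
```

In this sketch the hash table still holds every observed configuration in memory. The abstract's claim is that organizing the counts as a stream avoids that cost, and the details of how are in the paper itself.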

Taxonomy

Topics: Bayesian Modeling and Causal Inference · Data Mining Algorithms and Applications · Data Management and Algorithms