Hashing for Fast Pattern Set Selection

Maiju Karjalainen; Pauli Miettinen

arXiv:2507.08745 · cs.DB · July 14, 2025

TL;DR

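This paper introduces a bottom-k hashing method for efficiently selecting pattern sets in data mining. Compared with the standard greedy algorithm, it substantially reduces computation time while obtaining almost equally good results. It applies to tasks such as tiling databases, Boolean matrix factorization, and redescription mining.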

Contribution

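It presents a novel bottom-k hashing approach to pattern set selection, including an extension to the common case where patterns appear only in approximate form in the data. The approach is faster than the standard greedy algorithm and nearly as effective.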
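The bottom-k idea behind this contribution can be illustrated with a minimal, generic sketch, assuming each pattern is represented by the set of data cells it covers. This is not the authors' algorithm: the helper names (`unit_hash`, `bottom_k`, `merge`, `estimate_size`) and the use of SHA-256 as the hash function are hypothetical choices made for illustration. The code shows the standard bottom-k (KMV) property that makes such sketches attractive: the sketch of a union of sets can be computed from the individual sketches alone, and the k-th smallest hash value yields an estimate of the union's size.

```python
import hashlib
import heapq


def unit_hash(item, seed=0):
    """Map an item to a pseudo-random float in (0, 1].

    Hypothetical helper: SHA-256 is used only because it is deterministic.
    """
    digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
    value = int.from_bytes(digest[:8], "big")
    return (value + 1) / 2**64


def bottom_k(items, k, seed=0):
    """Bottom-k sketch: the k smallest distinct hash values of a set, ascending."""
    return heapq.nsmallest(k, {unit_hash(x, seed) for x in items})


def merge(sketches, k):
    """Sketch of a union of sets: the k smallest values across the input sketches."""
    return heapq.nsmallest(k, set().union(*sketches))


def estimate_size(sketch, k):
    """KMV estimate of the number of distinct items in the sketched set."""
    if len(sketch) < k:
        # Fewer than k distinct hashes: the sketch holds the whole set, so the count is exact.
        return float(len(sketch))
    # (k - 1) divided by the k-th smallest hash value.
    return (k - 1) / sketch[-1]
```

For example, if two tiles are given as sets of `(row, col)` cells, `estimate_size(merge([bottom_k(t1, 64), bottom_k(t2, 64)], 64), 64)` estimates how many distinct cells the two tiles cover together without materializing their union. Larger `k` gives a more accurate estimate at the cost of larger sketches.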

Findings

1. The hashing method is significantly faster than the standard greedy algorithm.
2. It selects pattern sets of almost the same quality as the greedy algorithm.
3. Both results hold on synthetic and real-world datasets.

Abstract

Pattern set mining, which is the task of finding a good set of patterns instead of all patterns, is a fundamental problem in data mining. Many different definitions of what constitutes a good set have been proposed in recent years. In this paper, we consider the reconstruction error as a proxy measure for the goodness of the set, and concentrate on the adjacent problem of how to find a good set efficiently. We propose a method based on bottom-k hashing for efficiently selecting the set and extend the method for the common case where the patterns might only appear in approximate form in the data. Our approach has applications in tiling databases, Boolean matrix factorization, and redescription mining, among others. We show that our hashing-based approach is significantly faster than the standard greedy algorithm while obtaining almost equally good results in both synthetic and real-world…
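For comparison, the standard greedy baseline mentioned in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's exact formulation: the data is a binary matrix given as the set of its 1-cells, each candidate pattern is an exact tile given as a set of cells, and the reconstruction error is taken to be the size of the symmetric difference between the 1-cells and the cells covered by the chosen patterns. The function names and the toy data are hypothetical.

```python
def tile(rows, cols):
    """Hypothetical helper: the set of (row, col) cells covered by a tile."""
    return {(r, c) for r in rows for c in cols}


def reconstruction_error(ones, covered):
    """Cells where the reconstruction disagrees with the data (assumed error measure)."""
    return len(ones ^ covered)


def greedy_select(ones, patterns, budget):
    """Pick up to `budget` patterns, each time the one that lowers the error most."""
    covered = set()
    chosen = []
    error = reconstruction_error(ones, covered)
    for _ in range(budget):
        best, best_error = None, error
        for name, cells in patterns.items():
            if name in chosen:
                continue
            # Evaluate the error if this pattern were added to the current selection.
            new_error = reconstruction_error(ones, covered | cells)
            if new_error < best_error:
                best, best_error = name, new_error
        if best is None:
            # No remaining pattern improves the reconstruction.
            break
        chosen.append(best)
        covered |= patterns[best]
        error = best_error
    return chosen, error


# Toy data: a 2x3 block of ones plus a 1x2 block of ones.
ones = tile(range(2), range(3)) | tile([3], range(3, 5))
patterns = {
    "A": tile(range(2), range(3)),   # matches the first block exactly
    "B": tile([3], range(3, 5)),     # matches the second block exactly
    "C": tile(range(4), range(5)),   # covers all ones but also many zeros
}
print(greedy_select(ones, patterns, 3))
```

On this toy input the greedy picks `A`, then `B`, and rejects `C` because it would add more false ones than it removes. Every step recomputes the error for all remaining candidates against the full covered set, which illustrates why this baseline becomes expensive on large data.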
