# Finding Good Itemsets by Packing Data

**Authors:** Nikolaj Tatti, Jilles Vreeken

arXiv: 1902.02392 · 2019-02-08

## TL;DR

The paper selects a small, representative set of itemsets by searching for the itemsets that compress the data best. Compression uses one decision tree per item, scored with a refined version of MDL, so the resulting itemsets can reflect complex attribute interactions rather than only co-occurrences of 1s.

## Contribution

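The paper combines decision trees with a refined MDL score to compress the data, links the resulting trees to itemsets, and gives two algorithms: a greedy method that builds a family of itemsets directly from the data, and a method that selects a small subset from a given collection of candidate itemsets.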

## Key findings

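- Both algorithms yield compact descriptions of the data in the authors' experiments
- Because each item's tree may depend on any of the preceding items, the model captures complex attribute interactions, not just co-occurrences of 1s
- The itemsets extracted from the decision trees are reported to be of high quality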

## Abstract

The problem of selecting small groups of itemsets that represent the data well has recently gained a lot of attention. We approach the problem by searching for the itemsets that compress the data efficiently. As a compression technique we use decision trees combined with a refined version of MDL. More formally, assuming that the items are ordered, we create a decision tree for each item that may only depend on the previous items. Our approach allows us to find complex interactions between the attributes, not just co-occurrences of 1s. Further, we present a link between the itemsets and the decision trees and use this link to export the itemsets from the decision trees. In this paper we present two algorithms. The first one is a simple greedy approach that builds a family of itemsets directly from data. The second one, given a collection of candidate itemsets, selects a small subset of these itemsets. Our experiments show that these approaches result in compact and high quality descriptions of the data.
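To make the idea of "a decision tree for each item that may only depend on the previous items" concrete, here is a minimal Python sketch. It is not the authors' algorithm. It assumes trees of depth at most one, a plug-in entropy code instead of the paper's refined MDL, and a fixed, hypothetical `split_penalty_bits` standing in for the cost of encoding the tree. The function names and the toy `rows` data are illustrative. The reading of a split as an itemset (the tested item plus the target item) is likewise an assumption about the link the abstract mentions.

```python
import math
from typing import List, Optional, Sequence, Tuple

def entropy_bits(ones: int, total: int) -> float:
    """Plug-in code length in bits for `total` binary values containing `ones` 1s."""
    if total == 0 or ones == 0 or ones == total:
        return 0.0
    p = ones / total
    return -total * (p * math.log2(p) + (1 - p) * math.log2(1 - p))

def column_cost(rows: Sequence[Sequence[int]], target: int,
                split: Optional[int] = None) -> float:
    """Bits needed to encode column `target`, optionally split on an earlier item."""
    if split is None:
        col = [r[target] for r in rows]
        return entropy_bits(sum(col), len(col))
    cost = 0.0
    for value in (0, 1):
        # Each leaf of the depth-one tree is encoded with its own distribution.
        leaf = [r[target] for r in rows if r[split] == value]
        cost += entropy_bits(sum(leaf), len(leaf))
    return cost

def best_split(rows: Sequence[Sequence[int]], target: int,
               split_penalty_bits: float = 2.0) -> Tuple[Optional[int], float]:
    """Pick the cheapest encoding of `target` using only items that precede it.

    `split_penalty_bits` is a hypothetical stand-in for the model cost of a
    split; the paper uses a refined MDL score instead.
    """
    best: Tuple[Optional[int], float] = (None, column_cost(rows, target))
    for j in range(target):  # only earlier items in the ordering are allowed
        cost = column_cost(rows, target, j) + split_penalty_bits
        if cost < best[1]:
            best = (j, cost)
    return best

def itemset_from_split(split: int, target: int) -> Tuple[int, ...]:
    """Assumed reading of the tree-to-itemset link: tested item plus target item."""
    return tuple(sorted((split, target)))

# Toy data: item 1 is always the negation of item 0; item 2 is unrelated to both.
rows: List[Tuple[int, int, int]] = [
    (0, 1, 0), (1, 0, 1), (0, 1, 1), (1, 0, 0),
    (0, 1, 0), (1, 0, 1), (0, 1, 1), (1, 0, 0),
]

for target in range(3):
    split, cost = best_split(rows, target)
    items = itemset_from_split(split, target) if split is not None else None
    print(target, split, cost, items)
```

On this toy data, item 1 is encoded most cheaply by splitting on item 0, even though the two items never occur together as 1s. That is the kind of interaction a co-occurrence-only itemset miner would miss. Item 2 gains nothing from a split, so it stays a tree with a single leaf.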

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.02392/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/1902.02392/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/1902.02392/full.md

---
Source: https://tomesphere.com/paper/1902.02392