MDL-based Compressing Sequential Rules

Xinhong Chen, Wensheng Gan, Shicheng Wan, and Tianlong Gu

arXiv:2212.10252 · cs.AI · December 21, 2022

TL;DR

This paper introduces ComSR, a novel MDL-based method for compressing sequential rules in data mining, which effectively reduces data size by encoding entire databases with meaningful rules.
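
The phrase "encoding entire databases with meaningful rules" refers to a two-part MDL objective. The formula below sketches it under the assumption that ComSR follows the usage-based coding popularized by Krimp; the symbols CT (a code table of rules), usage(r) and ℓ(r) are illustrative and not taken from the paper.

```latex
L(CT, D) = L(CT) + L(D \mid CT), \qquad
L(D \mid CT) = \sum_{r \in CT} \mathit{usage}(r)\,\ell(r), \qquad
\ell(r) = -\log_2 \frac{\mathit{usage}(r)}{\sum_{s \in CT} \mathit{usage}(s)}
```

Under this objective, a rule earns its place in CT only if the bits it saves when encoding D exceed the bits needed to describe the rule itself in L(CT).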

Contribution

The first work to apply the MDL principle to compressing sequential rules mined from databases. It proposes a new coding scheme and heuristic algorithms that achieve effective compression.
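
A coding scheme plus a heuristic search can be illustrated with a toy, Krimp-style greedy compressor. This is not ComSR's algorithm. The encoding is a hypothetical simplification in which each sequence is a list of single items, a rule (x, y) means "x occurs before y", gaps between matched items cost nothing, and the helper names cover, total_bits and greedy_select are invented for illustration.

```python
import math
from collections import Counter


def cover(db, rules):
    """Cover every sequence greedily: rules in code-table order, then singletons.

    Returns a Counter of usages; keys are rule tuples (x, y) or single items.
    Items are assumed to be non-tuple values such as strings (hypothetical).
    """
    usage = Counter()
    for seq in db:
        free = [True] * len(seq)  # positions not yet covered
        for x, y in rules:
            while True:
                # earliest free x, then the earliest free y after it
                i = next((k for k in range(len(seq)) if free[k] and seq[k] == x), None)
                if i is None:
                    break
                j = next((k for k in range(i + 1, len(seq)) if free[k] and seq[k] == y), None)
                if j is None:
                    break
                free[i] = free[j] = False
                usage[(x, y)] += 1
        for k, item in enumerate(seq):
            if free[k]:
                usage[item] += 1  # leftovers are encoded as singletons
    return usage


def total_bits(db, rules):
    """Two-part score L(CT) + L(D | CT) under a toy, gap-free encoding."""
    usage = cover(db, rules)
    total = sum(usage.values())
    code = {e: math.log2(total / u) for e, u in usage.items()}  # = -log2(u / total)
    freq = Counter(item for seq in db for item in seq)
    n = sum(freq.values())
    std = {item: math.log2(n / c) for item, c in freq.items()}  # standard item codes
    data_bits = sum(usage[e] * code[e] for e in usage)
    # each used element is spelled out with standard codes, plus its own code
    model_bits = sum(
        (std[e[0]] + std[e[1]] if isinstance(e, tuple) else std[e]) + code[e]
        for e in usage
    )
    return model_bits + data_bits


def greedy_select(db, candidates):
    """Keep each candidate rule only if it lowers the total description length."""
    rules = []
    best = total_bits(db, rules)
    for rule in candidates:
        bits = total_bits(db, rules + [rule])
        if bits < best:
            rules, best = rules + [rule], bits
    return rules, best
```

For example, on four copies of the sequence a b, the empty rule set costs 12 bits, adding (a, b) lowers that to 2 bits, and (b, a) is then rejected because it covers nothing new. A realistic sequential encoding would also have to pay for gaps between matched items and for itemsets with more than one item.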

Findings

1. ComSR can find compact, meaningful rule sets.
2. The method achieves significant data compression.
3. Experiments validate the effectiveness of the approach.

Abstract

Nowadays, with the rapid development of the Internet, we have entered the era of big data: the Internet generates huge amounts of data every day. However, extracting meaningful information from such massive data is like looking for a needle in a haystack, and data mining techniques offer various practical ways to address this problem. Many sequential rule mining (SRM) algorithms have been proposed to find sequential rules in databases with sequential characteristics, and these rules help people extract meaningful information from massive amounts of data. How can we compress the mined results to reduce data size and save storage space and transmission time? So far, there has been little research on compressing the results of SRM. In this paper, combining the Minimum Description Length (MDL) principle with the two metrics of support and confidence, we introduce the problem…
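
The two metrics are commonly defined in the sequential rule mining literature as follows for a rule X -> Y with disjoint, non-empty itemsets X and Y. A sequence contains the rule when every item of X appears in an itemset strictly before every item of Y. Support is the fraction of sequences containing the rule, and confidence is that count divided by the number of sequences containing all items of X. The sketch below assumes this common definition (ComSR may use a variant), models a sequence as a list of Python sets, and uses hypothetical helper names.

```python
from typing import List, Optional, Set

Sequence = List[Set[str]]  # a sequence is an ordered list of itemsets


def _first(seq: Sequence, item: str) -> Optional[int]:
    """Index of the first itemset containing item, or None."""
    return next((i for i, s in enumerate(seq) if item in s), None)


def _last(seq: Sequence, item: str) -> Optional[int]:
    """Index of the last itemset containing item, or None."""
    return next((i for i in reversed(range(len(seq))) if item in seq[i]), None)


def contains_rule(seq: Sequence, X: Set[str], Y: Set[str]) -> bool:
    """True if all items of X occur in itemsets strictly before all items of Y."""
    firsts = [_first(seq, x) for x in X]
    lasts = [_last(seq, y) for y in Y]
    if None in firsts or None in lasts:
        return False
    # all of X has appeared by max(firsts); all of Y still lies ahead up to min(lasts)
    return max(firsts) < min(lasts)


def support_confidence(db: List[Sequence], X: Set[str], Y: Set[str]):
    """Relative support and confidence of the rule X -> Y."""
    n_rule = sum(contains_rule(s, X, Y) for s in db)
    n_x = sum(all(_first(s, x) is not None for x in X) for s in db)
    support = n_rule / len(db) if db else 0.0
    confidence = n_rule / n_x if n_x else 0.0
    return support, confidence
```

On a four-sequence toy database in which a precedes c in two of the three sequences containing a, the rule {a} -> {c} gets support 0.5 and confidence 2/3.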

Taxonomy

Topics: Data Mining Algorithms and Applications · Rough Sets and Fuzzy Logic · Data Management and Algorithms

Methods: sequential rule mining (SRM) · Minimum Description Length (MDL) principle