The Long and the Short of It: Summarising Event Sequences with Serial Episodes
Nikolaj Tatti, Jilles Vreeken

TL;DR
This paper introduces a pattern set mining approach that uses the MDL principle to summarise sequential data with a small set of serial episodes, returning far fewer redundant patterns than standard frequent pattern miners.
Contribution
It formalizes encoding sequential data with serial episodes and proposes two algorithms for mining small, informative pattern sets based on data compression.
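To make the notion of a serial episode concrete, here is a minimal Python sketch of finding where an episode occurs in an event sequence. It assumes the usual definition of a serial episode as an ordered list of events that must appear in that order, with other events allowed in between. The function names `earliest_end` and `minimal_windows` are hypothetical and chosen for illustration; this is not the paper's algorithm.

```python
from typing import List, Optional, Sequence, Tuple


def earliest_end(seq: Sequence[str], episode: Sequence[str], start: int) -> Optional[int]:
    """Greedily match `episode` in order from seq[start] onwards.

    Returns the index of the event that completes the episode, or None
    if the episode does not occur in seq[start:].
    """
    k = 0  # position in the episode matched so far
    for i in range(start, len(seq)):
        if seq[i] == episode[k]:
            k += 1
            if k == len(episode):
                return i
    return None


def minimal_windows(seq: Sequence[str], episode: Sequence[str]) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of windows that contain the episode
    and have no proper sub-window that also contains it.

    Assumes `episode` is non-empty (an illustrative simplification).
    """
    windows: List[Tuple[int, int]] = []
    for start, event in enumerate(seq):
        if event != episode[0]:
            continue
        end = earliest_end(seq, episode, start)
        if end is None:
            break  # no later start can complete the episode either
        if windows and windows[-1][1] == end:
            # A later start reaching the same end gives a tighter window.
            windows[-1] = (start, end)
        else:
            windows.append((start, end))
    return windows


# Indices: a=0 b=1 x=2 c=3 a=4 b=5 c=6
print(minimal_windows(list("abxcabc"), list("abc")))
```

The first window in the example spans a gap (the `x` at index 2), which is what distinguishes a serial episode from a plain substring.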
Findings
Efficiently discovers small, informative pattern sets
Demonstrates effectiveness on synthetic and real datasets
Outperforms traditional pattern mining in reducing redundancy
Abstract
An ideal outcome of pattern mining is a small set of informative patterns, containing no redundancy or noise, that identifies the key structure of the data at hand. Standard frequent pattern miners do not achieve this goal, as due to the pattern explosion typically very large numbers of highly redundant patterns are returned. We pursue the ideal for sequential data, by employing a pattern set mining approach: an approach where, instead of ranking patterns individually, we consider results as a whole. Pattern set mining has been successfully applied to transactional data, but has been surprisingly understudied for sequential data. In this paper, we employ the MDL principle to identify the set of sequential patterns that summarises the data best. In particular, we formalise how to encode sequential data using sets of serial episodes, and use the encoded length as a quality score. As…
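The idea of using encoded length as a quality score can be illustrated with a toy two-part MDL computation in Python. This sketch is not the encoding formalised in the paper: it assumes gap-free patterns, a greedy left-to-right cover, Shannon-optimal code lengths derived from usage counts, and a deliberately crude model cost. All names (`cover`, `total_length`) are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Pattern = Tuple[str, ...]


def cover(seq: Sequence[str], patterns: List[Pattern]) -> List[Pattern]:
    """Greedy left-to-right cover: use the longest pattern that matches
    contiguously at the current position, else a single event."""
    ordered = sorted(patterns, key=len, reverse=True)
    symbols: List[Pattern] = []
    i = 0
    while i < len(seq):
        for p in ordered:
            if tuple(seq[i:i + len(p)]) == p:
                symbols.append(p)
                i += len(p)
                break
        else:
            symbols.append((seq[i],))
            i += 1
    return symbols


def total_length(seq: Sequence[str], patterns: List[Pattern]) -> float:
    """Toy two-part score L(M) + L(D | M), in bits."""
    usage = Counter(cover(seq, patterns))
    total = sum(usage.values())
    # L(D | M): each symbol costs -log2 of its relative usage in the cover.
    data_bits = -sum(n * math.log2(n / total) for n in usage.values())
    # L(M): assumed cost of spelling out each pattern with a uniform
    # code over the alphabet (a simplification for illustration).
    alphabet_size = len(set(seq))
    model_bits = sum(len(p) * math.log2(alphabet_size) for p in patterns)
    return data_bits + model_bits


seq = list("abc" * 10 + "dd")
print(total_length(seq, []))                  # singletons only
print(total_length(seq, [("a", "b", "c")]))   # with a useful pattern
print(total_length(seq, [("d", "a")]))        # with a pattern that never occurs
```

Under these assumptions, adding a pattern that captures real structure shortens the total encoding, while a pattern that never occurs only adds model cost. That trade-off is what lets a compression-based score favour small, non-redundant pattern sets.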
Taxonomy
Topics: Data Mining Algorithms and Applications · Algorithms and Data Compression · Advanced Database Systems and Queries
Methods: Minimum Description Length
