TL;DR
This paper introduces a probabilistic subsequence interleaving model for sequential pattern mining that efficiently identifies relevant, interpretable, and non-redundant patterns without relying on predefined encoding schemes.
Contribution
The paper proposes a novel probabilistic model and inference framework for sequential pattern mining that improves pattern relevance and interpretability over existing methods.
Findings
Effective on both synthetic and real-world datasets
Mines patterns with low redundancy and high interpretability
Pattern quality comparable or superior to existing state-of-the-art sequential pattern mining algorithms
Abstract
Recent sequential pattern mining methods have used the minimum description length (MDL) principle to define an encoding scheme which describes an algorithm for mining the most compressing patterns in a database. We present a novel subsequence interleaving model based on a probabilistic model of the sequence database, which allows us to search for the most compressing set of patterns without designing a specific encoding scheme. Our proposed algorithm is able to efficiently mine the most relevant sequential patterns and rank them using an associated measure of interestingness. The efficient inference in our model is a direct result of our use of a structural expectation-maximization framework, in which the expectation-step takes the form of a submodular optimization problem subject to a coverage constraint. We show on both synthetic and real world datasets that our model mines a set of…
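To make the "submodular optimization problem subject to a coverage constraint" concrete, here is a minimal sketch of the standard greedy heuristic for weighted set cover, a well-known problem of that shape. It is an illustration only, not the paper's inference procedure: the function name, the toy patterns, the positions they cover, and the costs are all hypothetical, and costs are assumed to be positive.

```python
from typing import Dict, FrozenSet, List


def greedy_weighted_cover(
    universe: FrozenSet[int],
    candidates: Dict[str, FrozenSet[int]],
    cost: Dict[str, float],
) -> List[str]:
    """Greedily pick candidates until every element of `universe` is covered.

    At each step the candidate with the lowest cost per newly covered
    element is chosen (the classic greedy rule for weighted set cover).
    """
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        best, best_ratio = None, float("inf")
        for name, elems in candidates.items():
            gain = len(uncovered & elems)
            if gain == 0:
                continue  # this candidate covers nothing new
            ratio = cost[name] / gain
            if ratio < best_ratio:
                best, best_ratio = name, ratio
        if best is None:
            raise ValueError("universe cannot be covered by the candidates")
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen


# Hypothetical toy instance: positions 0..5 of one sequence, and the
# positions each candidate pattern would explain (all values made up).
positions = frozenset(range(6))
patterns = {
    "ab": frozenset({0, 1}),
    "cd": frozenset({2, 3}),
    "abcd": frozenset({0, 1, 2, 3}),
    "e": frozenset({4}),
    "f": frozenset({5}),
    "ef": frozenset({4, 5}),
}
costs = {"ab": 1.0, "cd": 1.0, "abcd": 1.5, "e": 1.0, "f": 1.0, "ef": 1.2}

chosen = greedy_weighted_cover(positions, patterns, costs)
print(chosen)  # ['abcd', 'ef']
```

In this toy instance the longer patterns win because they explain more positions per unit cost, which loosely mirrors the intuition of preferring a small, non-redundant set of patterns that still accounts for the whole sequence.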
Taxonomy
Methods, Interpretability
