Mining Top-k Sequential Patterns in Database Graphs: A New Challenging Problem and a Sampling-based Approach
Mingtao Lei, Lingyang Chu, Zhefeng Wang

TL;DR
This paper introduces the concept of database graphs for modeling complex networks with transaction data at vertices, and proposes a sampling-based method to efficiently find the top-k frequent sequential patterns despite the problem's computational hardness.
Contribution
It defines database graphs for modeling networks with vertex-associated transaction data and presents a novel sampling algorithm to approximate top-k sequential patterns efficiently.
Findings
Sampling algorithm achieves high accuracy in pattern detection.
Method demonstrates efficiency on synthetic and real datasets.
Problem proven to be #P-hard, justifying the need for approximation.
Abstract
In many real-world networks, a vertex is usually associated with a transaction database that comprehensively describes the behaviour of the vertex. A typical example is the social network, where the behaviour of every user is depicted by a transaction database that stores his daily posted contents. A transaction database is a set of transactions, where a transaction is a set of items. Every path of the network is a sequence of vertices that induces multiple sequences of transactions. The sequences of transactions induced by all of the paths in the network form an extremely large sequence database. Finding frequent sequential patterns from such a sequence database discovers interesting subsequences that frequently appear in many paths of the network. However, it is a challenging task, since the sequence database induced by a database graph is too large to be explicitly induced and stored.…
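To make the abstract's definitions concrete, here is a minimal Python sketch of a toy database graph. It is an illustration only, not the authors' algorithm: the graph, the item names, and the helpers `induced_sequences`, `count_induced`, and `contains` are all hypothetical, and the sketch assumes that a path induces one sequence for every way of choosing a single transaction per vertex, with the usual subsequence-containment test from sequential pattern mining.

```python
from itertools import product

# Hypothetical toy database graph: every vertex carries a transaction
# database (a list of transactions; each transaction is a set of items).
databases = {
    "a": [frozenset({"x", "y"}), frozenset({"z"})],
    "b": [frozenset({"x"}), frozenset({"y", "z"}), frozenset({"w"})],
    "c": [frozenset({"y"})],
}
edges = {"a": ["b"], "b": ["c"], "c": []}  # directed adjacency lists


def is_path(path):
    """Check that consecutive vertices in `path` are joined by an edge."""
    return all(v in edges[u] for u, v in zip(path, path[1:]))


def induced_sequences(path):
    """Enumerate the transaction sequences induced by a path.

    Assumption: one transaction is picked from each vertex's database,
    in path order.
    """
    return product(*(databases[v] for v in path))


def count_induced(path):
    """Number of induced sequences: the product of the database sizes."""
    n = 1
    for v in path:
        n *= len(databases[v])
    return n


def contains(sequence, pattern):
    """Subsequence test: each pattern itemset must be a subset of a
    distinct transaction of `sequence`, preserving order."""
    i = 0
    for transaction in sequence:
        if i < len(pattern) and pattern[i] <= transaction:
            i += 1
    return i == len(pattern)


path = ["a", "b", "c"]
pattern = [frozenset({"x"}), frozenset({"y"})]
support = sum(contains(s, pattern) for s in induced_sequences(path))
```

Even on this three-vertex path the induced database has 2 × 3 × 1 = 6 sequences. Because the count is a product over the vertices of a path, and a graph can contain very many paths, explicit enumeration quickly becomes infeasible, which is the motivation the abstract gives for avoiding materializing the induced sequence database.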
Taxonomy
Topics: Data Mining Algorithms and Applications · Data Management and Algorithms · Rough Sets and Fuzzy Logic
