On Finding Frequent Patterns in Directed Acyclic Graphs
Andrea Campagna, Rasmus Pagh

TL;DR
This paper introduces efficient algorithms for discovering the most common label sequences in directed acyclic graphs, with applications to analyzing large RFID datasets for passenger movement patterns.
Contribution
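It presents novel algorithms whose time complexity depends only on the size of the graph and on the relative frequency epsilon of the most frequent traces, not on the number of paths. It also applies streaming techniques so that space usage depends only on epsilon, addressing a practical large-scale data analysis problem.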
Findings
Algorithms efficiently identify frequent traces in large DAGs.
Experimental results demonstrate effectiveness on RFID baggage data.
Space usage depends only on the frequency threshold epsilon, not on the number of distinct traces.
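To make the space claim concrete, here is a minimal sketch of one well-known streaming summary, the Misra-Gries frequent-items algorithm. This summary does not say which streaming technique the paper uses, so treat Misra-Gries as an assumed stand-in; the function name `misra_gries` and the parameter `k` (roughly 1/epsilon) are illustrative, not taken from the paper.

```python
def misra_gries(stream, k):
    """Summarize a stream with at most k - 1 counters.

    Any item occurring more than n / k times in a stream of n items
    is guaranteed to remain in the summary. Stored counts may
    undercount the true frequency by at most n / k.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter, drop those at zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical stream of trace identifiers; "x" dominates.
stream = ["x"] * 6 + ["y", "z", "w", "v"]
summary = misra_gries(stream, k=3)
```

The number of counters is bounded by k regardless of how many distinct items appear in the stream, which is the kind of guarantee the finding above describes.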
Abstract
Given a directed acyclic graph with labeled vertices, we consider the problem of finding the most common label sequences ("traces") among all paths in the graph (of some maximum length m). Since the number of paths can be huge, we propose novel algorithms whose time complexity depends only on the size of the graph, and on the relative frequency epsilon of the most frequent traces. In addition, we apply techniques from streaming algorithms to achieve space usage that depends only on epsilon, and not on the number of distinct traces. The abstract problem considered models a variety of tasks concerning finding frequent patterns in event sequences. Our motivation comes from working with a data set of 2 million RFID readings from baggage trolleys at Copenhagen Airport. The question of finding frequent passenger movement patterns is mapped to the above problem. We report on experimental…
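The problem statement can be illustrated with a brute-force baseline that enumerates every path and tallies its trace. This is not the paper's algorithm (it takes time proportional to the number of paths, which is exactly what the paper avoids); it only pins down what is being counted. The names `count_traces` and `frequent_traces`, the toy graph, and the choice to measure path length m in vertices are assumptions made for this sketch.

```python
from collections import Counter


def count_traces(adj, labels, m):
    """Count the label sequence of every path with at most m vertices.

    adj maps each vertex to its list of successors (the graph must be
    acyclic); labels maps each vertex to its label.
    """
    counts = Counter()

    def extend(v, trace):
        trace = trace + (labels[v],)
        counts[trace] += 1  # the path ending at v contributes one occurrence
        if len(trace) < m:
            for w in adj.get(v, ()):
                extend(w, trace)

    for v in labels:  # a path may start at any vertex
        extend(v, ())
    return counts


def frequent_traces(counts, epsilon):
    """Return traces whose share of all counted paths is at least epsilon."""
    total = sum(counts.values())
    return {t: c for t, c in counts.items() if c >= epsilon * total}


# Toy DAG: two distinct vertices share the label "B".
adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
labels = {0: "A", 1: "B", 2: "B", 3: "C"}
counts = count_traces(adj, labels, m=3)
frequent = frequent_traces(counts, epsilon=0.15)
```

In the toy graph, the two paths 0-1-3 and 0-2-3 are different paths but produce the same trace (A, B, C), so that trace is counted twice.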
Taxonomy
Topics: Data Mining Algorithms and Applications · Data Management and Algorithms · Algorithms and Data Compression
