# Significance of Episodes Based on Minimal Windows

**Authors:** Nikolaj Tatti

arXiv: 1902.02755 · 2019-02-08

## TL;DR

This paper introduces a new measure for evaluating the significance of episodes in sequence data based on minimal window lengths, and proposes an iterative method to compute their distribution for effective pruning.

## Contribution

It presents a novel significance measure for episodes and an iterative technique to compute minimal window length distributions under the independence model.

## Key findings

- Significant episodes can be effectively identified using the proposed measure.
- The method reduces the number of patterns by filtering out uninteresting episodes.
- Experimental results confirm the approach's ability to find meaningful episodes.

## Abstract

Discovering episodes, frequent sets of events from a sequence, has been an active field in pattern mining. Traditionally, a level-wise approach is used to discover all frequent episodes. While this technique is computationally feasible, it may result in a vast number of patterns, especially when low thresholds are used. In this paper we propose a new quality measure for episodes. We say that an episode is significant if the average length of its minimal windows deviates greatly when compared to the expected length according to the independence model. We can apply this measure as a post-pruning step to test whether the discovered frequent episodes are truly interesting and consequently to reduce the number of output patterns. As a main contribution we introduce a technique that allows us to compute the distribution of lengths of minimal windows using the independence model. Such a computation task is surprisingly complex, and in order to solve it we compute the distribution iteratively, starting from simple episodes and progressively moving towards the more complex ones. In our experiments we discover candidate episodes that have a sufficient number of minimal windows and test each candidate for significance. The experimental results demonstrate that our approach finds significant episodes while ignoring uninteresting ones.
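To make the notion of a minimal window concrete, the sketch below finds the minimal windows of an episode in a sequence and reports their average length, which is the statistic the abstract compares against its expectation. It is an illustration only, not the paper's algorithm: it assumes a serial episode (events that must occur in a fixed order), it assumes a window's length is its number of positions (`end - start + 1`), and the function names `minimal_windows` and `mean_window_length` are hypothetical. It does not compute the expected length under the independence model.

```python
from typing import List, Sequence, Tuple


def minimal_windows(seq: Sequence[str], episode: Sequence[str]) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, inclusive, of the minimal windows
    of a serial episode in seq.

    A window is minimal if it contains the episode's events in order
    and no proper sub-window does.
    """
    if not episode:
        raise ValueError("episode must contain at least one event")

    windows: List[Tuple[int, int]] = []
    last_start = -1  # start of the most recently accepted window

    for end, symbol in enumerate(seq):
        if symbol != episode[-1]:
            continue

        # Match the remaining events backwards from `end`; this greedy
        # scan yields the latest possible start for an occurrence
        # ending at `end`.
        k = len(episode) - 2
        pos = end - 1
        while k >= 0 and pos >= 0:
            if seq[pos] == episode[k]:
                k -= 1
            pos -= 1
        if k >= 0:
            continue  # no occurrence of the episode ends at this position

        start = pos + 1
        # Latest starts never decrease as `end` grows, so the window is
        # minimal exactly when its start is strictly later than the
        # previously accepted start.
        if start > last_start:
            windows.append((start, end))
            last_start = start

    return windows


def mean_window_length(windows: List[Tuple[int, int]]) -> float:
    """Average length of the given windows (length assumed to be end - start + 1)."""
    if not windows:
        raise ValueError("no windows to average")
    return sum(end - start + 1 for start, end in windows) / len(windows)


if __name__ == "__main__":
    sequence = list("aabcabbc")
    found = minimal_windows(sequence, list("abc"))
    print(found, mean_window_length(found))
```

In this toy sequence the episode `a -> b -> c` has two minimal windows, covering positions 1 to 3 and 4 to 7. A significance test in the spirit of the abstract would compare their average length with the length expected if events were generated independently.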

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.02755/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/1902.02755/full.md

## References

14 references — full list in the complete paper: https://tomesphere.com/paper/1902.02755/full.md

---
Source: https://tomesphere.com/paper/1902.02755