# Finding the needle in the haystack—An interpretable sequential pattern mining method for classification problems

**Authors:** Alexander Grote, Anuja Hariharan, Christof Weinhardt

PMC · DOI: 10.3389/fdata.2025.1604887 · Frontiers in Big Data · 2025-10-24

## TL;DR

This paper introduces a feature selection method that identifies class-relevant sequential patterns while mining data such as customer clickstreams or malware event sequences, so the resulting patterns are easier to interpret and act on.

## Contribution

A novel feature selection algorithm that integrates unsupervised sequential pattern mining with supervised learning, using a class-specific interestingness measure.

## Key findings

- The algorithm achieved classification performance comparable to established feature selection algorithms.
- It reduced computational costs while maintaining interpretability.
- It was tested on diverse datasets including churn prediction and malware analysis.

## Abstract

The analysis of discrete sequential data, such as event logs and customer clickstreams, is often challenged by the vast number of possible sequential patterns. This complexity makes it difficult to identify meaningful sequences and derive actionable insights.

We propose a novel feature selection algorithm that integrates unsupervised sequential pattern mining with supervised machine learning. Unlike existing interpretable machine learning methods, we determine important sequential patterns during the mining process, eliminating the need for post-hoc classification to assess their relevance. In contrast to existing interestingness measures, we introduce a local, class-specific interestingness measure that is inherently interpretable.
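To make the idea of a class-specific interestingness measure concrete, here is a minimal Python sketch. It does not reproduce the measure defined in the paper: the `support_gap` score (support in a target class minus the highest support in any other class), the helper names, and the toy clickstream data are all assumptions chosen for illustration only.

```python
from collections import Counter


def contains(sequence, pattern):
    """True if `pattern` occurs in `sequence` as an ordered,
    not necessarily contiguous, subsequence."""
    it = iter(sequence)
    # `item in it` advances the iterator, so order is respected.
    return all(item in it for item in pattern)


def class_support(sequences, labels, pattern):
    """Fraction of sequences in each class that contain `pattern`."""
    totals = Counter(labels)
    hits = Counter(
        label for seq, label in zip(sequences, labels) if contains(seq, pattern)
    )
    return {label: hits[label] / totals[label] for label in totals}


def support_gap(sequences, labels, pattern, target):
    """Hypothetical class-specific score (an assumption, not the paper's
    measure): support in `target` minus the best support elsewhere."""
    support = class_support(sequences, labels, pattern)
    others = [s for label, s in support.items() if label != target]
    return support[target] - max(others, default=0.0)


# Toy clickstream data, invented for this example.
sequences = [
    ["login", "view", "cancel"],
    ["login", "cancel"],
    ["login", "view", "buy"],
    ["view", "buy"],
]
labels = ["churn", "churn", "stay", "stay"]

score = support_gap(sequences, labels, ("login", "cancel"), target="churn")
```

A score like this can be read directly: a pattern scores high for a class only if it is frequent there and rare elsewhere. That is the kind of local, per-class interpretability the abstract describes, although the authors' actual definition may differ.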

We evaluated the algorithm on three diverse datasets (churn prediction, malware sequence analysis, and a synthetic dataset) covering different sizes, application domains, and feature complexities. Our method achieved classification performance comparable to established feature selection algorithms while maintaining interpretability and reducing computational costs.

This study demonstrates a practical and efficient approach for uncovering important sequential patterns in classification tasks. By combining interpretability with competitive predictive performance, our algorithm provides practitioners with an interpretable and efficient alternative to existing methods, paving the way for new advances in sequential data analysis.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12604564/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12604564/full.md

## References

109 references — full list in the complete paper: https://tomesphere.com/paper/PMC12604564/full.md

---
Source: https://tomesphere.com/paper/PMC12604564