Explaining Differences in Classes of Discrete Sequences
Samaneh Saadat, Gita Sukthankar

TL;DR
This paper introduces methods to interpret and explain differences between classes of discrete sequences, aiding understanding of human behavior and sequence classification models.
Contribution
The paper presents novel techniques for analyzing and interpreting differences between sequence classes, enhancing explainability of sequence classification models.
Findings
Silhouette score comparison of k-gram representations reveals class differences.
Distance matrix analysis characterizes key differences between sequence groups.
Applied methods successfully distinguished bot and non-bot GitHub team sequences.
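The first finding can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not give the exact procedure, so the sketch assumes raw k-gram count vectors, Euclidean distance, and the class labels used as the cluster assignment for the silhouette score. The function names (kgram_counts, euclidean, silhouette, silhouette_for_k) and the toy sequences are hypothetical.

```python
from collections import Counter
from math import sqrt


def kgram_counts(seq, k):
    """Count every contiguous k-gram (stored as a tuple) in a discrete sequence."""
    return Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))


def euclidean(a, b):
    """Euclidean distance between two sparse count vectors (Counters)."""
    keys = set(a) | set(b)
    return sqrt(sum((a[key] - b[key]) ** 2 for key in keys))


def silhouette(vectors, labels):
    """Mean silhouette score, treating the class labels as cluster assignments."""
    scores = []
    for i, (v, lab) in enumerate(zip(vectors, labels)):
        # a: mean distance to the other members of the same class
        same = [euclidean(v, w)
                for j, (w, other) in enumerate(zip(vectors, labels))
                if other == lab and j != i]
        if not same:
            scores.append(0.0)  # convention for a class with a single member
            continue
        a = sum(same) / len(same)
        # b: smallest mean distance to the members of any other class
        b = min(
            sum(euclidean(v, w) for w, l2 in zip(vectors, labels) if l2 == other)
            / sum(1 for l2 in labels if l2 == other)
            for other in set(labels) if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def silhouette_for_k(sequences, labels, k):
    """Silhouette score of the labelled sequences under a k-gram representation."""
    return silhouette([kgram_counts(s, k) for s in sequences], labels)


# Toy data (assumed): both classes use the same symbols, in a different order.
toy_sequences = ["ab", "ab", "ba", "ba"]
toy_labels = [0, 0, 1, 1]
for k in (1, 2):
    print(k, silhouette_for_k(toy_sequences, toy_labels, k))
```

On this toy data the 1-gram representation scores 0.0, because symbol frequencies are identical across classes, while the 2-gram representation scores 1.0, because symbol order separates the classes. Comparing scores across values of k in this way indicates which pattern length carries the class difference.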
Abstract
While there are many machine learning methods to classify and cluster sequences, they fail to explain which differences between groups of sequences make them distinguishable. Although in some cases a black box model is sufficient, there is a need for increased explainability in research areas focused on human behavior. For example, psychologists are less interested in a model that predicts human behavior with high accuracy and more concerned with identifying differences between actions that lead to divergent human behavior. This paper presents techniques for understanding differences between classes of discrete sequences. The approaches introduced in this paper can be used to interpret black box machine learning models on sequences. The first approach compares k-gram representations of sequences using the silhouette score. The second method characterizes…
