Attention Please: What Transformer Models Really Learn for Process Prediction
Martin Käppel, Lars Ackermann, Stefan Jablonski, Simon Härtl

TL;DR
This paper investigates whether attention scores in transformer models for process prediction can be used as explanations, proposing graph-based methods to interpret their decision-making and improve process modeling.
Contribution
It demonstrates that attention scores can serve as explanations in process prediction models and introduces graph-based methods for interpretability.
Findings
Attention scores can be used as explanations for predictions.
Graph-based explanation approaches improve interpretability.
Insights can enhance predictive process modeling and process mining.
Abstract
Predictive process monitoring aims to support the execution of a process at runtime with predictions about the further evolution of a process instance. In recent years, a plethora of deep learning architectures, among them the transformer, have been established as state of the art for different prediction targets. The transformer architecture is equipped with a powerful attention mechanism that assigns an attention score to each part of the input, allowing the model to prioritize the most relevant information and produce more accurate and contextual output. However, deep learning models are largely black boxes, i.e., their reasoning or decision-making process cannot be understood in detail. This paper examines whether the attention scores of a transformer-based next-activity prediction model can serve as an explanation for its decision-making. We find that attention scores in…
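To make the notion of an attention score concrete, here is a minimal sketch of single-head scaled dot-product attention over a trace prefix. It is not the authors' model: the activity names, the two-dimensional embeddings, and the helper names softmax and attention_scores are all hypothetical, and the learned query/key projections of a real transformer are omitted.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def attention_scores(query, keys):
    """Scaled dot-product attention weights of one query over all keys."""
    scale = math.sqrt(len(query))
    logits = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(logits)

# Hypothetical trace prefix and toy embeddings (assumptions for illustration).
prefix = ["register", "check", "approve"]
embeddings = {
    "register": [1.0, 0.0],
    "check": [0.0, 1.0],
    "approve": [1.0, 1.0],
}

keys = [embeddings[activity] for activity in prefix]
query = keys[-1]  # the last event attends to the whole prefix

scores = attention_scores(query, keys)

# Rank prefix events by how much attention they receive.
ranked = sorted(zip(prefix, scores), key=lambda pair: pair[1], reverse=True)
for activity, score in ranked:
    print(f"{activity}: {score:.3f}")
```

The scores form a probability distribution over the prefix events, which is what makes them tempting to read as an importance ranking; whether that reading holds up as an explanation is the question the paper examines.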
Taxonomy
Topics: Fault Detection and Control Systems
Methods: Softmax · Attention Is All You Need
