All Eyes on the Workflow: Automated and Efficient Event Discovery from Video Streams
Marco Pegoraro, Jonas Seng, Dustin Heller, Wil M.P. van der Aalst, Kristian Kersting

TL;DR
SnapLog is a novel method that converts video data into event logs by extracting features, segmenting frames, and classifying segments, enabling process mining on video streams.
Contribution
The paper introduces SnapLog, a new approach for extracting interpretable event data from videos using image embeddings and few-shot classification.
Findings
Produces logs that accurately reflect the process in the videos
Produces interpretable event logs
Enables process mining on video data
Abstract
Disciplines such as business process management and process mining aid organizations by discovering insights about processes on the basis of recorded event data. However, an obstacle to process analysis is data multi-modality: for instance, data in video form are not directly interpretable as events. In this work, we present SnapLog, an approach to extract event data from videos by converting frames to feature vectors using image embeddings and performing temporal segmentation through frame-wise similarity matrices. A generalized few-shot classification is then used to assign labels to the video segments, yielding labeled, timestamped sub-sequences of frames that are interpretable as events. Conventional process mining techniques can be used to analyze the resulting data. We show that our approach produces logs that accurately reflect the process in the videos.
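The pipeline described in the abstract (embed frames, segment by frame-wise similarity, label segments with few-shot classification) can be illustrated with a minimal sketch. This is not the authors' implementation: the frame embeddings are assumed to be precomputed and are stood in for by toy 2-D vectors, segmentation is simplified to thresholding the similarity of consecutive frames (the first off-diagonal of the similarity matrix), and the few-shot step is replaced by a nearest-prototype classifier. All function names, the `threshold` and `fps` parameters, and the activity labels are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def similarity_matrix(frames):
    """Frame-wise similarity matrix: entry [i][j] compares frame i with frame j."""
    return [[cosine(u, v) for v in frames] for u in frames]

def segment(frames, threshold=0.8):
    """Split the frame sequence into (start, end) index pairs, end exclusive.

    Assumption: a boundary is placed wherever two consecutive frames are
    less similar than `threshold`. This is a simplification for illustration.
    """
    if not frames:
        return []
    sim = similarity_matrix(frames)
    bounds = [0]
    for i in range(len(frames) - 1):
        if sim[i][i + 1] < threshold:
            bounds.append(i + 1)
    bounds.append(len(frames))
    return list(zip(bounds[:-1], bounds[1:]))

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def classify(segment_frames, support):
    """Nearest-prototype labeling: `support` maps each label to a few example vectors."""
    centroid = mean_vector(segment_frames)
    prototypes = {label: mean_vector(ex) for label, ex in support.items()}
    return max(prototypes, key=lambda label: cosine(centroid, prototypes[label]))

def video_to_events(frames, support, fps=1.0, threshold=0.8):
    """Turn frame embeddings into labeled, timestamped events (times in seconds)."""
    return [
        {
            "activity": classify(frames[start:end], support),
            "start": start / fps,
            "end": end / fps,
        }
        for start, end in segment(frames, threshold)
    ]

# Toy input: four frame embeddings with one visible change between frames 2 and 3.
frames = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]
support = {"pick": [[1.0, 0.0]], "place": [[0.0, 1.0]]}
events = video_to_events(frames, support, fps=2.0)
for event in events:
    print(event)
```

Each resulting dictionary carries an activity label and a start and end time, which is the shape of record that conventional process mining tools expect once a case identifier is attached.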
