Sharingan: Extract User Action Sequence from Desktop Recordings
Yanting Chen, Yi Ren, Xiaoting Qin, Jue Zhang, Kehong Yuan, Lu Han, Qingwei Lin, Dongmei Zhang, Saravan Rajmohan, Qi Zhang

TL;DR
This paper introduces two novel Vision-Language Model-based methods for extracting user action sequences from desktop recordings, demonstrating promising accuracy and providing new benchmarks and insights for automating user behavior analysis.
Contribution
It presents the first application of VLMs for desktop user action extraction, proposing two methods and evaluating their effectiveness on curated datasets.
Findings
The DF approach identifies user actions with 70% to 80% accuracy
Explicit UI change detection can reduce performance
Methods enable re-playable action sequences for RPA
Abstract
Video recordings of user activities, particularly desktop recordings, offer a rich source of data for understanding user behaviors and automating processes. However, despite advancements in Vision-Language Models (VLMs) and their increasing use in video analysis, extracting user actions from desktop recordings remains an underexplored area. This paper addresses this gap by proposing two novel VLM-based methods for user action extraction: the Direct Frame-Based Approach (DF), which inputs sampled frames directly into VLMs, and the Differential Frame-Based Approach (DiffF), which incorporates explicit frame differences detected via computer vision techniques. We evaluate these methods using a basic self-curated dataset and an advanced benchmark adapted from prior work. Our results show that the DF approach achieves an accuracy of 70% to 80% in identifying user actions, with the extracted…
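The two input styles named in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the abstract says only that DiffF uses "computer vision techniques", so the thresholded pixel differencing below is an assumed stand-in, and the names `sample_frames`, `changed_region`, `step`, and `threshold` are hypothetical. Frames are modelled as plain nested lists of grayscale values so the example runs without third-party libraries.

```python
from typing import List, Optional, Tuple

# A grayscale frame as rows of 0-255 pixel intensities (illustrative format).
Frame = List[List[int]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def sample_frames(frames: List[Frame], step: int) -> List[Frame]:
    """Keep every `step`-th frame. Uniform sampling is an assumption here."""
    if step < 1:
        raise ValueError("step must be >= 1")
    return frames[::step]


def changed_region(prev: Frame, curr: Frame, threshold: int = 30) -> Optional[Box]:
    """Return the bounding box of pixels whose absolute intensity change
    exceeds `threshold`, or None when no pixel changed that much."""
    rows: List[int] = []
    cols: List[int] = []
    for r, (prev_row, curr_row) in enumerate(zip(prev, curr)):
        for c, (p, q) in enumerate(zip(prev_row, curr_row)):
            if abs(p - q) > threshold:
                rows.append(r)
                cols.append(c)
    if not rows:
        return None
    return (min(rows), min(cols), max(rows), max(cols))


# Toy recording: a 3x4 blank screen where one pixel lights up in frame 2.
blank: Frame = [[0] * 4 for _ in range(3)]
clicked: Frame = [row[:] for row in blank]
clicked[1][2] = 255
recording = [blank, blank, clicked, clicked]

# DF-style input: the sampled frames alone would be passed to the VLM.
sampled = sample_frames(recording, step=2)

# DiffF-style input: the same frames plus an explicit changed region per pair.
boxes = [changed_region(a, b) for a, b in zip(sampled, sampled[1:])]
print(boxes)
```

In a DF-style pipeline only `sampled` would accompany the prompt, while a DiffF-style pipeline would also supply the regions in `boxes`, for example as crops or highlighted overlays. How the paper actually encodes these differences is not stated in the text above.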
Taxonomy
Topics: Personal Information Management and User Behavior · Data Visualization and Analytics
