Extracting Replayable Interactions from Videos of Mobile App Usage
Jieshan Chen, Amanda Swearngin, Jason Wu, Titus Barik, Jeffrey Nichols, Xiaoyi Zhang

TL;DR
This paper presents a novel pixel-based method to extract and replay user interactions from mobile app videos, enabling better reproduction and analysis of touch interactions across devices.
Contribution
It introduces an end-to-end approach combining image processing and deep learning to identify, classify, and locate interactions in videos using only pixel data.
Findings
Successfully replayed 84.1% of interactions on iOS
Replayed 78.4% of interactions on Android
Demonstrated applicability across 64 apps and two platforms
Abstract
Screen recordings of mobile apps are a popular and readily available way for users to share how they interact with apps, such as in online tutorial videos, user reviews, or as attachments in bug reports. Unfortunately, both people and systems can find it difficult to reproduce touch-driven interactions from video pixel data alone. In this paper, we introduce an approach to extract and replay user interactions in videos of mobile apps, using only pixel information in video frames. To identify interactions, we apply heuristic-based image processing and convolutional deep learning to segment screen recordings, classify the interaction in each segment, and locate the interaction point. To replay interactions on another device, we match elements on app screens using UI element detection. We evaluate the feasibility of our pixel-based approach using two datasets: the Rico mobile app dataset…
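As a rough illustration of the segmentation step mentioned in the abstract, the sketch below splits a recording into candidate interaction segments by thresholding the pixel difference between consecutive frames. This is a minimal sketch under stated assumptions, not the authors' pipeline: the frame representation (nested lists of grayscale values), the function names `frame_difference` and `segment_by_change`, and the threshold value are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Assumption: a frame is a grayscale image stored as rows of 0-255 intensities.
Frame = List[List[int]]


def frame_difference(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two same-sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def segment_by_change(frames: List[Frame], threshold: float = 1.0) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) frame-index ranges where the screen is changing.

    The transition between frames i and i+1 counts as "active" when their
    difference exceeds the (hypothetical) threshold. Consecutive active
    transitions are merged into a single candidate segment.
    """
    segments: List[Tuple[int, int]] = []
    start = None
    for i in range(len(frames) - 1):
        active = frame_difference(frames[i], frames[i + 1]) > threshold
        if active and start is None:
            start = i  # last stable frame before the change begins
        elif not active and start is not None:
            segments.append((start, i))  # frame i is the first stable frame after
            start = None
    if start is not None:
        # The recording ended while the screen was still changing.
        segments.append((start, len(frames) - 1))
    return segments


def solid(value: int) -> Frame:
    """Helper for the usage example: a 2x2 frame filled with one intensity."""
    return [[value, value], [value, value]]


# Usage: two still frames, a two-step transition, then two still frames.
recording = [solid(0), solid(0), solid(50), solid(100), solid(100), solid(100)]
print(segment_by_change(recording))
```

Each returned range would then be handed to later stages (interaction classification and interaction-point localization), which the abstract describes as learned components and which are not modeled here.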
Taxonomy
Topics: Multimedia Communication and Technology · Web Data Mining and Analysis · Online Learning and Analytics
