Video2Action: Reducing Human Interactions in Action Annotation of App Tutorial Videos
Sidong Feng, Chunyang Chen, Zhenchang Xing

TL;DR
Video2Action is a lightweight method that automatically detects actions and their locations in tutorial videos, reducing manual effort and aiding creators in annotating app tutorial videos effectively.
Contribution
The paper introduces Video2Action, a novel automated approach combining image processing and deep learning to streamline action annotation in tutorial videos.
Findings
High accuracy in action detection and localization
User study confirms usefulness in aiding video creators
Automated method reduces annotation time
Abstract
Tutorial videos of mobile apps have become a popular and compelling way for users to learn unfamiliar app features. To make these videos accessible to users, video creators typically need to annotate the actions in them, including what actions are performed and where to tap. However, this process can be time-consuming and labor-intensive. In this paper, we introduce Video2Action, a lightweight approach that automatically generates action scenes and predicts action locations from a video using image-processing and deep-learning methods. Automated experiments demonstrate that Video2Action performs well at extracting actions from videos, and a user study shows that the generated action cues are useful in assisting video creators with action annotation.
Taxonomy
Topics: Human Pose and Action Recognition · Video Analysis and Summarization · Multimodal Machine Learning Applications
