Video2Action: Reducing Human Interactions in Action Annotation of App Tutorial Videos

Sidong Feng; Chunyang Chen; Zhenchang Xing

arXiv:2308.03252 · cs.HC · August 8, 2023

Open Access

TL;DR

Video2Action is a lightweight method that automatically detects actions and their locations in tutorial videos, reducing manual effort and aiding creators in annotating app tutorial videos effectively.

Contribution

The paper introduces Video2Action, a novel automated approach combining image processing and deep learning to streamline action annotation in tutorial videos.

Findings

01 High accuracy in action detection and localization

02 User study confirms usefulness in aiding video creators

03 Automated method reduces annotation time

Abstract

Tutorial videos of mobile apps have become a popular and compelling way for users to learn unfamiliar app features. To make these videos accessible to users, video creators typically need to annotate the actions in the video, including what actions are performed and where to tap. However, this process can be time-consuming and labor-intensive. In this paper, we introduce Video2Action, a lightweight approach that automatically generates the action scenes and predicts the action locations from the video using image-processing and deep-learning methods. Automated experiments demonstrate that Video2Action performs well in acquiring actions from the videos, and a user study shows the usefulness of our generated action cues in assisting video creators with action annotation.
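
The abstract does not spell out how action scenes are generated, so the following is only a minimal sketch of one common image-processing idea that could serve this purpose: splitting a video into candidate scenes wherever consecutive frames differ sharply. It is not the authors' implementation. The function names (`mean_abs_diff`, `split_into_scenes`), the frame representation (nested lists of grayscale values), and the threshold value are all assumptions made for illustration.

```python
from typing import List, Sequence

# Assumed representation: a grayscale frame is a grid of 0-255 pixel values.
Frame = Sequence[Sequence[int]]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute per-pixel difference between two same-sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def split_into_scenes(frames: List[Frame], threshold: float = 10.0) -> List[List[int]]:
    """Group consecutive frame indices into scenes.

    A new scene starts whenever the change from the previous frame
    exceeds `threshold` (a hypothetical value, not taken from the paper).
    """
    if not frames:
        return []
    scenes = [[0]]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            scenes.append([i])      # large visual change: begin a new scene
        else:
            scenes[-1].append(i)    # small change: same scene continues
    return scenes


if __name__ == "__main__":
    static = [[0, 0], [0, 0]]
    changed = [[200, 200], [200, 200]]
    print(split_into_scenes([static, static, changed, changed]))
```

In a real pipeline, frames would come from a video decoder and the threshold would need tuning per recording. Predicting where the user tapped would still require a separate step, which the abstract attributes to deep-learning methods.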

Taxonomy

Topics: Human Pose and Action Recognition · Video Analysis and Summarization · Multimodal Machine Learning Applications