Investigating Correlations of Automatically Extracted Multimodal Features and Lecture Video Quality
Jianwei Shi, Christian Otto, Anett Hoppe, Peter Holtz, Ralph Ewerth

TL;DR
This study explores how features automatically extracted from multiple modalities of lecture videos correlate with perceived video quality, and considers their potential to improve video recommendation in educational contexts.
Contribution
The paper introduces a set of cross-modal features combining transcripts, audio, video, and slides, and investigates their correlation with human-rated video quality in MOOCs.
Findings
Certain audio and linguistic features correlate with perceived video quality.
Cross-modal features show promising potential for improving lecture video recommendations.
The features' impact on knowledge gain is discussed.
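
As an illustration of what "correlate with perceived video quality" can mean in practice, the following is a minimal sketch of a rank correlation between one extracted feature and human quality ratings. It is not the authors' method: the summary above does not say which correlation coefficient was used, so Spearman's rho is an assumption here, and the function names (`average_ranks`, `pearson`, `spearman`) and the sample values are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(xs), average_ranks(ys))


if __name__ == "__main__":
    # Hypothetical values for illustration only, not data from the paper:
    # one feature value and one mean quality rating per video.
    feature_per_video = [2.1, 2.8, 3.0, 3.4, 2.5]
    rating_per_video = [3.0, 4.0, 3.5, 4.5, 3.0]
    print(f"Spearman rho: {spearman(feature_per_video, rating_per_video):.3f}")
```

A rank-based coefficient is chosen in this sketch because human ratings are typically ordinal; with many features tested at once, a significance test with multiple-comparison correction would also be needed before reading much into any single coefficient.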
Abstract
Ranking and recommendation of multimedia content such as videos is usually realized with respect to the relevance to a user query. However, for lecture videos and MOOCs (Massive Open Online Courses) it is not only required to retrieve relevant videos, but particularly to find lecture videos of high quality that facilitate learning, for instance, independent of the video's or speaker's popularity. Thus, metadata about a lecture video's quality are crucial features for learning contexts, e.g., lecture video recommendation in search-as-learning scenarios. In this paper, we investigate whether automatically extracted features are correlated with quality aspects of a video. A set of scholarly videos from a Massive Open Online Course (MOOC) is analyzed regarding audio, linguistic, and visual features. Furthermore, a set of cross-modal features is proposed which are derived by combining…
