Visual Content Detection in Educational Videos with Transfer Learning and Dataset Enrichment
Dipayan Biswas, Shishir Shah, Jaspal Subhlok

TL;DR
This paper presents a transfer learning approach using YOLO for detecting visual elements in lecture videos, addressing challenges with dataset scarcity and object variability, and provides a benchmark dataset and source code for future research.
Contribution
It introduces a transfer learning method with dataset enrichment for visual element detection in lecture videos, including a benchmark dataset and open-source tools.
Findings
YOLO outperformed other models in detecting lecture visual elements.
Training on multiple datasets improved detection accuracy.
The approach offers a general solution for object detection in lecture videos.
Abstract
Video is transforming education, with online courses and recorded lectures supplementing and replacing classroom teaching. Recent research has focused on enhancing information retrieval for video lectures with advanced navigation, searchability, summarization, and question-answering chatbots. Visual elements like tables, charts, and illustrations are central to comprehension, retention, and data presentation in lecture videos, yet their potential for improving access to video content remains underutilized. A major factor is that accurate automatic detection of visual elements in a lecture video is challenging; reasons include i) most visual elements, such as charts, graphs, tables, and illustrations, are artificially created and lack any standard structure, and ii) coherent visual objects may lack clear boundaries and may be composed of connected text and visual components.…
Taxonomy
Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Visual Attention and Saliency Detection
