Visual Content Detection in Educational Videos with Transfer Learning and Dataset Enrichment

Dipayan Biswas; Shishir Shah; Jaspal Subhlok

arXiv:2506.21903 · cs.CV · August 19, 2025

TL;DR

This paper presents a transfer learning approach that uses YOLO to detect visual elements in lecture videos. It addresses the scarcity of annotated datasets and the variability of visual objects, and releases a benchmark dataset and source code for future research.

Contribution

The paper introduces a transfer learning method with dataset enrichment for detecting visual elements in lecture videos, together with a benchmark dataset and open-source tools.
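Training on several datasets at once requires their annotations to share one class vocabulary. The excerpt does not say how the authors reconciled labels, so the following is a hypothetical sketch, assuming YOLO-format text labels (one `class_id x_center y_center width height` line per object, with normalized coordinates), made-up class names, and a made-up alias table. The names `canonical`, `build_unified_classes`, and `remap_yolo_lines` are invented for illustration.

```python
from typing import Dict, List, Sequence

# Hypothetical alias table: dataset-specific names -> shared names (keys lowercase).
ALIASES: Dict[str, str] = {"graph": "chart", "diagram": "illustration"}


def canonical(name: str, aliases: Dict[str, str]) -> str:
    """Lowercase a class name and map it onto the shared vocabulary."""
    key = name.lower()
    return aliases.get(key, key)


def build_unified_classes(datasets: Sequence[Sequence[str]],
                          aliases: Dict[str, str]) -> List[str]:
    """Union of canonical class names across datasets, in first-seen order."""
    unified: List[str] = []
    for names in datasets:
        for name in names:
            c = canonical(name, aliases)
            if c not in unified:
                unified.append(c)
    return unified


def remap_yolo_lines(lines: Sequence[str], src_names: Sequence[str],
                     unified: Sequence[str], aliases: Dict[str, str]) -> List[str]:
    """Rewrite the class id of each YOLO label line to its unified index."""
    index = {name: i for i, name in enumerate(unified)}
    out: List[str] = []
    for line in lines:
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        new_id = index[canonical(src_names[int(parts[0])], aliases)]
        # Box coordinates are passed through unchanged; only the class id moves.
        out.append(" ".join([str(new_id)] + parts[1:]))
    return out


# Two hypothetical source datasets with overlapping but differently named classes.
dataset_a = ["table", "chart", "illustration"]
dataset_b = ["graph", "table", "diagram", "equation"]

unified = build_unified_classes([dataset_a, dataset_b], ALIASES)
remapped = remap_yolo_lines(["0 0.5 0.5 0.2 0.3", "3 0.1 0.2 0.05 0.05"],
                            dataset_b, unified, ALIASES)
print(unified)   # ['table', 'chart', 'illustration', 'equation']
print(remapped)  # ['1 0.5 0.5 0.2 0.3', '3 0.1 0.2 0.05 0.05']
```

Only the class vocabulary differs between sources here, so coordinates need no conversion. A real pipeline would also have to handle classes that one dataset annotates and another ignores, which this sketch does not attempt.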

Findings

01 YOLO outperformed other models in detecting lecture visual elements.
02 Training on multiple datasets improved detection accuracy.
03 The approach offers a general solution for object detection in lecture videos.
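Accuracy comparisons like the ones in these findings usually rest on matching predicted boxes to ground-truth boxes by intersection over union (IoU). The excerpt does not state the paper's metric or threshold, so this sketch assumes corner-format boxes, a 0.5 IoU threshold, and greedy confidence-ordered matching for a single class. It computes only precision and recall, not full mAP. The functions `iou` and `precision_recall` and the example boxes are illustrative.

```python
from typing import List, Tuple

# A box is (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds: List[Tuple[Box, float]], gts: List[Box],
                     thr: float = 0.5) -> Tuple[float, float]:
    """Greedily match predictions (highest score first) to unmatched ground truths."""
    matched = set()
    tp = 0
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_j, best_iou = -1, thr
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:  # must reach the threshold to count
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched.add(best_j)  # each ground truth absorbs at most one prediction
            tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    return precision, recall


# Hypothetical frame: one correct detection and one spurious box.
ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
predictions = [((1, 1, 10, 10), 0.9), ((50, 50, 60, 60), 0.8)]
print(precision_recall(predictions, ground_truth))  # (0.5, 0.5)
```

Score-ordered matching, where each ground-truth box can be claimed once, resembles the matching used in PASCAL VOC and COCO-style evaluation. Averaging precision over recall levels and classes would extend this to mAP.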

Abstract

Video is transforming education, with online courses and recorded lectures supplementing and replacing classroom teaching. Recent research has focused on enhancing information retrieval for video lectures through advanced navigation, searchability, summarization, and question-answering chatbots. Visual elements like tables, charts, and illustrations are central to comprehension, retention, and data presentation in lecture videos, yet their full potential for improving access to video content remains underutilized. A major factor is that accurate automatic detection of visual elements in a lecture video is challenging; reasons include (i) most visual elements, such as charts, graphs, tables, and illustrations, are artificially created and lack any standard structure, and (ii) coherent visual objects may lack clear boundaries and may be composed of connected text and visual components.…

Taxonomy

Topics: Video Analysis and Summarization · Multimodal Machine Learning Applications · Visual Attention and Saliency Detection