Lecture Video Visual Objects (LVVO) Dataset: A Benchmark for Visual Object Detection in Educational Videos

Dipayan Biswas; Shishir Shah; Jaspal Subhlok

arXiv:2506.13657 · cs.CV · June 18, 2025

TL;DR

The LVVO dataset provides a comprehensive benchmark for visual object detection in educational videos, including manually annotated and semi-automatically expanded data across multiple scientific disciplines.

Contribution

This paper introduces the LVVO dataset, a new benchmark with high-quality annotations for visual object detection in educational videos, supporting supervised and semi-supervised learning.

Findings

1. High inter-annotator agreement, with an F1 score of 83.41%
2. 1,000 manually annotated frames plus 3,000 semi-automatically annotated frames bring the dataset to 4,000 frames
3. The dataset covers three scientific disciplines (biology, computer science, geosciences) and four visual categories
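The 83.41% inter-annotator F1 score compares the bounding boxes drawn by two annotators on the same frames. The summary does not state the exact matching protocol, so the following is only a minimal sketch under assumptions: two boxes count as agreeing if they share a category and their IoU is at least 0.5, and boxes are paired greedily one-to-one by descending IoU. The names `Box`, `iou`, `match_count`, and `inter_annotator_f1` are hypothetical. Because one annotator serves as the reference and the other as the prediction, precision and recall combine into the symmetric form F1 = 2·TP / (|A| + |B|).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    # Hypothetical box record; label is one of the four LVVO categories.
    label: str
    x1: float
    y1: float
    x2: float
    y2: float


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a.x1, b.x1), max(a.y1, b.y1)
    ix2, iy2 = min(a.x2, b.x2), min(a.y2, b.y2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_count(boxes_a, boxes_b, thr=0.5) -> int:
    """Greedy one-to-one matching (assumed): highest-IoU same-label pairs first."""
    pairs = sorted(
        ((iou(a, b), i, j)
         for i, a in enumerate(boxes_a)
         for j, b in enumerate(boxes_b)
         if a.label == b.label),
        reverse=True,
    )
    used_a, used_b, matches = set(), set(), 0
    for score, i, j in pairs:
        if score < thr:
            break  # remaining pairs all fall below the assumed IoU threshold
        if i in used_a or j in used_b:
            continue  # each box may be matched at most once
        used_a.add(i)
        used_b.add(j)
        matches += 1
    return matches


def inter_annotator_f1(frames_a, frames_b, thr=0.5) -> float:
    """Pooled F1 over all frames: 2*TP / (boxes by A + boxes by B)."""
    tp = total_a = total_b = 0
    for boxes_a, boxes_b in zip(frames_a, frames_b):
        tp += match_count(boxes_a, boxes_b, thr)
        total_a += len(boxes_a)
        total_b += len(boxes_b)
    if total_a + total_b == 0:
        return 1.0  # no boxes from either annotator: trivially in agreement
    return 2 * tp / (total_a + total_b)
```

For example, if both annotators mark the same table but label one other region as Chart-Graph and Photographic-image respectively, only one of two boxes per annotator matches and the F1 is 0.5. Pooling counts across frames, rather than averaging per-frame scores, is a design choice assumed here; per-frame averaging would give a different number.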

Abstract

We introduce the Lecture Video Visual Objects (LVVO) dataset, a new benchmark for visual object detection in educational video content. The dataset consists of 4,000 frames extracted from 245 lecture videos spanning biology, computer science, and geosciences. A subset of 1,000 frames, referred to as LVVO_1k, has been manually annotated with bounding boxes for four visual categories: Table, Chart-Graph, Photographic-image, and Visual-illustration. Each frame was labeled independently by two annotators, resulting in an inter-annotator F1 score of 83.41%, indicating strong agreement. To ensure high-quality consensus annotations, a third expert reviewed and resolved all cases of disagreement through a conflict resolution process. To expand the dataset, a semi-supervised approach was employed to automatically annotate the remaining 3,000 frames, forming LVVO_3k. The complete dataset offers a…
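The abstract does not specify which semi-supervised method produced LVVO_3k. One common approach is confidence-threshold pseudo-labeling: a detector trained on the manually annotated frames predicts boxes on unlabeled frames, and only high-confidence predictions are kept as labels. The sketch below illustrates that generic technique, not the authors' pipeline. The thresholds `keep_thr=0.7` and `drop_thr=0.3`, the `needs_review` flag, and the names `Detection`, `PseudoLabeledFrame`, and `pseudo_label` are all hypothetical.

```python
from dataclasses import dataclass, field

# The four LVVO visual categories named in the abstract.
CATEGORIES = ("Table", "Chart-Graph", "Photographic-image", "Visual-illustration")


@dataclass(frozen=True)
class Detection:
    # Hypothetical detector output for one box on an unlabeled frame.
    label: str
    score: float  # detector confidence in [0, 1]
    box: tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class PseudoLabeledFrame:
    frame_id: str
    boxes: list = field(default_factory=list)
    needs_review: bool = False  # hypothetical flag for uncertain frames


def pseudo_label(frame_id, detections, keep_thr=0.7, drop_thr=0.3):
    """Keep confident detections as pseudo-labels; flag ambiguous frames.

    Scores >= keep_thr are accepted, scores < drop_thr are discarded,
    and anything in between marks the frame for possible human review.
    Both thresholds are illustrative assumptions.
    """
    out = PseudoLabeledFrame(frame_id)
    for det in detections:
        if det.label not in CATEGORIES:
            continue  # ignore classes outside the LVVO label set
        if det.score >= keep_thr:
            out.boxes.append(det)
        elif det.score >= drop_thr:
            out.needs_review = True
    return out
```

A higher `keep_thr` trades recall for cleaner pseudo-labels; routing ambiguous frames to review mirrors, in spirit, the manual conflict resolution used for LVVO_1k, but whether the authors did anything similar for LVVO_3k is not stated here.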

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques