OmViD: Omni-supervised active learning for video action detection

Aayush Rana; Akash Kumar; Vibhav Vineet; Yogesh S Rawat

arXiv:2508.13983 · cs.CV · August 20, 2025

TL;DR

This paper introduces OmViD, an active learning framework that adaptively selects annotation types for videos to efficiently train action detection models with reduced labeling effort.

Contribution

The paper proposes an active learning strategy that estimates which annotation level each video needs, and a spatio-temporal 3D-superpixel method that turns those annotations into pseudo-labels for training video action detection models.

Findings

1. Significantly reduces annotation costs on the UCF101-24 and JHMDB-21 datasets.

2. Maintains high detection performance with minimal annotation effort.

3. Demonstrates the effectiveness of adaptive annotation selection in video action detection.

Abstract

Video action detection requires dense spatio-temporal annotations, which are both challenging and expensive to obtain. However, real-world videos often vary in difficulty and may not require the same level of annotation. This paper analyzes the appropriate annotation types for each sample and their impact on spatio-temporal video action detection. It focuses on two key aspects: 1) how to obtain varying levels of annotation for videos, and 2) how to learn action detection from different annotation types. The study explores video-level tags, points, scribbles, bounding boxes, and pixel-level masks. First, a simple active learning strategy is proposed to estimate the necessary annotation type for each video. Then, a novel spatio-temporal 3D-superpixel approach is introduced to generate pseudo-labels from these annotations, enabling effective training. The approach is validated on UCF101-24…
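The abstract does not say how the active learning strategy scores a video or maps that score to an annotation type. As a rough illustration only, the sketch below assumes each video already has a model uncertainty score in [0, 1] and assigns cheaper annotation types to low-uncertainty videos and more detailed ones to high-uncertainty videos. The function names, the equal-width binning, and the example scores are all hypothetical and are not taken from the paper.

```python
# Annotation types named in the abstract, ordered from cheapest to most detailed.
ANNOTATION_TYPES = ["tag", "point", "scribble", "box", "mask"]


def choose_annotation_type(uncertainty: float) -> str:
    """Map an uncertainty score in [0, 1] to an annotation type.

    Assumption (not from the paper): equal-width bins over [0, 1],
    so more uncertain videos receive more detailed annotation.
    """
    if not 0.0 <= uncertainty <= 1.0:
        raise ValueError("uncertainty must lie in [0, 1]")
    n = len(ANNOTATION_TYPES)
    # A score of exactly 1.0 is clamped into the last bin.
    index = min(int(uncertainty * n), n - 1)
    return ANNOTATION_TYPES[index]


def plan_annotations(scores: dict[str, float]) -> dict[str, str]:
    """Assign an annotation type to every video id in `scores`."""
    return {video_id: choose_annotation_type(u) for video_id, u in scores.items()}


# Hypothetical uncertainty scores for three videos.
plan = plan_annotations({"v1": 0.05, "v2": 0.5, "v3": 0.95})
print(plan)
```

A fixed binning like this ignores annotation budgets and the relative cost of each type; a practical strategy would likely weigh both, but the abstract gives no detail on that.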
