Dual Guidance Semi-Supervised Action Detection

Ankit Singh; Efstratios Gavves; Cees G. M. Snoek; Hilde Kuehne

arXiv:2507.21247·cs.CV·July 30, 2025

TL;DR

This paper introduces a semi-supervised learning method for spatial-temporal action detection using a dual guidance network to improve pseudo-bounding box selection, significantly enhancing performance with limited labeled data.

Contribution

It presents a novel dual guidance network that combines frame-level classification and bounding-box prediction for better pseudo-labels in semi-supervised action detection.

Findings

01. Outperforms existing semi-supervised baselines on UCF101-24, J-HMDB-21, and AVA datasets.

02. Significantly improves detection accuracy with limited labeled data.

03. Demonstrates the effectiveness of dual guidance in spatial-temporal localization.

Abstract

Semi-Supervised Learning (SSL) has shown tremendous potential to improve the predictive performance of deep learning models when annotations are hard to obtain. However, the application of SSL has so far been mainly studied in the context of image classification. In this work, we present a semi-supervised approach for spatial-temporal action localization. We introduce a dual guidance network to select better pseudo-bounding boxes. It combines a frame-level classification with a bounding-box prediction to enforce action class consistency across frames and boxes. Our evaluation across well-known spatial-temporal action localization datasets, namely UCF101-24, J-HMDB-21, and AVA, shows that the proposed module considerably enhances the model's performance in limited labeled data settings. Our framework achieves superior results compared to extended image-based semi-supervised baselines.
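The abstract does not spell out the selection rule, so the following is only a rough sketch of how class consistency between a frame-level classifier and a box predictor could be used to filter pseudo-bounding boxes. The `Box` structure, the `select_pseudo_boxes` function, and both thresholds are assumptions made for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Box:
    """Hypothetical candidate pseudo-box: coordinates plus per-class scores."""
    xyxy: Tuple[float, float, float, float]
    class_scores: Sequence[float]


def argmax(scores: Sequence[float]) -> int:
    """Index of the highest score (first index wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def select_pseudo_boxes(
    boxes: List[Box],
    frame_scores: Sequence[float],
    box_thresh: float = 0.5,    # assumed confidence cutoff for boxes
    frame_thresh: float = 0.5,  # assumed confidence cutoff for the frame
) -> List[Box]:
    """Keep boxes whose predicted class agrees with the frame-level class.

    Illustrative consistency filter only, not the authors' method.
    """
    frame_cls = argmax(frame_scores)
    # If the frame-level prediction is not confident, trust no box.
    if frame_scores[frame_cls] < frame_thresh:
        return []
    kept = []
    for box in boxes:
        box_cls = argmax(box.class_scores)
        # A box survives only if it is confident and agrees with the frame.
        if box_cls == frame_cls and box.class_scores[box_cls] >= box_thresh:
            kept.append(box)
    return kept


candidates = [
    Box((0, 0, 10, 10), [0.1, 0.9]),    # confident, agrees with the frame
    Box((5, 5, 20, 20), [0.8, 0.2]),    # confident, disagrees with the frame
    Box((1, 1, 2, 2), [0.45, 0.55]),    # agrees, but below the box threshold
]
kept = select_pseudo_boxes(candidates, frame_scores=[0.2, 0.8], box_thresh=0.6)
```

In this toy setup only the first candidate is kept: the second disagrees with the frame-level class and the third falls below the assumed box threshold. A real system would likely operate on batched tensors and learn both predictors jointly.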
