Pointly-Supervised Action Localization

Pascal Mettes; Cees G. M. Snoek

arXiv:1805.11333 · cs.CV · October 2, 2018

TL;DR

This paper introduces a point-supervised approach to spatio-temporal action localization in videos. Replacing per-frame bounding boxes with sparse point annotations reduces annotation cost, and the approach achieves competitive performance while remaining robust to noisy annotations.

Contribution

It proposes a point-supervised method that uses spatio-temporal proposals during training and pseudo-points during inference, offering an effective alternative to box-supervision for action localization.

Findings

1. Achieves accuracy comparable to box-supervision while requiring less annotation effort
2. Remains robust to sparse and noisy point annotations
3. Outperforms recent weakly-supervised methods

Abstract

This paper strives for spatio-temporal localization of human actions in videos. In the literature, the consensus is to achieve localization by training on bounding box annotations provided for each frame of each training video. As annotating boxes in video is expensive, cumbersome and error-prone, we propose to bypass box-supervision. Instead, we introduce action localization based on point-supervision. We start from unsupervised spatio-temporal proposals, which provide a set of candidate regions in videos. While normally used exclusively for inference, we show spatio-temporal proposals can also be leveraged during training when guided by a sparse set of point annotations. We introduce an overlap measure between points and spatio-temporal proposals and incorporate them all into a new objective of a Multiple Instance Learning optimization. During inference, we introduce pseudo-points,…
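The abstract describes an overlap measure between point annotations and spatio-temporal proposals that guides training. The sketch below illustrates the idea under stated assumptions: a proposal (tube) is a mapping from frame index to a box `(x1, y1, x2, y2)`, each annotated frame carries one point, and the score rewards points that fall inside the tube's box and close to its center. The scoring rule, the `Tube` type, and the greedy `select_positive` step are hypothetical simplifications for illustration, not the paper's exact overlap measure or its Multiple Instance Learning objective.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Point = Tuple[float, float]              # (x, y)


@dataclass
class Tube:
    """Hypothetical spatio-temporal proposal: one box per covered frame."""
    boxes: Dict[int, Box]


def point_tube_overlap(points: Dict[int, Point], tube: Tube) -> float:
    """Assumed overlap score in [0, 1], averaged over annotated frames.

    A point outside the tube's box (or on a frame the tube does not cover)
    contributes 0; a point inside contributes 1 at the box center, falling
    linearly to 0 at the box border (L-infinity distance, normalized).
    """
    if not points:
        return 0.0
    total = 0.0
    for frame, (px, py) in points.items():
        box = tube.boxes.get(frame)
        if box is None:
            continue  # tube misses this annotated frame
        x1, y1, x2, y2 = box
        if not (x1 <= px <= x2 and y1 <= py <= y2):
            continue  # point falls outside the box
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        hw, hh = (x2 - x1) / 2, (y2 - y1) / 2
        dx = abs(px - cx) / hw if hw > 0 else 0.0
        dy = abs(py - cy) / hh if hh > 0 else 0.0
        total += 1.0 - max(dx, dy)
    return total / len(points)


def select_positive(points: Dict[int, Point], tubes: List[Tube]) -> int:
    """Greedy stand-in for MIL instance selection: best-overlapping tube."""
    scores = [point_tube_overlap(points, t) for t in tubes]
    return max(range(len(tubes)), key=scores.__getitem__)


# Toy example: two annotated frames, three candidate tubes.
tubes = [
    Tube({0: (0, 0, 10, 10), 1: (0, 0, 10, 10)}),      # far from the action
    Tube({0: (40, 40, 60, 60), 1: (42, 40, 62, 60)}),  # centered on the points
    Tube({0: (40, 40, 60, 60)}),                       # covers frame 0 only
]
points = {0: (50.0, 50.0), 1: (52.0, 50.0)}

print([point_tube_overlap(points, t) for t in tubes])  # prints [0.0, 1.0, 0.5]
print(select_positive(points, tubes))                  # prints 1
```

Annotated frames that a tube does not cover count as zero, so a proposal that captures only part of the annotated action is penalized, as the third tube in the toy example shows.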
