TL;DR
This paper introduces a spatial-aware object embedding for zero-shot localization and classification of actions in video. It builds on off-the-shelf actor and object detectors and on estimated spatial preferences, so no action video examples are needed for training.
Contribution
The main contribution is a spatial-aware object embedding that combines local and global object information for zero-shot action localization and classification.
Findings
Achieves state-of-the-art zero-shot localization and classification results.
Competitive with supervised action localization methods.
Supports a new spatio-temporal action retrieval scenario.
Abstract
We aim for zero-shot localization and classification of human actions in video. Where traditional approaches rely on global attribute or object classification scores for their zero-shot knowledge transfer, our main contribution is a spatial-aware object embedding. To arrive at spatial awareness, we build our embedding on top of freely available actor and object detectors. Relevance of objects is determined in a word embedding space and further enforced with estimated spatial preferences. Besides local object awareness, we also embed global object awareness into our embedding to maximize actor and object interaction. Finally, we exploit the object positions and sizes in the spatial-aware embedding to demonstrate a new spatio-temporal action retrieval scenario with composite queries. Action localization and classification experiments on four contemporary action video datasets support our…
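The scoring idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy two-dimensional word vectors, the top-k selection, and the additive combination of actor score, word similarity, object detection score, and a spatial-match weight are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two word vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_relevant_objects(action_vec, object_vecs, k):
    """Rank object categories by word-embedding similarity to the action; keep the top k."""
    sims = {name: cosine(action_vec, vec) for name, vec in object_vecs.items()}
    return sorted(sims.items(), key=lambda kv: kv[1], reverse=True)[:k]


def box_score(actor_score, object_scores, relevant, spatial_match):
    """Hypothetical score for one actor box.

    actor_score:   detector confidence that the box contains a person
    object_scores: detector confidence per object category near the box
    relevant:      (object name, word similarity) pairs for the action
    spatial_match: weight in [0, 1] for how well the object's position fits
                   the expected spatial preference (defaults to 1.0)
    """
    score = actor_score
    for name, sim in relevant:
        score += sim * object_scores.get(name, 0.0) * spatial_match.get(name, 1.0)
    return score


# Toy example: made-up 2-D vectors stand in for real word embeddings.
action = [1.0, 0.0]  # e.g. an action such as "horse riding"
objects = {"horse": [0.9, 0.1], "ball": [0.0, 1.0], "saddle": [0.7, 0.3]}

relevant = top_relevant_objects(action, objects, k=2)
score = box_score(
    actor_score=0.8,
    object_scores={"horse": 0.9, "ball": 0.95},
    relevant=relevant,
    spatial_match={"horse": 0.5},
)
```

In this toy setup the confidently detected "ball" does not contribute to the score, because it is not among the objects most similar to the action in the word-embedding space. Only the relevant "horse" detection, down-weighted by its spatial match, adds to the actor score.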
Videos
Spatial-Aware Object Embeddings for Zero-Shot Localization and Classification of Actions (YouTube)
