How Do You Do It? Fine-Grained Action Understanding with Pseudo-Adverbs

Hazel Doughty; Cees G. M. Snoek

arXiv:2203.12344 · cs.CV · June 13, 2022

TL;DR

This paper introduces a semi-supervised approach for fine-grained action understanding by recognizing adverbs across actions, addressing annotation scarcity and long-tailed distributions, and enabling recognition of unseen action-adverb combinations and domains.

Contribution

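It proposes a semi-supervised method that combines multiple adverb pseudo-labels with adaptive thresholding for adverb recognition in videos, and gathers adverb annotations for three existing video retrieval datasets to support recognition of unseen action-adverb compositions and unseen domains.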

Findings

01 Outperforms prior adverb recognition methods

02 Effective semi-supervised learning with pseudo-labels

03 Enables recognition of unseen action-adverb combinations

Abstract

We aim to understand how actions are performed and identify subtle differences, such as 'fold firmly' vs. 'fold gently'. To this end, we propose a method which recognizes adverbs across different actions. However, such fine-grained annotations are difficult to obtain and their long-tailed nature makes it challenging to recognize adverbs in rare action-adverb compositions. Our approach therefore uses semi-supervised learning with multiple adverb pseudo-labels to leverage videos with only action labels. Combined with adaptive thresholding of these pseudo-adverbs we are able to make efficient use of the available data while tackling the long-tailed distribution. Additionally, we gather adverb annotations for three existing video retrieval datasets, which allows us to introduce the new tasks of recognizing adverbs in unseen action-adverb compositions and unseen domains. Experiments…
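The idea of keeping several pseudo-adverbs per video, each checked against its own adaptive threshold, can be illustrated with a minimal sketch. This is not the authors' implementation (the official repository is listed under Code & Models). The function names, the base threshold, and the rule that lowers the threshold for adverbs with few confident predictions are all illustrative assumptions.

```python
from typing import Dict, List


def adaptive_thresholds(confident_counts: Dict[str, int],
                        base_threshold: float = 0.8) -> Dict[str, float]:
    """Per-adverb thresholds (hypothetical scaling rule, not from the paper).

    Adverbs that have received few confident predictions so far get a lower
    threshold, so rare adverbs are not starved of pseudo-labels.
    """
    # Guard against an empty dict or all-zero counts.
    max_count = max(confident_counts.values(), default=0) or 1
    return {
        adverb: base_threshold * (0.5 + 0.5 * count / max_count)
        for adverb, count in confident_counts.items()
    }


def select_pseudo_adverbs(scores: Dict[str, float],
                          thresholds: Dict[str, float]) -> List[str]:
    """Keep every adverb whose score clears its own threshold (multi-label)."""
    return sorted(a for a, s in scores.items() if s >= thresholds[a])


# Toy example: "gently" is frequent, "firmly" is rare.
counts = {"gently": 100, "firmly": 20}
thresholds = adaptive_thresholds(counts)

# Predicted adverb scores for one video that has only an action label.
scores = {"gently": 0.70, "firmly": 0.55}
pseudo_labels = select_pseudo_adverbs(scores, thresholds)
print(pseudo_labels)
```

In this toy example the rare adverb "firmly" is kept as a pseudo-label at a score of 0.55, while the frequent adverb "gently" is rejected at 0.70 because its threshold stays at the base value.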

Code & Models

Repositories

hazeld/pseudoadverbs
PyTorch · Official

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Video Analysis and Summarization