Action Modifiers: Learning from Adverbs in Instructional Videos

Hazel Doughty; Ivan Laptev; Walterio Mayol-Cuevas; Dima Damen

arXiv:1912.06617 · cs.CV · March 25, 2020

TL;DR

This paper introduces a novel weakly supervised learning approach to recognize and understand adverbs in instructional videos by modeling their effects as transformations in an embedding space, improving video-to-adverb retrieval.
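The idea of modelling an adverb as a transformation in an embedding space can be illustrated with a small example. This is a minimal NumPy sketch, not the authors' implementation. It assumes, for illustration only, that each adverb is a square matrix applied to an action embedding, that removing the adverb's effect means applying the inverse, and that the same matrix is shared across actions. The embedding size, the random values, and the helpers `apply_adverb` and `remove_adverb` are hypothetical. The official PyTorch code (hazeld/action-modifiers) may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
EMBED_DIM = 4  # hypothetical embedding size

# Hypothetical action embeddings, e.g. for "spread" and "mix".
spread = rng.normal(size=EMBED_DIM)
mix = rng.normal(size=EMBED_DIM)

# Hypothetical adverb "quickly": a square matrix initialised near the identity
# so that it is invertible in practice.
quickly = np.eye(EMBED_DIM) + 0.1 * rng.normal(size=(EMBED_DIM, EMBED_DIM))


def apply_adverb(adverb, action):
    """Add the adverb's effect by transforming the action embedding."""
    return adverb @ action


def remove_adverb(adverb, modified_action):
    """Remove the adverb's effect: solve adverb @ x = modified_action for x,
    which is equivalent to applying the inverse transformation."""
    return np.linalg.solve(adverb, modified_action)


# One adverb matrix modifies different actions ("spread quickly", "mix quickly").
spread_quickly = apply_adverb(quickly, spread)
mix_quickly = apply_adverb(quickly, mix)

print(np.allclose(remove_adverb(quickly, spread_quickly), spread))  # True
print(np.allclose(remove_adverb(quickly, mix_quickly), mix))        # True
```

The sketch uses `np.linalg.solve` instead of forming the inverse explicitly. The result is the same, and solving is generally more numerically stable.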

Contribution

The paper proposes a new method for learning adverb representations from videos, weakly supervised by their narrations, using attention and transformations of an embedding space. No prior work had addressed weakly supervised learning of adverbs.

Findings

01

Achieved 0.719 mAP in video-to-adverb retrieval.

02

Demonstrated the ability to attend to relevant video parts for adverb recognition.

03

Outperformed all baseline methods on video-to-adverb retrieval.

Abstract

We present a method to learn a representation for adverbs from instructional videos using weak supervision from the accompanying narrations. Key to our method is the fact that the visual representation of the adverb is highly dependent on the action to which it applies, although the same adverb will modify multiple actions in a similar way. For instance, while 'spread quickly' and 'mix quickly' will look dissimilar, we can learn a common representation that allows us to recognize both, among other actions. We formulate this as an embedding problem, and use scaled dot-product attention to learn from weakly-supervised video narrations. We jointly learn adverbs as invertible transformations operating on the embedding space, so as to add or remove the effect of the adverb. As there is no prior work on weakly supervised learning from adverbs, we gather paired action-adverb annotations from a…
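Scaled dot-product attention, as mentioned in the abstract, can be pictured as attention pooling over video segments. The narration only says that an action and adverb occur somewhere in the video, so the model weights segments by how relevant they are to a query. This is a minimal NumPy sketch under stated assumptions, not the authors' architecture. The query is assumed to be an embedding of the narrated action, and the keys and values are assumed to be linear projections of per-segment video features. All names, shapes, and random weights are hypothetical.

```python
import numpy as np


def softmax(scores):
    """Numerically stable softmax over a 1-D array."""
    shifted = np.exp(scores - scores.max())
    return shifted / shifted.sum()


def attend(query, segment_features, w_key, w_value):
    """Pool per-segment video features with scaled dot-product attention.

    query:            (d_k,)   e.g. an embedding of the narrated action (assumption)
    segment_features: (T, D)   one feature vector per video segment
    w_key:            (D, d_k) key projection (hypothetical)
    w_value:          (D, d_v) value projection (hypothetical)
    """
    keys = segment_features @ w_key                  # (T, d_k)
    values = segment_features @ w_value              # (T, d_v)
    scores = keys @ query / np.sqrt(query.shape[0])  # (T,), scaled by sqrt(d_k)
    weights = softmax(scores)                        # (T,), non-negative, sums to 1
    return weights @ values, weights                 # (d_v,), (T,)


rng = np.random.default_rng(0)
T, D, D_K, D_V = 8, 16, 4, 6  # hypothetical sizes
segments = rng.normal(size=(T, D))
action_query = rng.normal(size=D_K)
pooled, weights = attend(action_query, segments,
                         rng.normal(size=(D, D_K)), rng.normal(size=(D, D_V)))
print(pooled.shape, weights.argmax())  # pooled shape and most-attended segment
```

The attention weights show which segments the pooled representation draws on. This is the kind of signal behind the finding that the model attends to relevant parts of the video.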

Code & Models

Repositories

hazeld/action-modifiers
PyTorch · Official

Videos

Action Modifiers: Learning From Adverbs in Instructional Videos · YouTube

Taxonomy

Methods: Softmax · Attention Is All You Need