Reasoning over the Behaviour of Objects in Video-Clips for Adverb-Type Recognition

Amrit Diggavi Seshadri; Alessandra Russo

arXiv:2307.04132 · cs.CV · March 29, 2024

TL;DR

This paper introduces a novel framework that reasons over object behaviours in videos to recognize adverb types, even when action types are unknown. It outperforms previous methods and provides new datasets for symbolic video analysis.

Contribution

The paper presents a new pipeline for extracting interpretable object-behaviour facts from raw video clips, and applies symbolic and transformer-based reasoning over these facts to identify adverb types without prior knowledge of the action type.

Findings

1. The proposed methods outperform the previous state of the art.
2. Two new datasets are introduced: MSR-VTT-ASP and ActivityNet-ASP.
3. The framework works effectively even when action types are unknown.

Abstract

In this work, following the intuition that adverbs describing scene sequences are best identified by reasoning over high-level concepts of object behaviour, we propose a new framework that reasons over object behaviours extracted from raw video clips to recognize each clip's corresponding adverb types. Importantly, while previous works on general scene adverb recognition assume knowledge of the clip's underlying action types, our method is directly applicable in the more general problem setting where the action type of a video clip is unknown. Specifically, we propose a novel pipeline that extracts human-interpretable object-behaviour facts from raw video clips, together with novel symbolic and transformer-based reasoning methods that operate over these extracted facts to identify adverb types. Experiment results demonstrate that our proposed methods perform favourably against…
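To make the idea of "reasoning over extracted facts" concrete, here is a toy sketch in Python. It is not the authors' pipeline: the fact schema (`BehaviourFact` with an object id, a behaviour label, and a frame index), the behaviour labels, the adverb labels, and the one-rule-one-vote scoring in `RULES`, `score_adverbs`, and `predict_adverb` are all hypothetical, chosen only to show how a symbolic reasoner can map interpretable object-behaviour facts to an adverb type without ever receiving an action label.

```python
from collections import Counter
from typing import NamedTuple


class BehaviourFact(NamedTuple):
    """One hypothetical object-behaviour fact extracted from a clip."""
    obj: str        # tracked object id, e.g. "person_0" (hypothetical)
    behaviour: str  # behaviour label, e.g. "moves_fast" (hypothetical)
    frame: int      # frame index at which the behaviour holds


# Hypothetical hand-written rules: each behaviour label counts as one vote
# for an adverb type. These are illustrative, not the rules from the paper.
RULES = {
    "moves_fast": "quickly",
    "moves_slow": "slowly",
    "stays_still": "slowly",
}


def score_adverbs(facts):
    """Count rule matches per adverb type over all facts in one clip."""
    votes = Counter()
    for fact in facts:
        adverb = RULES.get(fact.behaviour)
        if adverb is not None:
            votes[adverb] += 1
    return votes


def predict_adverb(facts, default="unknown"):
    """Return the adverb type with the most votes, or a default if none match."""
    votes = score_adverbs(facts)
    if not votes:
        return default
    return votes.most_common(1)[0][0]


# Note that no action type is passed in: only behaviour facts are used.
clip = [
    BehaviourFact("person_0", "moves_fast", 0),
    BehaviourFact("person_0", "moves_fast", 1),
    BehaviourFact("ball_0", "stays_still", 1),
]
print(predict_adverb(clip))  # quickly
```

The design point the sketch illustrates is that the predictor's only input is a set of human-readable facts, so each prediction can be traced back to the rules that fired; the paper's actual symbolic and transformer-based reasoners are more involved than simple vote counting.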

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Video Analysis and Summarization