Contextual Explainable Video Representation: Human Perception-based Understanding

Khoa Vo; Kashu Yamazaki; Phong X. Nguyen; Phat Nguyen; Khoa Luu; Ngan Le

arXiv:2212.06206 · cs.CV · December 20, 2022 · 1 citation

TL;DR

This paper introduces a human perception-inspired, explainable approach for extracting contextual video representations, improving understanding of actions and scenes by modeling actors, objects, and environment interactions.

Contribution

It proposes a novel, explainable video representation method based on human perception factors, enhancing the interpretability and effectiveness of video understanding tasks.

Findings

1. Improved performance in video captioning and action detection tasks.

2. Enhanced interpretability of video representations.

3. Demonstrated effectiveness of perception-based modeling.

Abstract

Video understanding is a growing field and a subject of intense research. It includes many tasks that require understanding both spatial and temporal information, e.g., action detection, action recognition, video captioning, and video retrieval. One of the most challenging problems in video understanding is feature extraction, i.e., extracting a contextual visual representation from a given untrimmed video, which is difficult because of the long and complicated temporal structure of unconstrained videos. Unlike existing approaches, which apply a pre-trained backbone network as a black box to extract visual representations, our approach aims to extract the most contextual information with an explainable mechanism. As we observed, humans typically perceive a video through the interactions between three main factors, i.e., the actors, the relevant objects, and the surrounding environment. Therefore,…
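
The abstract is truncated before it describes the actual mechanism, so the following is only an illustrative sketch of the general idea of combining the three perception factors it names (actors, relevant objects, environment) in a way that stays inspectable. It is not the authors' method. The function name `fuse_factors`, the per-factor relevance scores, and the softmax weighting are all assumptions introduced here for illustration.

```python
import math
from typing import Dict, List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_factors(
    features: Dict[str, List[float]],
    relevance: Dict[str, float],
) -> Tuple[List[float], Dict[str, float]]:
    """Hypothetical fusion of per-factor feature vectors.

    `features` maps a factor name (e.g. "actor") to its feature vector;
    `relevance` maps the same names to an unnormalized relevance score.
    Returns the weighted sum of the vectors and the normalized weights,
    so the contribution of each factor can be read off directly.
    """
    names = sorted(features)
    weights = softmax([relevance[name] for name in names])
    dim = len(features[names[0]])
    fused = [0.0] * dim
    for name, weight in zip(names, weights):
        vector = features[name]
        if len(vector) != dim:
            raise ValueError(f"feature size mismatch for factor {name!r}")
        for i, value in enumerate(vector):
            fused[i] += weight * value
    return fused, dict(zip(names, weights))


# Toy inputs: made-up 2-D features, not taken from the paper.
features = {
    "actor": [1.0, 0.0],
    "object": [0.0, 1.0],
    "environment": [1.0, 1.0],
}
relevance = {"actor": 2.0, "object": 0.5, "environment": 0.5}
fused, weights = fuse_factors(features, relevance)
```

Returning the normalized weights alongside the fused vector is what makes this toy version inspectable: a reader can see how much each factor contributed to the final representation for a given input.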

Code & Models

Repositories

uark-aicv/video_representation (official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Video Analysis and Summarization