SVGraph: Learning Semantic Graphs from Instructional Videos

Madeline C. Schiappa; Yogesh S. Rawat

arXiv:2207.08001·cs.CV·July 19, 2022

TL;DR

This paper introduces SVGraph, a self-supervised, multi-modal method that learns interpretable semantic graphs from noisy instructional videos without annotations, using video narrations to give the graphs semantic meaning.

Contribution

The paper presents a self-supervised, multi-modal approach that learns a unified graph structure through cross-modal attention and interprets it through Semantic-Assignment, which captures semantics from video narration. No graph annotations are required.

Findings

01 SVGraph learns semantic graphs from noisy instructional videos without graph annotations.

02 Semantic-Assignment uses video narrations to give the learned graphs a semantic interpretation.

03 Experiments on multiple datasets demonstrate the interpretability of the learned graphs.

Abstract

In this work, we focus on generating graphical representations of noisy, instructional videos for video understanding. We propose a self-supervised, interpretable approach that does not require any annotations for graphical representations, which would be expensive and time consuming to collect. We attempt to overcome "black box" learning limitations by presenting Semantic Video Graph or SVGraph, a multi-modal approach that utilizes narrations for semantic interpretability of the learned graphs. SVGraph 1) relies on the agreement between multiple modalities to learn a unified graphical structure with the help of cross-modal attention and 2) assigns semantic interpretation with the help of Semantic-Assignment, which captures the semantics from video narration. We perform experiments on multiple datasets and demonstrate the interpretability of SVGraph in semantic graph learning.
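
As a rough illustration of the cross-modal attention mentioned in the abstract, the sketch below applies generic scaled dot-product attention with video features as queries and narration features as keys and values. This is not the SVGraph architecture: the function name `cross_modal_attention`, the feature shapes, the random inputs, and the omission of learned projection layers are all assumptions made for illustration.

```python
import numpy as np

def cross_modal_attention(video_feats, text_feats):
    """Generic scaled dot-product cross-attention (illustrative only).

    video_feats: (num_clips, d) array used as queries (assumed shape).
    text_feats:  (num_words, d) array used as keys and values (assumed shape).
    Returns the attended features and the attention weights.
    """
    d = video_feats.shape[-1]
    # Similarity between every video clip and every narration token.
    scores = video_feats @ text_feats.T / np.sqrt(d)
    # Numerically stable softmax over the narration axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each clip becomes a weighted mixture of narration features.
    attended = weights @ text_feats
    return attended, weights

# Hypothetical inputs: 4 video clips and 6 narration tokens, 8-dim features.
rng = np.random.default_rng(0)
video_feats = rng.normal(size=(4, 8))
text_feats = rng.normal(size=(6, 8))
attended, weights = cross_modal_attention(video_feats, text_feats)
```

Each row of `weights` sums to 1 and indicates which narration tokens a given clip attends to. In a real model the queries, keys, and values would normally pass through learned projections first.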

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Video Analysis and Summarization