Weakly Supervised Video Scene Graph Generation via Natural Language Supervision

Kibum Kim; Kanghoon Yoon; Yeonjun In; Jaehyeong Jeon; Jinyoung Moon; Donghyun Kim; Chanyoung Park

arXiv:2502.15370·cs.CV·February 24, 2025

TL;DR

This paper introduces a weakly supervised video scene graph generation (VidSGG) framework that learns from natural language video captions. It handles the temporal markers in captions and the varying duration of actions, reducing annotation cost relative to fully supervised training, which requires every frame to be annotated.

Contribution

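It proposes a Natural Language-based Video Scene Graph Generation (NL-VSGG) framework with two modules, Temporality-aware Caption Segmentation (TCS) and Action Duration Variability-aware caption-frame alignment (ADV), which handle the temporal markers in video captions and the varying duration of actions, respectively.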

Findings

01. Improved performance over naive caption-based methods on the Action Genome dataset.

02. The model predicts a broader range of action classes than those in the training data.

03. Significant reduction in annotation costs for VidSGG.

Abstract

Existing Video Scene Graph Generation (VidSGG) studies are trained in a fully supervised manner, which requires all frames in a video to be annotated, thereby incurring a high annotation cost compared to Image Scene Graph Generation (ImgSGG). Although the annotation cost of VidSGG can be alleviated by adopting a weakly supervised approach commonly used for ImgSGG (WS-ImgSGG) that uses image captions, there are two key reasons that hinder such a naive adoption: 1) Temporality within video captions, i.e., unlike image captions, video captions include temporal markers (e.g., before, while, then, after) that indicate time-related details, and 2) Variability in action duration, i.e., unlike human actions in image captions, human actions in video captions unfold over varying durations. To address these issues, we propose a Natural Language-based Video Scene Graph Generation (NL-VSGG) framework…

Code & Models

Repositories

rlqja1107/NL-VSGG
PyTorch · Official