Triple Correlations-Guided Label Supplementation for Unbiased Video Scene Graph Generation

Wenqing Wang; Kaifeng Gao; Yawei Luo; Tao Jiang; Fei Gao; Jian Shao; Jianwen Sun; Jun Xiao

arXiv:2307.16309·cs.CV·August 1, 2023

Open Access

TL;DR

This paper introduces Trico, a method that leverages triple correlations to supplement missing predicate labels in video scene graph generation, significantly reducing bias and improving performance on under-represented predicates.

Contribution

Trico is the first approach to explicitly address missing predicate annotations in VidSGG by exploiting spatio-temporal correlations for unbiased label supplementation.

Findings

01

Achieves state-of-the-art results on VidVRD and VidOR datasets.

02

Significantly improves performance on tail predicates.

03

Effectively reduces bias caused by incomplete annotations.

Abstract

Video-based scene graph generation (VidSGG) aims to represent video content as a dynamic graph by identifying visual entities and their relationships. Due to the inherently biased distribution and missing annotations in the training data, current VidSGG methods have been found to perform poorly on less-represented predicates. In this paper, we propose an explicit solution to this under-explored issue by supplementing missing predicates that should appear in the ground-truth annotations. Dubbed Trico, our method seeks to supplement the missing predicates by exploring three complementary spatio-temporal correlations. Guided by these correlations, the missing labels can be effectively supplemented, thus achieving unbiased predicate predictions. We validate the effectiveness of Trico on the most widely used VidSGG datasets, i.e., VidVRD and VidOR. Extensive…

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Human Pose and Action Recognition