Towards Unseen Triples: Effective Text-Image-joint Learning for Scene Graph Generation

Qianji Di; Wenxi Ma; Zhongang Qi; Tianxiang Hou; Ying Shan; Hanzi Wang

arXiv:2306.13420 · cs.CV · June 26, 2023



TL;DR

This paper introduces TISGG, a novel text-image joint learning model for scene graph generation that effectively predicts unseen triples and addresses dataset bias, achieving state-of-the-art results.

Contribution

The paper proposes a joint feature learning and factual knowledge refinement framework with balanced learning strategies to improve unseen triple prediction in scene graph generation.

Findings

01. Boosts zero-shot recall by 11.7% on Visual Genome.
02. Achieves state-of-the-art performance in scene graph generation.
03. Effectively handles the long-tailed distribution and unseen triples.
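Zero-shot recall is commonly reported as zR@K. The sketch below shows one common formulation of this metric; it does not reproduce the paper's evaluation code. The ground-truth relations kept are those whose (subject category, predicate, object category) combination never occurs in training. The score is the fraction of these relations found among the top-K predictions, averaged over images that contain at least one such relation. The sketch assumes PredCls-style evaluation, where ground-truth boxes and object labels are given, so a prediction matches when subject index, object index and predicate all agree. The function name and the dictionary keys `labels`, `gt` and `pred` are hypothetical. Real benchmark code also handles graph constraints and box matching, which are omitted here.

```python
from typing import Dict, List, Set, Tuple

# A category-level triple: (subject_category, predicate, object_category).
Triple = Tuple[str, str, str]
# A relation instance in one image (PredCls-style, boxes and labels given):
# (subject_box_index, object_box_index, predicate).
Relation = Tuple[int, int, str]


def zero_shot_recall_at_k(
    images: List[dict],
    train_triples: Set[Triple],
    k: int,
) -> float:
    """Mean recall@k over ground-truth relations whose category triple never
    appears in training. Images without such relations are skipped."""
    recalls = []
    for img in images:
        labels: Dict[int, str] = img["labels"]  # box index -> object category
        # Keep only ground-truth relations that form an unseen (zero-shot) triple.
        zs_gt = {
            (s, o, p)
            for (s, o, p) in img["gt"]
            if (labels[s], p, labels[o]) not in train_triples
        }
        if not zs_gt:
            continue
        # Predictions are assumed to be sorted by descending confidence.
        top_k = set(img["pred"][:k])
        recalls.append(len(zs_gt & top_k) / len(zs_gt))
    return sum(recalls) / len(recalls) if recalls else 0.0
```

In the toy example below, only "man riding elephant" is a zero-shot relation. It is predicted second, so it counts at K = 2 but not at K = 1:

```python
train = {("man", "riding", "horse"), ("man", "holding", "umbrella")}
images = [{
    "labels": {0: "man", 1: "elephant", 2: "umbrella"},
    "gt": [(0, 1, "riding"), (0, 2, "holding")],
    "pred": [(0, 2, "holding"), (0, 1, "riding"), (1, 0, "near")],
}]
print(zero_shot_recall_at_k(images, train, k=1))  # 0.0
print(zero_shot_recall_at_k(images, train, k=2))  # 1.0
```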

Abstract

Scene Graph Generation (SGG) aims to represent the objects in an image and their relationships in a structured and comprehensive way, which can significantly benefit scene understanding and related downstream tasks. Existing SGG models often struggle with the long-tailed problem caused by biased datasets. Moreover, even if these models fit specific datasets better, they may still fail on unseen triples, i.e., triples not included in the training set. Most methods take a whole triple as input and learn its overall features based on statistical machine learning. Such models have difficulty predicting unseen triples because objects and predicates seen in the training set appear in new combinations, as novel triples, in the test set. In this work, we propose a Text-Image-joint Scene Graph Generation (TISGG) model to resolve the unseen triples and improve the generalisation capability of…
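The abstract argues that models learning whole triples cannot cope with new combinations of seen categories. The toy sketch below illustrates that argument only; it is not the TISGG model. It compares the output space of a hypothetical classifier whose labels are whole training triples with that of a hypothetical factorized predictor that scores object and predicate categories separately. All category names are invented for illustration.

```python
from itertools import product
from typing import Set, Tuple

# A category-level triple: (subject, predicate, object).
Triple = Tuple[str, str, str]

train_triples: Set[Triple] = {
    ("man", "riding", "horse"),
    ("man", "feeding", "elephant"),
}
# Every category here was seen in training, but never in this combination.
test_triple: Triple = ("man", "riding", "elephant")

# Hypothetical whole-triple classifier: its labels are the training triples.
triple_label_space = set(train_triples)

# Hypothetical factorized predictor: objects and predicates are predicted
# separately, so any combination of seen categories can be expressed.
objects = {s for s, _, _ in train_triples} | {o for _, _, o in train_triples}
predicates = {p for _, p, _ in train_triples}
factorized_space = set(product(objects, predicates, objects))

print(test_triple in triple_label_space)  # False
print(test_triple in factorized_space)    # True
print(len(factorized_space))              # 18 (3 objects x 2 predicates x 3 objects)
```

Being expressible is only a precondition. A factorized model still has to learn to score the unseen combination correctly, which is the problem the paper's joint text-image feature learning targets.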


Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization

Methods: ALIGN