Single-Stage Visual Relationship Learning using Conditional Queries

Alakh Desai; Tz-Ying Wu; Subarna Tripathi; Nuno Vasconcelos

arXiv:2306.05689 · cs.CV · June 12, 2023 · 5 cites

TL;DR

This paper introduces TraCQ, a single-stage scene graph generation (SGG) model built on the DETR architecture that uses conditional queries, outperforming existing single-stage methods on Visual Genome with 20% fewer parameters than the state of the art.

Contribution

Proposes a single-stage SGG model with conditional queries that simplifies multi-task learning and reduces the parameter count, outperforming existing single-stage methods on Visual Genome.

Findings

01. TraCQ reduces parameters by 20% compared to the state of the art.

02. Outperforms existing single-stage SGG methods on Visual Genome.

03. Beats many two-stage methods while enabling end-to-end training.

Abstract

Research in scene graph generation (SGG) usually considers two-stage models, which first detect a set of entities and then combine them and label all possible relationships. While showing promising results, the pipeline structure induces large parameter and computation overhead and typically hinders end-to-end optimization. To address this, recent research attempts to train single-stage models that are computationally efficient. With the advent of DETR, a set-based detection model, one-stage models attempt to predict a set of subject-predicate-object triplets directly in a single shot. However, SGG is inherently a multi-task learning problem that requires modeling entity and predicate distributions simultaneously. In this paper, we propose Transformers with conditional queries for SGG, namely TraCQ, with a new formulation for SGG that avoids the multi-task learning problem…

Videos

Single-Stage Visual Relationship Learning using Conditional Queries · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Advanced Graph Neural Networks