Open-Vocabulary Object Detection via Scene Graph Discovery

Hengcan Shi; Munawar Hayat; Jianfei Cai

arXiv:2307.03339 · cs.CV · July 10, 2023 · 1 citation

TL;DR

This paper introduces a scene-graph-based network for open-vocabulary object detection that uses scene-graph cues and cross-modal learning to improve both object detection and scene graph generation, and it reports outperforming previous open-vocabulary detection methods.

Contribution

It proposes the Scene-Graph-Based Discovery Network (SGDN), a framework that uses scene graphs to improve open-vocabulary detection and scene graph generation by combining scene-graph-guided attention with cross-modal learning.

Findings

1. Effective on the COCO and LVIS datasets
2. Outperforms previous open-vocabulary detection methods
3. Enables open-vocabulary scene graph detection

Abstract

In recent years, open-vocabulary (OV) object detection has attracted increasing research attention. Unlike traditional detection, which only recognizes fixed-category objects, OV detection aims to detect objects in an open category set. Previous works often leverage vision-language (VL) training data (e.g., referring grounding data) to recognize OV objects. However, they only use pairs of nouns and individual objects in VL data, while these data usually contain much more information, such as scene graphs, which are also crucial for OV detection. In this paper, we propose a novel Scene-Graph-Based Discovery Network (SGDN) that exploits scene graph cues for OV detection. Firstly, a scene-graph-based decoder (SGDecoder) including sparse scene-graph-guided attention (SSGA) is presented. It captures scene graphs and leverages them to discover OV objects. Secondly, we propose…
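The abstract describes sparse scene-graph-guided attention (SSGA) only at a high level, so the Python sketch below is a rough, hypothetical illustration of the general idea of attention restricted by a scene graph. It is not the paper's SSGA. It assumes a scene graph is a list of (subject, relation, object) triples, such as might be parsed from a caption like "a man riding a horse next to a fence". A noun-object pair only says that a "horse" is present, whereas the triple ("man", "riding", "horse") also links two objects. The function names `build_mask` and `masked_attention`, the choice to treat edges as undirected, and the toy feature vectors are all assumptions made for illustration.

```python
import math
from typing import List, Tuple

# Hypothetical scene-graph format: (subject, relation, object) triples.
Triple = Tuple[str, str, str]


def build_mask(nodes: List[str], triples: List[Triple]) -> List[List[bool]]:
    """Let node i attend to node j only if i == j or a triple links them.

    Assumption: edges are treated as undirected, so each triple opens
    attention in both directions.
    """
    index = {name: k for k, name in enumerate(nodes)}
    n = len(nodes)
    mask = [[i == j for j in range(n)] for i in range(n)]  # self-attention always allowed
    for subj, _rel, obj in triples:
        i, j = index[subj], index[obj]
        mask[i][j] = True
        mask[j][i] = True
    return mask


def masked_attention(queries, keys, values, mask):
    """Scaled dot-product attention in which masked-out pairs get zero weight."""
    d = len(queries[0])
    out = []
    for i, q in enumerate(queries):
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) if mask[i][j] else -math.inf
            for j, k in enumerate(keys)
        ]
        m = max(scores)  # finite, because the diagonal is always allowed
        weights = [math.exp(s - m) for s in scores]  # exp(-inf) evaluates to 0.0
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out


# Usage with toy, made-up features: "man" and "fence" share no edge,
# so the output for "man" ignores the "fence" node entirely.
nodes = ["man", "horse", "fence"]
triples = [("man", "riding", "horse"), ("horse", "next to", "fence")]
mask = build_mask(nodes, triples)
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
updated = masked_attention(feats, feats, feats, mask)
print(updated[0])
```

The sparsity here comes entirely from the graph: each node mixes information only from itself and its scene-graph neighbours, which is one plausible reading of "scene-graph-guided" attention rather than a description of the authors' design.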

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Graph Neural Networks · Topic Modeling