DDS: Decoupled Dynamic Scene-Graph Generation Network

A S M Iftekhar; Raphael Ruschel; Satish Kumar; Suya You; B.S. Manjunath

arXiv:2301.07666 · cs.CV · January 22, 2025

TL;DR

This paper introduces DDS, a dynamic scene-graph generation network that decouples object features from relationship features in two independent branches, significantly improving the detection of unseen subject-object-relation triplets.

Contribution

The paper proposes a decoupled network architecture that disentangles object and relationship features, enabling better detection of novel object-relationship combinations.

Findings

01. Outperforms previous methods on three datasets.

02. Significantly improves detection of unseen triplets.

03. Demonstrates robustness in dynamic scene understanding.

Abstract

Scene-graph generation involves creating a structural representation of the relationships between objects in a scene by predicting subject-object-relation triplets from input data. Existing methods show poor performance in detecting triplets outside of a predefined set, primarily due to their reliance on dependent feature learning. To address this issue, we propose DDS -- a decoupled dynamic scene-graph generation network -- that consists of two independent branches that can disentangle extracted features. The key innovation of the current paper is the decoupling of the features representing the relationships from those of the objects, which enables the detection of novel object-relationship combinations. The DDS model is evaluated on three datasets and outperforms previous methods by a significant margin, especially in detecting previously unseen triplets.
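The label-space argument behind decoupling can be illustrated with a toy sketch. It is not the DDS architecture, which uses learned network branches. It only shows, under assumed hand-written scores, why a predictor restricted to seen triplets cannot output a new combination while independently scored objects and relations can be composed into one. The vocabulary, the scores, the fixed "person" subject, and the function names `coupled_predict` and `decoupled_predict` are all hypothetical.

```python
from itertools import product

# Hypothetical training set of triplets; not taken from the paper's datasets.
SEEN_TRIPLETS = {
    ("person", "hold", "cup"),
    ("person", "look_at", "book"),
    ("person", "look_at", "laptop"),
}


def coupled_predict(object_scores, relation_scores):
    """Coupled baseline (assumption): the output space is the seen triplets only."""
    def score(triplet):
        _, relation, obj = triplet
        return relation_scores[relation] * object_scores[obj]

    return max(SEEN_TRIPLETS, key=score)


def decoupled_predict(object_scores, relation_scores):
    """Decoupled sketch: score relations and objects independently, then compose."""
    relation, obj = max(
        product(relation_scores, object_scores),
        key=lambda pair: relation_scores[pair[0]] * object_scores[pair[1]],
    )
    return ("person", relation, obj)


# Made-up scores for a frame in which a person holds a book.
object_scores = {"cup": 0.1, "book": 0.8, "laptop": 0.1}
relation_scores = {"hold": 0.9, "look_at": 0.1}

print(coupled_predict(object_scores, relation_scores))
print(decoupled_predict(object_scores, relation_scores))
```

With these scores the coupled baseline is forced back onto a seen triplet, while the decoupled composition can return ("person", "hold", "book"), which never appears in the assumed training set.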

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition