RS-Net: Context-Aware Relation Scoring for Dynamic Scene Graph Generation

Hae-Won Jo; Yeong-Jun Cho

arXiv:2511.08651·cs.CV·November 13, 2025

Open Access

TL;DR

RS-Net is a modular framework for dynamic scene graph generation that scores the contextual importance of object pairs using spatial and temporal context, improving relation prediction accuracy in videos.

Contribution

It introduces RS-Net, a novel relation scoring module that integrates spatial and temporal context into existing DSGG models without architectural changes.
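Adding a scoring module "without architectural changes" means the base DSGG model's outputs stay untouched and are only re-ranked. A minimal sketch, assuming the fused score is the product of the subject, object, and predicate confidences with a per-pair relation score. The actual fusion formula is not given in this summary, and `Triplet` and `rank_triplets` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Triplet:
    subj: int         # index of the subject box in the frame
    obj: int          # index of the object box in the frame
    predicate: str
    subj_conf: float  # detector confidence for the subject
    obj_conf: float   # detector confidence for the object
    pred_conf: float  # predicate confidence from the unchanged base DSGG model


def rank_triplets(triplets: List[Triplet],
                  relation_scores: Dict[Tuple[int, int], float],
                  k: int) -> List[Tuple[Triplet, float]]:
    """Re-rank base-model triplets with a per-pair relation score (hypothetical fusion rule)."""
    scored = []
    for t in triplets:
        # Pairs without a relation score are treated as unimportant (score 0).
        r = relation_scores.get((t.subj, t.obj), 0.0)
        scored.append((t, t.subj_conf * t.obj_conf * t.pred_conf * r))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Usage: before fusion the base model prefers "looking_at" (0.648 vs 0.504),
# but a low relation score for pair (0, 2) demotes it.
triplets = [
    Triplet(0, 1, "holding", 0.9, 0.8, 0.7),
    Triplet(0, 2, "looking_at", 0.9, 0.8, 0.9),
]
scores = {(0, 1): 0.95, (0, 2): 0.1}
top = rank_triplets(triplets, scores, k=1)
print(top[0][0].predicate)  # holding
```

Because the base model's confidences are consumed as-is, a wrapper of this shape can sit behind any DSGG model that exposes per-triplet confidences.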

Findings

01. Improves Recall and Precision on the Action Genome dataset
02. Enhances mean Recall under the long-tailed relation distribution
03. Maintains efficiency despite an increased parameter count
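Recall (R@K) and mean Recall (mR@K) are the standard scene-graph metrics behind these findings. R@K is the fraction of ground-truth triplets found among the top-K predictions. mR@K computes recall separately for each predicate class and then averages, so rare predicates count as much as frequent ones. This is why mR@K reflects long-tail performance. The simplified sketch assumes exact (subject, predicate, object) matching on a single frame. Evaluation on Action Genome typically also matches boxes in detection settings and aggregates over frames.

```python
from collections import defaultdict
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object); boxes ignored in this sketch


def recall_at_k(preds: List[Triple], gts: List[Triple], k: int) -> float:
    """Fraction of ground-truth triplets that appear among the top-k predictions."""
    top = set(preds[:k])
    return sum(1 for g in gts if g in top) / len(gts)


def mean_recall_at_k(preds: List[Triple], gts: List[Triple], k: int) -> float:
    """Recall computed per predicate class, then averaged over the classes present in gts."""
    top = set(preds[:k])
    hits, totals = defaultdict(int), defaultdict(int)
    for triple in gts:
        predicate = triple[1]
        totals[predicate] += 1
        if triple in top:
            hits[predicate] += 1
    per_class = [hits[p] / n for p, n in totals.items()]
    return sum(per_class) / len(per_class)


preds = [  # ranked predictions, best first
    ("person", "holding", "cup"),
    ("person", "holding", "phone"),
    ("person", "looking_at", "phone"),
    ("person", "sitting_on", "chair"),
]
gts = [
    ("person", "holding", "cup"),
    ("person", "holding", "phone"),
    ("person", "looking_at", "cup"),
    ("person", "sitting_on", "chair"),
]
print(recall_at_k(preds, gts, k=3))       # 0.5
print(mean_recall_at_k(preds, gts, k=3))  # 1/3: holding 1.0, looking_at 0.0, sitting_on 0.0
```

With K=3, the two frequent `holding` triplets give R@3 = 0.5. mR@3 drops to 1/3 because the rarer predicates are missed entirely.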

Abstract

Dynamic Scene Graph Generation (DSGG) models how object relations evolve over time in videos. However, existing methods are trained only on annotated object pairs and lack guidance for non-related pairs, making it difficult to identify meaningful relations during inference. In this paper, we propose Relation Scoring Network (RS-Net), a modular framework that scores the contextual importance of object pairs using both spatial interactions and long-range temporal context. RS-Net consists of a spatial context encoder with learnable context tokens and a temporal encoder that aggregates video-level information. The resulting relation scores are integrated into a unified triplet scoring mechanism to enhance relation prediction. RS-Net can be easily integrated into existing DSGG models without architectural changes. Experiments on the Action Genome dataset show that RS-Net consistently…
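The abstract names three stages: a spatial context encoder with learnable context tokens, a temporal encoder that aggregates video-level information, and a relation score for each object pair. The toy sketch below renders that data flow without dependencies. It is not the authors' implementation. It assumes pair features are already extracted and replaces learned projections and multi-layer encoders with a single unprojected attention step. A mean over the frames in which a pair appears stands in for the temporal encoder, and the linear head (`w`, `b`) is hypothetical. All function names are illustrative.

```python
import math
from typing import Dict, List, Tuple

Vec = List[float]
PairId = Tuple[int, int]  # (subject index, object index)


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def attend(query: Vec, memory: List[Vec]) -> Vec:
    """Single-head scaled dot-product attention; memory vectors act as keys and values."""
    weights = softmax([dot(query, m) / math.sqrt(len(query)) for m in memory])
    return [sum(w * m[i] for w, m in zip(weights, memory)) for i in range(len(query))]


def spatial_encode(pairs: List[Vec], context_tokens: List[Vec]) -> List[Vec]:
    """Each pair attends to the context tokens and to all pairs in the same frame."""
    memory = context_tokens + pairs
    # Residual connection: pair feature plus its attended context.
    return [[p_i + c_i for p_i, c_i in zip(p, attend(p, memory))] for p in pairs]


def relation_scores(video: List[Dict[PairId, Vec]], context_tokens: List[Vec],
                    w: Vec, b: float) -> List[Dict[PairId, float]]:
    # 1) Spatial context, frame by frame.
    encoded = []
    for frame in video:
        ids = list(frame)
        feats = spatial_encode([frame[i] for i in ids], context_tokens)
        encoded.append(dict(zip(ids, feats)))
    # 2) Video-level context: mean of each pair's encoded features over the frames it appears in.
    sums: Dict[PairId, Vec] = {}
    counts: Dict[PairId, int] = {}
    for frame in encoded:
        for pid, h in frame.items():
            acc = sums.setdefault(pid, [0.0] * len(h))
            for i, x in enumerate(h):
                acc[i] += x
            counts[pid] = counts.get(pid, 0) + 1
    video_ctx = {pid: [x / counts[pid] for x in acc] for pid, acc in sums.items()}
    # 3) Linear head + sigmoid on (frame feature + video context) gives a score in (0, 1).
    return [
        {pid: sigmoid(dot(w, [a + c for a, c in zip(h, video_ctx[pid])]) + b)
         for pid, h in frame.items()}
        for frame in encoded
    ]


video = [
    {(0, 1): [1.0, 0.0], (0, 2): [0.0, 1.0]},  # frame 0: two candidate pairs
    {(0, 1): [0.9, 0.1]},                      # frame 1: pair (0, 2) is absent
]
context_tokens = [[0.5, 0.5]]  # stand-in for learned context tokens
scores = relation_scores(video, context_tokens, w=[2.0, -2.0], b=0.0)
print(scores[0][(0, 1)] > scores[0][(0, 2)])  # True
```

The per-frame dictionaries this returns have the same shape as the `relation_scores` mapping a re-ranking step would consume. In a trained model, the context tokens, encoders, and head would all be learned end to end.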

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition