TRKT: Weakly Supervised Dynamic Scene Graph Generation with Temporal-enhanced Relation-aware Knowledge Transferring

Zhu Xu; Ting Lei; Zhimin Li; Guan Wang; Qingchao Chen; Yuxin Peng; Yang Liu

arXiv:2508.04943·cs.CV·August 8, 2025

TL;DR

This paper introduces TRKT, a method for weakly supervised dynamic scene graph generation that uses relation-aware knowledge transfer and motion-aware attention to improve object detection in videos.

Contribution

The paper proposes TRKT, an approach that uses relation-aware knowledge mining and dual-stream fusion to improve object detection for dynamic scene graph generation without extensive annotations.

Findings

01 Achieves state-of-the-art results on the Action Genome dataset.

02 Improves object localization and confidence scores.

03 Robust to motion blur and dynamic scene complexities.

Abstract

Dynamic Scene Graph Generation (DSGG) aims to create a scene graph for each video frame by detecting objects and predicting their relationships. Weakly Supervised DSGG (WS-DSGG) reduces annotation workload by using an unlocalized scene graph from a single frame per video for training. Existing WS-DSGG methods depend on an off-the-shelf external object detector to generate pseudo labels for subsequent DSGG training. However, detectors trained on static, object-centric images struggle in dynamic, relation-aware scenarios required for DSGG, leading to inaccurate localization and low-confidence proposals. To address the challenges posed by external object detectors in WS-DSGG, we propose a Temporal-enhanced Relation-aware Knowledge Transferring (TRKT) method, which leverages knowledge to enhance detection in relation-aware dynamic scenarios. TRKT is built on two key…
