Frequency-guided Multi-level Reasoning for Scene Graph Generation in Video

Chenxing Li; Yiping Duan; Xiaoming Tao

arXiv:2604.17298 · cs.CV · April 21, 2026

TL;DR

This paper introduces FReMuRe, a video scene graph generation model that targets long-tail relationship distributions through frequency-aware, multi-level reasoning.

Contribution

FReMuRe combines relation-specific branches with a frequency-aware dual-branch predicate embedding to improve tail-class recall and reasoning robustness.

Findings

01 FReMuRe significantly improves long-tail relationship recall on the Action Genome dataset.

02 Gaussian Mixture Model heads enhance intra-class diversity.

03 FReMuRe achieves more balanced, tail-aware learning than previous methods.
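
To make the idea of a Gaussian Mixture Model head concrete, here is a minimal sketch of how giving each predicate class several Gaussian components lets one class cover more than one mode in feature space. This is an illustration only, not the paper's implementation: the function names, the diagonal-covariance assumption, the 2-D features, and all parameter values are hypothetical.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(x, weights, means, variances):
    """Log likelihood of x under a mixture, computed with log-sum-exp."""
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def classify(x, class_gmms):
    """Pick the class whose mixture assigns x the highest log likelihood."""
    scores = {
        name: gmm_log_likelihood(x, *params)
        for name, params in class_gmms.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical parameters: "holding" has two modes, "looking_at" has one.
CLASS_GMMS = {
    "holding": (
        [0.5, 0.5],                  # mixture weights
        [[0.0, 0.0], [4.0, 4.0]],    # component means
        [[1.0, 1.0], [1.0, 1.0]],    # component variances (diagonal)
    ),
    "looking_at": (
        [1.0],
        [[2.0, -2.0]],
        [[1.0, 1.0]],
    ),
}

label_a, _ = classify([0.1, -0.1], CLASS_GMMS)   # near the first "holding" mode
label_b, _ = classify([3.9, 4.1], CLASS_GMMS)    # near the second "holding" mode
label_c, _ = classify([2.0, -2.0], CLASS_GMMS)   # at the "looking_at" mode
```

In this toy setup both of the widely separated points near (0, 0) and (4, 4) are assigned to the same class, which a single Gaussian per class would struggle to do without also covering the region between them.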

Abstract

Video Scene Graph Generation aims to obtain structured semantic representations of objects and their relationships in videos for high-level understanding. However, existing methods still have limitations in handling long-tail distributions. This paper proposes the Frequency-guided Relational Multi-level Reasoning (FReMuRe) model, which enhances the modeling ability of long-tail relationships from a mechanism perspective. We introduce relation-specific branches to deal with gradient conflicts, yielding more balanced and tail-aware learning. We also design a frequency-aware dual-branch predicate embedding network that models high-frequency and low-frequency relationships separately and improves the recall rate of tail classes through gated fusion. Meanwhile, we propose two types of interchangeable relation classification heads: a Bayesian Head for uncertainty estimation and a new Gaussian Mixture Model…
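
The gated fusion of a high-frequency branch and a low-frequency branch mentioned in the abstract can be sketched as follows. The abstract does not specify the gate's form, so this is an assumption-laden illustration: it uses a single scalar sigmoid gate computed from the concatenated branch embeddings, and the names `gated_fusion`, `gate_weights`, and `gate_bias` are hypothetical. The paper's network may use a per-dimension gate or different gate inputs.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(head_emb, tail_emb, gate_weights, gate_bias=0.0):
    """Blend two branch embeddings with a scalar gate (assumed form).

    head_emb: embedding from the high-frequency (head-class) branch
    tail_emb: embedding from the low-frequency (tail-class) branch
    gate_weights: one weight per element of the concatenated embeddings
    Returns the fused embedding and the gate value g, where
    fused = g * head_emb + (1 - g) * tail_emb.
    """
    if len(head_emb) != len(tail_emb):
        raise ValueError("branch embeddings must have the same length")
    concat = list(head_emb) + list(tail_emb)
    if len(gate_weights) != len(concat):
        raise ValueError("gate_weights must match the concatenated length")
    g = sigmoid(sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias)
    fused = [g * h + (1.0 - g) * t for h, t in zip(head_emb, tail_emb)]
    return fused, g

# With all-zero gate weights and zero bias the gate is exactly 0.5,
# so the fused embedding is the plain average of the two branches.
fused, g = gated_fusion([1.0, 3.0], [3.0, 1.0], [0.0, 0.0, 0.0, 0.0])
# g == 0.5, fused == [2.0, 2.0]
```

In a trained model the gate weights would be learned, letting the network lean on the low-frequency branch when the input resembles a tail relationship and on the high-frequency branch otherwise.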
