Frequency-guided Multi-level Reasoning for Scene Graph Generation in Video
Chenxing Li, Yiping Duan, Xiaoming Tao

TL;DR
This paper introduces FReMuRe, a novel model for video scene graph generation that effectively handles long-tail relationship distributions through frequency-aware and multi-level reasoning mechanisms.
Contribution
The proposed FReMuRe model incorporates relation-specific branches and frequency-aware dual-branch predicate embedding to improve tail class recall and reasoning robustness.
Findings
FReMuRe significantly improves long-tail relationship recall on the Action Genome dataset.
The model enhances intra-class diversity with Gaussian Mixture Model heads.
FReMuRe achieves more balanced and tail-aware learning compared to previous methods.
Abstract
Video Scene Graph Generation aims to obtain structured semantic representations of objects and their relationships in videos for high-level understanding. However, existing methods still struggle with long-tail relationship distributions. This paper proposes the Frequency-guided Relational Multi-level Reasoning (FReMuRe) model, which improves the modeling of long-tail relationships at the mechanism level. We introduce relation-specific branches to deal with gradient conflicts, yielding more balanced and tail-aware learning. We also design a frequency-aware dual-branch predicate embedding network that models high-frequency and low-frequency relationships separately and improves the recall of tail classes through gated fusion. Meanwhile, we propose two types of interchangeable relation classification heads: a Bayesian Head for uncertainty estimation and a new Gaussian Mixture Model…
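To make the gated fusion mentioned in the abstract more concrete, the following is a minimal sketch of one common way to fuse two branch embeddings with a learned gate. The abstract does not specify how FReMuRe computes its gate, so everything here is an assumption for illustration: the sigmoid gate over the concatenated branch outputs, the element-wise convex combination, and the names `gated_fusion`, `h_high`, `h_low`, `w_gate`, and `b_gate` are hypothetical and not taken from the paper.

```python
import math
import random


def sigmoid(x):
    """Logistic function; adequate for the small inputs used in this sketch."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(h_high, h_low, w_gate, b_gate):
    """Fuse two predicate embeddings with an element-wise gate (assumed design).

    h_high, h_low: lists of floats of length dim, standing in for the outputs
                   of the high-frequency and low-frequency branches.
    w_gate:        dim rows, each of length 2 * dim (hypothetical gate weights).
    b_gate:        list of floats of length dim (hypothetical gate bias).
    Returns (fused, gate), each a list of length dim.
    """
    concat = h_high + h_low  # concatenate the two branch outputs
    gate = [
        sigmoid(sum(w * x for w, x in zip(row, concat)) + b)
        for row, b in zip(w_gate, b_gate)
    ]
    # Convex combination: a gate near 1 favors the high-frequency branch,
    # a gate near 0 favors the low-frequency branch.
    fused = [g * a + (1.0 - g) * c for g, a, c in zip(gate, h_high, h_low)]
    return fused, gate


# Toy usage with random embeddings and small random gate weights.
random.seed(0)
dim = 4
h_high = [random.gauss(0.0, 1.0) for _ in range(dim)]
h_low = [random.gauss(0.0, 1.0) for _ in range(dim)]
w_gate = [[random.gauss(0.0, 0.1) for _ in range(2 * dim)] for _ in range(dim)]
b_gate = [0.0] * dim
fused, gate = gated_fusion(h_high, h_low, w_gate, b_gate)
```

Because each gate value lies strictly between 0 and 1, every fused coordinate lies between the corresponding coordinates of the two branch outputs. In a trained model the gate parameters would be learned jointly with the branches; here they are random placeholders.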
