Towards Unbiased and Robust Spatio-Temporal Scene Graph Generation and Anticipation

Rohith Peddi; Saurabh; Ayush Abhay Shrivastava; Parag Singla; Vibhav Gogate

arXiv:2411.13059 · cs.CV · March 26, 2025

Open Access

TL;DR

This paper introduces ImparTail, a training framework that reduces bias and improves robustness in spatio-temporal scene graph generation and anticipation by using loss masking, curriculum learning, and new benchmark tasks.

Contribution

We propose a novel unbiased training method with curriculum-driven mask generation for more balanced spatio-temporal scene graph models, along with new robustness benchmarks.

Findings

01 Outperforms existing methods in unbiased scene graph generation

02 Demonstrates robustness under distribution shifts

03 Achieves superior results on the Action Genome dataset

Abstract

Spatio-Temporal Scene Graphs (STSGs) provide a concise and expressive representation of dynamic scenes by modeling objects and their evolving relationships over time. However, real-world visual relationships often exhibit a long-tailed distribution, causing existing methods for tasks like Video Scene Graph Generation (VidSGG) and Scene Graph Anticipation (SGA) to produce biased scene graphs. To this end, we propose ImparTail, a novel training framework that leverages loss masking and curriculum learning to mitigate bias in the generation and anticipation of spatio-temporal scene graphs. Unlike prior methods that add extra architectural components to learn unbiased estimators, we propose an impartial training objective that reduces the dominance of head classes during learning and focuses on underrepresented tail relationships. Our curriculum-driven mask generation strategy further…
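The abstract's idea of combining loss masking with a curriculum can be illustrated with a small sketch. The code below is not the authors' ImparTail algorithm, whose details are not given here; it is a minimal single-label example under stated assumptions. The masking rule (drop probability proportional to how much more frequent a class is than the rarest class), the linear schedule, and all function names (`mask_probabilities`, `curriculum_strength`, `sample_mask`, `masked_cross_entropy`) are hypothetical choices made for illustration only.

```python
import math
import random
from collections import Counter


def mask_probabilities(labels, strength):
    """Per-class probability of dropping a sample's loss term.

    Hypothetical rule (an assumption, not from the paper): the rarest
    class is never masked; a class with `n` samples is masked with
    probability strength * (1 - min_count / n).
    """
    counts = Counter(labels)
    min_count = min(counts.values())
    return {c: strength * (1.0 - min_count / n) for c, n in counts.items()}


def curriculum_strength(epoch, num_epochs, max_strength=0.9):
    """Hypothetical linear schedule: no masking at epoch 0, max at the end."""
    if num_epochs <= 1:
        return max_strength
    return max_strength * epoch / (num_epochs - 1)


def sample_mask(labels, drop_prob, rng):
    """Draw a 0/1 keep-mask; 0 means the sample's loss term is dropped."""
    return [0 if rng.random() < drop_prob[y] else 1 for y in labels]


def masked_cross_entropy(probs, labels, mask):
    """Mean negative log-likelihood over the samples whose mask entry is 1."""
    kept = [-math.log(p[y]) for p, y, m in zip(probs, labels, mask) if m]
    return sum(kept) / len(kept) if kept else 0.0


if __name__ == "__main__":
    # Toy data: class 0 is a head relationship, class 1 a tail relationship.
    labels = [0] * 8 + [1] * 2
    probs = [[0.7, 0.3]] * 10  # stand-in for model outputs
    rng = random.Random(0)
    for epoch in range(3):
        strength = curriculum_strength(epoch, num_epochs=3)
        mask = sample_mask(labels, mask_probabilities(labels, strength), rng)
        print(epoch, mask, masked_cross_entropy(probs, labels, mask))
```

In this sketch the tail class always keeps its loss terms, while head-class terms are dropped more often as training progresses, so the gradient signal shifts gradually toward underrepresented relationships. A real spatio-temporal scene graph model would apply such a mask per frame and per relationship label, and the actual mask generation strategy may differ substantially.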

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Human Pose and Action Recognition