CAGE-SGG: Counterfactual Active Graph Evidence for Open-Vocabulary Scene Graph Generation

Suiyang Guang; Chenyu Liu; Ruohan Zhang; Siyuan Chen

arXiv:2604.22274·cs.CV·May 13, 2026

TL;DR

This paper introduces CAGE-SGG, a framework for open-vocabulary scene graph generation that verifies relations based on visual evidence, improving reliability and interpretability over prior methods.

Contribution

It proposes a counterfactual relation verification approach with evidence decomposition, relation-conditioned encoding, and graph-level optimization for more accurate scene graphs.

Findings

01 Improves recall-based metrics across benchmarks.

02 Enhances unseen predicate generalization.

03 Provides more reliable, evidence-grounded scene graphs.

Abstract

Open-vocabulary scene graph generation (SGG) aims to describe visual scenes with flexible and fine-grained relation phrases beyond a fixed predicate vocabulary. While recent vision-language models greatly expand the semantic coverage of SGG, they also introduce a critical reliability issue: predicted relations may be driven by language priors or object co-occurrence rather than grounded visual evidence. In this paper, we propose an evidence-grounded open-vocabulary SGG framework based on counterfactual relation verification. Instead of directly accepting plausible relation proposals, our method verifies whether each candidate relation is supported by relation-specific visual, geometric, and contextual evidence. Specifically, we first generate open-vocabulary relation candidates with a vision-language proposer, then decompose predicate phrases into soft evidence bases such as support,…
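
The idea of counterfactual relation verification can be illustrated with a toy sketch. This is not the paper's implementation: the abstract is truncated before the method details, so the additive scoring rule, the `contact` evidence base, the weights, and the `margin` threshold below are all assumptions chosen for illustration. Only `support` is named in the abstract as an evidence base. The sketch scores a candidate relation with its evidence, scores it again with the evidence ablated, and accepts the relation only if the score drops by at least the margin, so that a candidate carried mostly by a language prior is rejected.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Candidate:
    """A proposed (subject, predicate, object) triple. All fields are hypothetical."""
    triple: Tuple[str, str, str]
    prior: float                 # assumed language-prior score, independent of the image
    evidence: Dict[str, float]   # assumed evidence strengths in [0, 1], keyed by evidence base


def score(cand: Candidate, weights: Dict[str, float], evidence: Dict[str, float]) -> float:
    """Assumed additive score: language prior plus weighted evidence."""
    return cand.prior + sum(weights.get(name, 0.0) * value for name, value in evidence.items())


def verify(cand: Candidate, weights: Dict[str, float], margin: float = 0.3) -> bool:
    """Accept the relation only if ablating its evidence lowers the score by >= margin."""
    factual = score(cand, weights, cand.evidence)
    ablated = {name: 0.0 for name in cand.evidence}   # counterfactual: evidence removed
    counterfactual = score(cand, weights, ablated)
    return (factual - counterfactual) >= margin


# Hypothetical weights tying the predicate "sitting on" to two evidence bases.
SITTING_ON_WEIGHTS = {"support": 0.6, "contact": 0.4}

# Plausible by co-occurrence (high prior) but weakly supported in the image.
prior_driven = Candidate(("person", "sitting on", "chair"), prior=0.8,
                         evidence={"support": 0.1, "contact": 0.1})

# Low prior, but strongly supported by evidence.
evidence_driven = Candidate(("cat", "sitting on", "laptop"), prior=0.2,
                            evidence={"support": 0.9, "contact": 0.8})

prior_driven_ok = verify(prior_driven, SITTING_ON_WEIGHTS)
evidence_driven_ok = verify(evidence_driven, SITTING_ON_WEIGHTS)
```

Under these assumed numbers, the prior-driven candidate is rejected and the evidence-driven one is accepted, even though the first has the higher raw score. A real system would replace the hand-set weights and evidence values with learned, relation-conditioned components.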
