SCENIR: Visual Semantic Clarity through Unsupervised Scene Graph Retrieval
Nikolaos Chaidos, Angeliki Dimitriou, Maria Lymperaiou, Giorgos Stamou

TL;DR
SCENIR introduces an unsupervised scene graph retrieval framework that emphasizes semantic content over visual biases, outperforming supervised methods and proposing Graph Edit Distance (GED) as a robust similarity measure for image-to-image retrieval.
Contribution
The paper presents SCENIR, a novel unsupervised scene graph retrieval method using Graph Autoencoders, and advocates for Graph Edit Distance as a reliable similarity metric, advancing semantic image retrieval.
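To make the idea of Graph Edit Distance as a similarity measure concrete, here is a minimal sketch that ranks a few toy scene graphs against a query by GED. It is not the paper's implementation: the graph encoding (a list of object labels plus a dict of directed, labeled relations), the unit edit costs, the function name `ged`, and the example graphs are all assumptions made for illustration. The brute-force search over node mappings is only practical for very small graphs.

```python
from itertools import permutations

def ged(g1, g2):
    """Brute-force graph edit distance with unit costs (tiny graphs only).

    A graph is (labels, edges): labels is a list of node labels, and edges
    maps a directed pair (i, j) to a relation label. Self-loops are ignored.
    Every node or edge insertion, deletion, or relabeling costs 1 (assumed).
    """
    labels1, edges1 = g1
    labels2, edges2 = g2
    n = max(len(labels1), len(labels2))
    # Pad the smaller graph with None placeholders; mapping a real node to a
    # placeholder stands for deleting or inserting that node.
    a = list(labels1) + [None] * (n - len(labels1))
    b = list(labels2) + [None] * (n - len(labels2))
    best = None
    for perm in permutations(range(n)):
        # Node cost: 1 for every mapped pair whose labels differ.
        cost = sum(a[i] != b[perm[i]] for i in range(n))
        # Edge cost: 1 for every ordered pair whose relation differs
        # (a missing edge is represented by None).
        for i in range(n):
            for j in range(n):
                if i != j:
                    cost += edges1.get((i, j)) != edges2.get((perm[i], perm[j]))
        if best is None or cost < best:
            best = cost
    return best

# Hypothetical toy scene graphs.
query = (["man", "horse", "field"], {(0, 1): "riding", (1, 2): "in"})
candidates = {
    "woman_on_horse": (["woman", "horse", "field"], {(0, 1): "riding", (1, 2): "in"}),
    "man_with_dog": (["man", "dog"], {(0, 1): "walking"}),
    "same_scene": (["man", "horse", "field"], {(0, 1): "riding", (1, 2): "in"}),
}

# Retrieval: a smaller edit distance means a more similar scene.
ranking = sorted(candidates, key=lambda name: ged(query, candidates[name]))
print(ranking)
```

In this sketch the ranking depends only on objects and relations, so two images with the same content but different colors or lighting would receive the same score, which is the intuition behind using a graph-level measure rather than pixel-level features.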
Findings
Outperforms existing supervised GNN approaches in retrieval tasks.
Demonstrates robustness and efficiency across multiple datasets.
Validates generalizability with unannotated datasets using automated scene graph generation.
Abstract
Despite the dominance of convolutional and transformer-based architectures in image-to-image retrieval, these models are prone to biases arising from low-level visual features, such as color. Recognizing the lack of semantic understanding as a key limitation, we propose a novel scene graph-based retrieval framework that emphasizes semantic content over superficial image characteristics. Prior approaches to scene graph retrieval predominantly rely on supervised Graph Neural Networks (GNNs), which require ground truth graph pairs derived from image captions. However, the inconsistency of caption-based supervision, stemming from variable text encodings, undermines retrieval reliability. To address these issues, we present SCENIR, a Graph Autoencoder-based unsupervised retrieval framework, which eliminates the dependence on labeled training data. Our model demonstrates superior performance across…
