A Comprehensive Survey of Scene Graphs: Generation and Application
Xiaojun Chang, Pengzhen Ren, Pengfei Xu, Zhihui Li, Xiaojiang Chen, and Alex Hauptmann

TL;DR
This survey comprehensively reviews scene graphs, covering their generation methods, applications, datasets, and future directions, highlighting their importance in advanced scene understanding and reasoning tasks.
Contribution
It provides the first systematic and comprehensive overview of scene graph research, including generation techniques, applications, datasets, and future insights.
Findings
Summarizes various scene graph generation methods.
Details key applications like image captioning and VQA.
Lists major datasets used in scene graph research.
Abstract
A scene graph is a structured representation of a scene that clearly expresses the objects in the scene, their attributes, and the relationships between them. As computer vision technology continues to develop, people are no longer satisfied with simply detecting and recognizing objects in images; instead, they look forward to a higher level of understanding and reasoning about visual scenes. For example, given an image, we want not only to detect and recognize the objects in it, but also to know the relationships between those objects (visual relationship detection) and to generate a text description based on the image content (image captioning). Alternatively, we might want the machine to tell us what the little girl in the image is doing (visual question answering, VQA), or even to remove the dog from the image and find similar images (image editing and retrieval). These tasks require a higher…
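To make the definition above concrete, here is a minimal sketch of a scene graph as a data structure: objects are nodes, attributes are attached to nodes, and relationships are directed, labeled edges stored as (subject, predicate, object) triples. The class name `SceneGraph`, its methods, and the example scene are illustrative assumptions, not code or data from the survey.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SceneGraph:
    """Toy scene graph (hypothetical API, for illustration only)."""
    # object name -> list of attribute strings
    objects: Dict[str, List[str]] = field(default_factory=dict)
    # directed, labeled edges: (subject, predicate, object)
    relationships: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_object(self, name, attributes=()):
        self.objects[name] = list(attributes)

    def add_relationship(self, subject, predicate, obj):
        # Both endpoints must already exist as nodes.
        if subject not in self.objects or obj not in self.objects:
            raise KeyError("both endpoints must be added as objects first")
        self.relationships.append((subject, predicate, obj))

    def relations_of(self, subject):
        # All (predicate, object) pairs whose subject is the given node.
        return [(p, o) for s, p, o in self.relationships if s == subject]

    def remove_object(self, name):
        # Drop the node and every edge that touches it.
        del self.objects[name]
        self.relationships = [
            (s, p, o) for s, p, o in self.relationships if name not in (s, o)
        ]


# Assumed example scene, loosely following the abstract's girl-and-dog image.
graph = SceneGraph()
graph.add_object("girl", ["little"])
graph.add_object("dog")
graph.add_object("grass", ["green"])
graph.add_relationship("girl", "playing with", "dog")
graph.add_relationship("dog", "on", "grass")
```

In this sketch, `graph.relations_of("girl")` stands in for the kind of lookup a VQA-style query might use, and `graph.remove_object("dog")` mirrors the graph-level half of the "remove the dog" editing example; real systems operate on detected regions and learned predicates rather than hand-entered strings.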
