Topic Scene Graph Generation by Attention Distillation from Caption
W. Wang, R. Wang, X. Chen

TL;DR
This paper improves scene graph generation by distilling attention from image captioning into the scene graph, making the graph more focused and semantically rich. It also points toward generating captions and scene graphs jointly.
Contribution
The paper proposes distilling attention from image captioning so that the scene graph can estimate the importance of its contents without strong supervision.
Findings
Significant improvement in relationship importance mining.
Enhanced scene graph relevance and accuracy.
Potential for joint caption and scene graph generation.
Abstract
If an image tells a story, the image caption is the briefest narrator. Generally, a scene graph prefers to be an omniscient generalist, while the image caption is more willing to be a specialist, which outlines the gist. Lots of previous studies have found that a scene graph is not as practical as expected unless it can reduce the trivial contents and noises. In this respect, the image caption is a good tutor. To this end, we let the scene graph borrow the ability from the image caption so that it can be a specialist on the basis of remaining all-around, resulting in the so-called Topic Scene Graph. What an image caption pays attention to is distilled and passed to the scene graph for estimating the importance of partial objects, relationships, and events. Specifically, during the caption generation, the attention about individual objects in each time step is collected, pooled, and…
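The abstract is cut off after "collected, pooled, and…", so the following is only a minimal sketch of the collect-and-pool idea under stated assumptions. Mean pooling over time steps and scoring a relationship by the product of its two objects' pooled attention are illustrative choices, not the authors' method. The names `pool_attention` and `rank_relationships` are hypothetical.

```python
from typing import List, Tuple

Triple = Tuple[int, str, int]  # (subject index, predicate, object index)


def pool_attention(attn_per_step: List[List[float]]) -> List[float]:
    """Mean-pool per-step attention weights into one score per object.

    attn_per_step[t][i] is the attention the captioner puts on object i
    at decoding step t. Mean pooling is an assumption made for this sketch.
    """
    if not attn_per_step:
        return []
    n_steps = len(attn_per_step)
    n_objects = len(attn_per_step[0])
    return [
        sum(step[i] for step in attn_per_step) / n_steps
        for i in range(n_objects)
    ]


def rank_relationships(obj_scores: List[float], triples: List[Triple]) -> List[Triple]:
    """Order triples by an assumed importance: subject score times object score."""
    scored = [(obj_scores[s] * obj_scores[o], (s, p, o)) for s, p, o in triples]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [triple for _, triple in scored]


# Toy example: 3 decoding steps, 3 detected objects (made-up numbers).
attention = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
]
triples = [(0, "riding", 1), (1, "near", 2), (0, "near", 2)]

scores = pool_attention(attention)
ranked = rank_relationships(scores, triples)
```

In this toy input, objects 0 and 1 receive most of the caption attention, so the relationship between them is ranked first, while relationships involving the rarely attended object 2 fall to the bottom.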
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Advanced Image and Video Retrieval Techniques
