Supplementing Missing Visions via Dialog for Scene Graph Generations

Zhenghao Zhao; Ye Zhu; Xiaoguang Zhu; Yuzhang Shang; Yan Yan

arXiv:2204.11143 · cs.CV · April 2, 2024



TL;DR

This paper introduces a novel approach to scene graph generation that uses natural language dialog to compensate for missing visual data, improving performance in incomplete visual scenarios.

Contribution

The paper proposes the Supplementary Interactive Dialog (SI-Dial) framework, a model-agnostic component that lets existing scene graph generation models use dialog-based supplementary information when the visual input is incomplete.

Findings

01 Significant performance improvements over baselines

02 Effective integration of dialog interactions in vision tasks

03 Feasibility demonstrated with various levels of missing data

Abstract

Most current AI systems rely on the premise that the input visual data are sufficient to achieve competitive performance in various computer vision tasks. However, the classic task setup rarely considers the challenging, yet common practical situations where the complete visual data may be inaccessible due to various reasons (e.g., restricted view range and occlusions). To this end, we investigate a computer vision task setting with incomplete visual input data. Specifically, we exploit the Scene Graph Generation (SGG) task with various levels of visual data missingness as input. While insufficient visual input intuitively leads to performance drop, we propose to supplement the missing visions via the natural language dialog interactions to better accomplish the task objective. We design a model-agnostic Supplementary Interactive Dialog (SI-Dial) framework that can be jointly learned…


Code & Models

Repositories

l-yezhu/si-dial (PyTorch, official)


Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Video Analysis and Summarization