DIVE: Towards Descriptive and Diverse Visual Commonsense Generation

Jun-Hyung Park, Hyuntae Park, Youjin Kang, Eojin Jeon, SangKeun Lee (Korea University)

arXiv:2408.08021 · cs.CV · August 16, 2024

TL;DR

DIVE is a visual commonsense generation framework that improves the descriptiveness and diversity of generated inferences, outperforming state-of-the-art models and reaching human-level performance on Visual Commonsense Graphs.

Contribution

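DIVE introduces two methods, generic inference filtering and contrastive retrieval learning, which address limitations of existing visual commonsense resources and training objectives in order to make generated inferences more descriptive and diverse.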
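As a rough illustration of what filtering generic inferences could look like, the sketch below drops any inference whose text is attached to many different images. This is a toy criterion chosen for illustration only: the function name `filter_generic_inferences`, the `max_images` threshold, and the sample data are all hypothetical, and the criterion the paper actually uses is not described on this page.

```python
from collections import defaultdict


def filter_generic_inferences(pairs, max_images=2):
    """Drop (image_id, inference) pairs whose inference is shared by too many images.

    ASSUMPTION: "generic" is approximated here as "the same text appears with
    more than `max_images` distinct images". This is an illustrative stand-in,
    not the criterion defined in the DIVE paper.
    """
    # Map each normalized inference text to the set of images it is attached to.
    images_per_inference = defaultdict(set)
    for image_id, inference in pairs:
        images_per_inference[inference.strip().lower()].add(image_id)

    # Keep only pairs whose inference is specific to a small number of images.
    return [
        (image_id, inference)
        for image_id, inference in pairs
        if len(images_per_inference[inference.strip().lower()]) <= max_images
    ]


# Hypothetical toy data: one inference repeated across three images, two specific ones.
pairs = [
    ("img1", "talk to the person next to him"),
    ("img2", "talk to the person next to him"),
    ("img3", "Talk to the person next to him"),
    ("img1", "order a second espresso at the bar"),
    ("img2", "wave down the taxi before it leaves"),
]
kept = filter_generic_inferences(pairs, max_images=2)
for image_id, inference in kept:
    print(image_id, inference)
```

With this toy data the repeated inference is removed for all three images and the two image-specific inferences are kept. A threshold on raw frequency is the simplest possible choice; a real filter would likely also need to account for how similar the images sharing an inference are.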

Findings

01. Outperforms state-of-the-art visual commonsense generation models in both descriptiveness and diversity.

02. Achieves human-level performance on Visual Commonsense Graphs.

03. Human evaluations indicate that DIVE's inferences align closely with human judgments.

Abstract

Towards human-level visual understanding, visual commonsense generation has been introduced to generate commonsense inferences beyond images. However, current research on visual commonsense generation has overlooked an important human cognitive ability: generating descriptive and diverse inferences. In this work, we propose a novel visual commonsense generation framework, called DIVE, which aims to improve the descriptiveness and diversity of generated inferences. DIVE involves two methods, generic inference filtering and contrastive retrieval learning, which address the limitations of existing visual commonsense resources and training objectives. Experimental results verify that DIVE outperforms state-of-the-art models for visual commonsense generation in terms of both descriptiveness and diversity, while showing a superior quality in generating unique and novel inferences. Notably,…

Code & Models

Repositories

park-ing-lot/dive (PyTorch, official)
