VLGrammar: Grounded Grammar Induction of Vision and Language

Yining Hong; Qing Li; Song-Chun Zhu; Siyuan Huang

arXiv:2103.12975 · cs.CV · March 25, 2021

TL;DR

VLGrammar jointly induces language and image grammars with compound PCFGs, guided by a contrastive learning objective. On PartIt, a newly collected large-scale dataset, it outperforms baselines and generalizes to unseen categories.

Contribution

The paper proposes a contrastive learning approach that induces the language grammar and the image grammar simultaneously, grounding the language grammar in visual structures.
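
As a rough illustration of what a contrastive objective over image-sentence pairs can look like, the sketch below computes a symmetric InfoNCE-style loss in plain Python. This is a generic formulation and not necessarily the loss used in VLGrammar; the function name `info_nce` and the toy score matrices are hypothetical, and how matching scores would be derived from the induced grammars is left out.

```python
import math


def info_nce(scores):
    """Symmetric InfoNCE-style loss over a square score matrix.

    scores[i][j] is a hypothetical similarity between image i and
    sentence j; matched pairs are assumed to sit on the diagonal.
    """
    n = len(scores)

    def mean_diag_log_softmax(rows):
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(s - m) for s in row))
            total += row[i] - log_z
        return total / n

    # Transpose to score sentence-to-image retrieval as well.
    cols = [list(col) for col in zip(*scores)]
    return -0.5 * (mean_diag_log_softmax(scores) + mean_diag_log_softmax(cols))


# Toy scores: matched pairs score high in `aligned`, low in `shuffled`.
aligned = [[5.0, 0.0], [0.0, 5.0]]
shuffled = [[0.0, 5.0], [5.0, 0.0]]
print(info_nce(aligned) < info_nce(shuffled))
```

The loss is low when each image scores highest against its own sentence and each sentence against its own image, which is the behavior a contrastive objective rewards.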

Findings

01. Outperforms baselines in image and language grammar induction

02. Improves image clustering accuracy by 30%

03. Generalizes well to unseen categories

Abstract

Cognitive grammar suggests that the acquisition of language grammar is grounded within visual structures. While grammar is an essential representation of natural language, it also exists ubiquitously in vision to represent the hierarchical part-whole structure. In this work, we study grounded grammar induction of vision and language in a joint learning framework. Specifically, we present VLGrammar, a method that uses compound probabilistic context-free grammars (compound PCFGs) to induce the language grammar and the image grammar simultaneously. We propose a novel contrastive learning framework to guide the joint learning of both modules. To provide a benchmark for the grounded grammar induction task, we collect a large-scale dataset, PartIt, which contains human-written sentences that describe part-level semantics for 3D objects. Experiments on the PartIt dataset show…
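
For readers unfamiliar with PCFGs, the following is a minimal sketch of the inside algorithm, which computes the probability a PCFG in Chomsky normal form assigns to a sentence. It uses fixed rule probabilities; a compound PCFG additionally conditions the rule probabilities on a per-sentence latent variable, which is omitted here. The function name, the toy grammar, and the example sentence are all invented for illustration and are not taken from the paper or from PartIt.

```python
from collections import defaultdict


def inside_probability(words, lexical, binary, start="S"):
    """Probability of `words` under a PCFG in Chomsky normal form.

    lexical: {(A, word): prob} for rules A -> word
    binary:  {(A, B, C): prob} for rules A -> B C
    """
    n = len(words)
    # chart[(i, j)][A] = probability that A derives words[i:j]
    chart = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w:
                chart[(i, i + 1)][a] += p
    # Fill wider spans by summing over split points and binary rules.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (a, b, c), p in binary.items():
                    left = chart[(i, k)].get(b, 0.0)
                    right = chart[(k, j)].get(c, 0.0)
                    if left and right:
                        chart[(i, j)][a] += p * left * right
    return chart[(0, n)].get(start, 0.0)


# Hypothetical toy grammar (illustration only).
toy_lexical = {("N", "chair"): 0.5, ("N", "legs"): 0.5, ("P", "with"): 1.0}
toy_binary = {("S", "N", "PP"): 1.0, ("PP", "P", "N"): 1.0}
print(inside_probability(["chair", "with", "legs"], toy_lexical, toy_binary))
```

Grammar induction methods built on PCFGs typically maximize this sentence probability (or a bound on it) over unlabeled data, so that the tree structure emerges without parse annotations.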

Code & Models

Repositories

evelinehong/VLGrammar (PyTorch)

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Domain Adaptation and Few-Shot Learning

Methods: Contrastive Learning