The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision

Jiayuan Mao; Chuang Gan; Pushmeet Kohli; Joshua B. Tenenbaum; Jiajun Wu

arXiv:1904.12584 · cs.CV · April 30, 2019 · 144 cites

TL;DR

The paper introduces NS-CL, a neuro-symbolic model that learns visual concepts, language understanding, and semantic parsing from natural supervision, enabling flexible scene interpretation and question answering without explicit annotations.

Contribution

The paper presents a neuro-symbolic framework that jointly learns visual concepts and the semantic parsing of sentences from images paired with questions and answers, and that generalizes to new attributes, scenes, and questions.

Findings

01. High accuracy in learning visual concepts and language parsing.

02. Effective generalization to new attributes, scenes, and questions.

03. Supports applications like visual question answering and image-text retrieval.

Abstract

We propose the Neuro-Symbolic Concept Learner (NS-CL), a model that learns visual concepts, words, and semantic parsing of sentences without explicit supervision on any of them; instead, our model learns by simply looking at images and reading paired questions and answers. Our model builds an object-based scene representation and translates sentences into executable, symbolic programs. To bridge the learning of two modules, we use a neuro-symbolic reasoning module that executes these programs on the latent scene representation. Analogical to human concept learning, the perception module learns visual concepts based on the language description of the object being referred to. Meanwhile, the learned visual concepts facilitate learning new words and parsing new sentences. We use curriculum learning to guide the searching over the large compositional space of images and language. Extensive…
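The idea of executing a symbolic program on an object-based scene representation can be illustrated with a small toy sketch. This is not the authors' implementation: the scene format (one dictionary of concept scores per object), the operation names (`filter`, `count`, `exist`, `query`), the `ATTRIBUTE_CONCEPTS` table, and the function `run_program` are all hypothetical and chosen only for illustration. In the paper the scene representation is latent and learned; here the scores are hard-coded.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical object-based scene: one dict per object, mapping a concept
# name to a score in [0, 1]. In NS-CL these would come from a learned
# perception module; here they are hard-coded for illustration.
Scene = List[Dict[str, float]]
Program = List[Tuple[str, ...]]

# Hypothetical table saying which concepts belong to which attribute.
ATTRIBUTE_CONCEPTS = {
    "color": ["red", "blue"],
    "shape": ["cube", "sphere"],
}


def run_program(program: Program, scene: Scene) -> Union[int, bool, str]:
    """Execute a tiny symbolic program using a soft attention mask over objects."""
    mask = [1.0] * len(scene)  # start by attending to every object
    for op, *args in program:
        if op == "filter":
            # Keep objects in proportion to their score for the concept.
            concept = args[0]
            mask = [m * obj.get(concept, 0.0) for m, obj in zip(mask, scene)]
        elif op == "count":
            # Soft count: sum of attention weights, rounded to an integer.
            return round(sum(mask))
        elif op == "exist":
            # True if at least one object is still strongly attended.
            return max(mask, default=0.0) > 0.5
        elif op == "query":
            # Pick the concept of the attribute best supported by the mask.
            attribute = args[0]
            scores = {
                c: sum(m * obj.get(c, 0.0) for m, obj in zip(mask, scene))
                for c in ATTRIBUTE_CONCEPTS[attribute]
            }
            return max(scores, key=scores.get)
        else:
            raise ValueError(f"unknown operation: {op}")
    raise ValueError("program ended without producing an answer")


# Toy scene: a red cube, a blue cube, and a red sphere.
demo_scene: Scene = [
    {"red": 0.9, "blue": 0.1, "cube": 0.8, "sphere": 0.2},
    {"red": 0.1, "blue": 0.9, "cube": 0.9, "sphere": 0.1},
    {"red": 0.95, "blue": 0.05, "cube": 0.1, "sphere": 0.9},
]

# "How many red cubes are there?"
print(run_program([("filter", "red"), ("filter", "cube"), ("count",)], demo_scene))
# "What shape is the blue object?"
print(run_program([("filter", "blue"), ("query", "shape")], demo_scene))
```

Because every step works on continuous scores rather than hard selections, an executor of this kind can in principle pass a training signal from the answer back to the concept scores, which is one plausible reading of how the reasoning module bridges the perception and language modules.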

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition