Building a Large-scale Multimodal Knowledge Base System for Answering Visual Queries

Yuke Zhu; Ce Zhang; Christopher Ré; Li Fei-Fei

arXiv:1507.05670 · cs.CV · November 11, 2015 · 45 cites

Open Access

TL;DR

This paper introduces a scalable multimodal knowledge base system that integrates visual, textual, and structured data to answer a range of visual queries without training new classifiers for each task.

Contribution

It presents a scalable KB construction system that can build a KB with half a billion variables and millions of parameters in a few hours, so that varied visual queries can be answered from one shared model.

Findings

01

Achieves competitive recognition and retrieval results

02

Builds a KB with half a billion variables in hours

03

Enhances ability to answer complex visual queries

Abstract

The complexity of the visual world creates significant challenges for comprehensive visual understanding. In spite of recent successes in visual recognition, today's vision systems would still struggle to deal with visual queries that require deeper reasoning. We propose a knowledge base (KB) framework to handle an assortment of visual queries, without the need to train new classifiers for new tasks. Building such a large-scale multimodal KB presents a major challenge of scalability. We cast a large-scale MRF into a KB representation, incorporating visual, textual and structured data, as well as their diverse relations. We introduce a scalable knowledge base construction system that is capable of building a KB with half a billion variables and millions of parameters in a few hours. Our system achieves competitive results compared to purpose-built models on standard recognition and…
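
To make the idea of answering queries from an MRF-backed KB concrete, here is a toy sketch, not the authors' system. It assumes three hypothetical binary facts about a single image (`has_dog`, `is_outdoor`, `caption_mentions_dog`) and hand-picked factor weights, and it answers a query by brute-force enumeration, which is only feasible at this tiny scale. The point it illustrates is that one shared model can serve different queries by changing the evidence rather than by training a new classifier.

```python
from itertools import product
import math

# Hypothetical binary variables: each is one fact about a single image.
VARIABLES = ["has_dog", "is_outdoor", "caption_mentions_dog"]

# Hypothetical log-linear factors: (variables involved, weight, feature).
# A factor adds its weight to the score when its feature holds.
FACTORS = [
    (("has_dog", "caption_mentions_dog"), 2.0, lambda a, b: a == b),
    (("has_dog", "is_outdoor"), 0.5, lambda a, b: a and b),
    (("caption_mentions_dog",), 1.0, lambda a: a),
]


def score(assignment):
    """Sum the weights of all factors whose feature holds."""
    total = 0.0
    for names, weight, feature in FACTORS:
        if feature(*(assignment[n] for n in names)):
            total += weight
    return total


def marginal(query, evidence=None):
    """P(query = True | evidence), computed by enumerating every assignment."""
    evidence = evidence or {}
    free = [v for v in VARIABLES if v not in evidence]
    numer = denom = 0.0
    for values in product([False, True], repeat=len(free)):
        assignment = dict(evidence, **dict(zip(free, values)))
        weight = math.exp(score(assignment))
        denom += weight
        if assignment[query]:
            numer += weight
    return numer / denom


# The same model answers two different queries by swapping the evidence.
p_dog_given_caption = marginal("has_dog", {"caption_mentions_dog": True})
p_dog_given_no_caption = marginal("has_dog", {"caption_mentions_dog": False})
```

Enumeration grows exponentially with the number of variables, so a KB at the scale the abstract describes would need approximate inference instead; the abstract as shown here does not say which method the authors use.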

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning