VisKnow: Constructing Visual Knowledge Base for Object Understanding
Ziwei Yao, Qiyang Wan, Ruiping Wang, Xilin Chen

TL;DR
VisKnow introduces a framework for building structured visual knowledge bases that integrate multi-modal data, enhancing object understanding and supporting advanced visual tasks like zero-shot recognition and fine-grained VQA.
Contribution
The paper presents a novel construction framework for a visual knowledge base that combines text and image data at object and part levels, exemplified by the creation of AnimalKB.
Findings
AnimalKB improves zero-shot recognition accuracy.
AnimalKB enhances fine-grained visual question answering.
The constructed knowledge base serves as a benchmark for graph completion and segmentation.
Abstract
Understanding objects is fundamental to computer vision. Beyond object recognition that provides only a category label as typical output, in-depth object understanding represents a comprehensive perception of an object category, involving its components, appearance characteristics, inter-category relationships, contextual background knowledge, etc. Developing such capability requires sufficient multi-modal data, including visual annotations such as parts, attributes, and co-occurrences for specific tasks, as well as textual knowledge to support high-level tasks like reasoning and question answering. However, these data are generally task-oriented and not systematically organized enough to achieve the expected understanding of object categories. In response, we propose the Visual Knowledge Base that structures multi-modal object knowledge as graphs, and present a construction framework…
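As a rough illustration of what "structuring multi-modal object knowledge as graphs" could look like, the following minimal Python sketch stores object-level and part-level nodes, each holding text snippets and image references, linked by typed edges. Everything here is an assumption made for illustration: the class names `Node` and `VisualKB`, the relation names `has_part` and `related_to`, and the example file name are hypothetical and are not taken from the paper or from AnimalKB.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A category or part node carrying text and image references."""
    name: str
    level: str                                   # "object" or "part" (hypothetical labels)
    texts: list = field(default_factory=list)    # textual knowledge snippets
    images: list = field(default_factory=list)   # image identifiers or paths


@dataclass
class VisualKB:
    """Toy graph: nodes keyed by name, edges stored as (head, relation, tail)."""
    nodes: dict = field(default_factory=dict)
    edges: set = field(default_factory=set)

    def add_node(self, name, level):
        # Reuse the existing node if the name is already present.
        return self.nodes.setdefault(name, Node(name, level))

    def add_edge(self, head, relation, tail):
        # Both endpoints must exist before they can be linked.
        if head not in self.nodes or tail not in self.nodes:
            raise KeyError("add both nodes before linking them")
        self.edges.add((head, relation, tail))

    def neighbors(self, head, relation):
        # Tails reachable from `head` via `relation`, sorted for stable output.
        return sorted(t for h, r, t in self.edges if h == head and r == relation)


# Hypothetical example data, not drawn from AnimalKB.
kb = VisualKB()
kb.add_node("zebra", "object").texts.append("Zebras have black and white stripes.")
kb.add_node("tail", "part").images.append("zebra_tail_001.jpg")  # made-up file name
kb.add_node("horse", "object")
kb.add_edge("zebra", "has_part", "tail")
kb.add_edge("zebra", "related_to", "horse")
```

Calling `kb.neighbors("zebra", "has_part")` lists the part nodes attached to a category. Storing edges as plain triples keeps the toy structure close to the form commonly used for graph completion, though the actual schema of the paper may differ.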
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Graph Neural Networks
