VisKnow: Constructing Visual Knowledge Base for Object Understanding

Ziwei Yao; Qiyang Wan; Ruiping Wang; Xilin Chen

arXiv:2512.08221 · cs.CV · December 10, 2025

Open Access

TL;DR

VisKnow introduces a framework for building structured visual knowledge bases that integrate multi-modal data, enhancing object understanding and supporting advanced visual tasks like zero-shot recognition and fine-grained VQA.

Contribution

The paper presents a novel construction framework for a visual knowledge base that combines text and image data at object and part levels, exemplified by the creation of AnimalKB.
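As a rough illustration of what a graph that links text and image data at object and part levels could look like, here is a minimal Python sketch. It is not the authors' schema: the class names `Node` and `VisualKB`, the `has_part` relation label, and the image identifier format are all assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph node: an object category or one of its parts (hypothetical schema)."""
    name: str
    level: str  # "object" or "part"
    text_facts: list[str] = field(default_factory=list)  # textual knowledge
    image_refs: list[str] = field(default_factory=list)  # placeholder image identifiers


@dataclass
class VisualKB:
    """Toy knowledge graph storing nodes and (head, relation, tail) triples."""
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: list[tuple[str, str, str]] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.name] = node

    def add_edge(self, head: str, relation: str, tail: str) -> None:
        # Require both endpoints to exist so the graph stays consistent.
        if head not in self.nodes or tail not in self.nodes:
            raise KeyError("both endpoints must be added before linking them")
        self.edges.append((head, relation, tail))

    def parts_of(self, obj: str) -> list[str]:
        # Collect the tails of all "has_part" edges leaving the given object.
        return [t for h, r, t in self.edges if h == obj and r == "has_part"]


def build_example() -> VisualKB:
    """Build a two-node graph; the contents are illustrative, not from AnimalKB."""
    kb = VisualKB()
    kb.add_node(Node("zebra", "object", text_facts=["zebras have striped coats"]))
    kb.add_node(Node("tail", "part", image_refs=["img_0001#mask_3"]))
    kb.add_edge("zebra", "has_part", "tail")
    return kb
```

Storing relations as triples keeps the sketch close to how knowledge graphs are commonly represented, which is also the form that a graph completion benchmark would typically operate on.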

Findings

01. AnimalKB improves zero-shot recognition accuracy.

02. AnimalKB enhances fine-grained visual question answering.

03. The constructed knowledge base serves as a benchmark for graph completion and segmentation.

Abstract

Understanding objects is fundamental to computer vision. Beyond object recognition that provides only a category label as typical output, in-depth object understanding represents a comprehensive perception of an object category, involving its components, appearance characteristics, inter-category relationships, contextual background knowledge, etc. Developing such capability requires sufficient multi-modal data, including visual annotations such as parts, attributes, and co-occurrences for specific tasks, as well as textual knowledge to support high-level tasks like reasoning and question answering. However, these data are generally task-oriented and not systematically organized enough to achieve the expected understanding of object categories. In response, we propose the Visual Knowledge Base that structures multi-modal object knowledge as graphs, and present a construction framework…

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Graph Neural Networks