M^2ConceptBase: A Fine-Grained Aligned Concept-Centric Multimodal Knowledge Base
Zhiwei Zha, Jiaan Wang, Zhixu Li, Xiangru Zhu, Wei Song, Yanghua Xiao

TL;DR
M^2ConceptBase is a concept-centric multimodal knowledge base that links each concept to images and a detailed textual description. The authors report that it improves concept understanding in multimodal LLMs and performance on the OK-VQA task.
Contribution
The paper introduces the first concept-centric multimodal knowledge base (MMKB), built with a context-aware multimodal symbol grounding approach that aligns concepts with images and descriptions.
Findings
Over 95% alignment accuracy confirmed by human studies
Significant performance boost on OK-VQA task
Improves concept understanding in multimodal LLMs
Abstract
Multimodal knowledge bases (MMKBs) provide cross-modal aligned knowledge crucial for multimodal tasks. However, the images in existing MMKBs are generally collected for entities in encyclopedia knowledge graphs. As a result, they lack detailed groundings of visual semantics with linguistic concepts, which are essential for the visual concept cognition ability of multimodal models. Addressing this gap, we introduce M^2ConceptBase, the first concept-centric MMKB. M^2ConceptBase models concepts as nodes with associated images and detailed textual descriptions. We propose a context-aware multimodal symbol grounding approach to align concept-image and concept-description pairs using context information from image-text datasets. Comprising 951K images and 152K concepts, M^2ConceptBase links each concept to an average of 6.27 images and a single description, ensuring comprehensive visual and…
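The abstract describes concepts as nodes, each linked to several images and a single description. A minimal Python sketch of that shape is given here for illustration only: the class names, field names, and sample entries (ConceptNode, ConceptBase, image_ids, "umbrella", "kite") are hypothetical and are not taken from the paper or its released data, and the grounding approach itself is not implemented.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptNode:
    """One concept with a single description and its linked images (hypothetical schema)."""
    name: str
    description: str
    image_ids: list[str] = field(default_factory=list)


class ConceptBase:
    """Toy in-memory store keyed by concept name; not the authors' implementation."""

    def __init__(self) -> None:
        self.nodes: dict[str, ConceptNode] = {}

    def add_concept(self, name: str, description: str) -> ConceptNode:
        node = ConceptNode(name, description)
        self.nodes[name] = node
        return node

    def link_image(self, name: str, image_id: str) -> None:
        # Record a concept-image alignment for an existing concept.
        self.nodes[name].image_ids.append(image_id)

    def avg_images_per_concept(self) -> float:
        if not self.nodes:
            return 0.0
        total = sum(len(node.image_ids) for node in self.nodes.values())
        return total / len(self.nodes)


# Made-up sample data, for illustration only.
kb = ConceptBase()
kb.add_concept("umbrella", "A folding canopy used for protection against rain or sun.")
kb.add_concept("kite", "A light frame covered with fabric, flown in the wind on a string.")
for image_id in ("img_001", "img_002", "img_003"):
    kb.link_image("umbrella", image_id)
kb.link_image("kite", "img_004")

print(kb.avg_images_per_concept())
```

The same ratio applied to the rounded counts in the abstract (951K images over 152K concepts) comes to about 6.26 images per concept, which is consistent with the reported average of 6.27 once rounding of the counts is taken into account.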
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
Methods: ALIGN
