Dynamic Knowledge Integration for Enhanced Vision-Language Reasoning

Julian Perry; Surasakdi Siripong; Thanakorn Phonchai

arXiv:2501.08597 · cs.CL · January 16, 2025

Open Access

TL;DR

This paper introduces AKGP-LVLM, a novel method that dynamically integrates external knowledge into large vision-language models, significantly improving their performance on knowledge-intensive multimodal tasks.

Contribution

The paper presents an adaptive pretraining approach, AKGP-LVLM, that incorporates structured and unstructured knowledge into LVLMs during pretraining and fine-tuning.

Findings

01. Achieved significant performance gains on four benchmark datasets.

02. Human evaluations show improved correctness and relevance.

03. Demonstrated robustness, efficiency, and scalability of the method.

Abstract

Large Vision-Language Models (LVLMs) have demonstrated impressive capabilities in multimodal tasks, but their performance is often constrained by the lack of external knowledge integration, limiting their ability to handle knowledge-intensive tasks such as visual question answering and reasoning. To address this challenge, we propose a novel method, Adaptive Knowledge-Guided Pretraining for Large Vision-Language Models (AKGP-LVLM), which dynamically incorporates structured and unstructured knowledge into LVLMs during pretraining and fine-tuning. Our approach employs a knowledge encoder to represent external knowledge, a retrieval mechanism to select task-relevant information, and a dynamic adaptor to align multimodal and knowledge representations effectively. We evaluate our method on four benchmark datasets, demonstrating significant performance improvements over state-of-the-art…
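
The abstract names three components: a knowledge encoder, a retrieval mechanism, and a dynamic adaptor. The abstract does not specify how any of them is implemented, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes knowledge entries are already encoded as fixed-length vectors, uses cosine-similarity top-k as a stand-in for retrieval, and uses a scalar sigmoid gate as a stand-in for the adaptor. All function names (`retrieve_top_k`, `mean_pool`, `gated_fuse`) and the toy vectors are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_top_k(query, knowledge, k=2):
    """Assumed retrieval step: rank (name, vector) entries by cosine similarity to the query."""
    ranked = sorted(knowledge, key=lambda item: cosine(query, item[1]), reverse=True)
    return ranked[:k]

def mean_pool(vectors):
    """Average a non-empty list of equal-length vectors into one vector."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def gated_fuse(multimodal, knowledge_vec, gate_weights, gate_bias=0.0):
    """Assumed adaptor: scalar gate g = sigmoid(w . [m; k] + b), output (1 - g) * m + g * k."""
    concat = list(multimodal) + list(knowledge_vec)
    z = sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias
    g = 1.0 / (1.0 + math.exp(-z))
    fused = [(1.0 - g) * m + g * kv for m, kv in zip(multimodal, knowledge_vec)]
    return fused, g

# Toy data: three pre-encoded knowledge entries (hypothetical) and one multimodal query vector.
knowledge = [
    ("fact_a", [1.0, 0.0, 0.0]),
    ("fact_b", [0.0, 1.0, 0.0]),
    ("fact_c", [0.9, 0.1, 0.0]),
]
query = [1.0, 0.0, 0.0]

top = retrieve_top_k(query, knowledge, k=2)
pooled = mean_pool([vec for _, vec in top])

# Zero gate weights give g = 0.5, i.e. an equal blend; in practice the weights would be learned.
fused, gate = gated_fuse(query, pooled, gate_weights=[0.0] * 6)
```

In this sketch the gate is a single scalar, so the knowledge vector is blended in uniformly across dimensions; a per-dimension gate or an attention-based adaptor would be equally plausible readings of "dynamic adaptor".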

Taxonomy

Topics: Constraint Satisfaction and Optimization

Methods: ALIGN