Revisiting KRISP: A Lightweight Reproduction and Analysis of Knowledge-Enhanced Vision-Language Models

Souradeep Dutta; Keshav Bulia; Neena S Nair

arXiv:2511.20795 · cs.CV · November 27, 2025


Open Access

TL;DR

This paper presents a lightweight reproduction of KRISP, a knowledge-enhanced vision-language model. The reproduction reveals design flaws and shows that effective knowledge-based reasoning is possible with far fewer parameters, making the approach suitable for resource-constrained devices.

Contribution

We provide a simplified, resource-efficient version of KRISP, analyze the design flaws uncovered during replication, and evaluate the model's performance and scalability under limited-resource conditions.

Findings

01

Replicated model achieves about 75% of the original model's performance

02

Identified and addressed design flaws and pitfalls

03

Demonstrated effective reasoning on synthetic and real datasets

Abstract

Facebook AI Research introduced KRISP [4], which integrates structured external knowledge into vision-language reasoning pipelines. Despite its effectiveness, the original model was developed for industrial-scale training, is computationally demanding, and is tightly coupled to a large backbone. In this work, we re-examine KRISP from a different angle and offer a lightweight reproduction with significantly fewer parameters. Although our replicated model achieves about 75% of the original's performance, the replication process uncovers a number of design flaws, real-world pitfalls, and implicit problems that the original paper did not fully cover. We offer insights into the scalability and efficacy of knowledge-enhanced VQA architectures under resource constraints through systematic ablation studies, which include a proof-of-concept on synthetic VQA data and evaluation on the…
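The abstract describes KRISP only at a high level, as a model that integrates structured external knowledge into a vision-language pipeline. As a rough illustration, and not the authors' code, the sketch below assumes one common design for such hybrids. A neural ("implicit") model and a knowledge-graph ("symbolic") module each score candidate answers, and a late-fusion step keeps the higher score for each answer. The function names `late_fuse` and `predict`, the dictionary inputs, and the choice of an elementwise max are all hypothetical assumptions made for this example.

```python
from typing import Dict

# Hypothetical sketch: late fusion of two answer-score maps.
# Assumption: each source assigns an independent score to each candidate answer.


def late_fuse(implicit: Dict[str, float], symbolic: Dict[str, float]) -> Dict[str, float]:
    """Combine two answer-score maps by keeping the maximum score per answer.

    An answer that appears in only one map keeps that map's score.
    """
    fused = dict(implicit)  # start from the implicit (neural) scores
    for answer, score in symbolic.items():
        # keep whichever source is more confident about this answer
        fused[answer] = max(fused.get(answer, float("-inf")), score)
    return fused


def predict(implicit: Dict[str, float], symbolic: Dict[str, float]) -> str:
    """Return the answer with the highest fused score."""
    fused = late_fuse(implicit, symbolic)
    return max(fused, key=fused.get)


if __name__ == "__main__":
    implicit_scores = {"dog": 0.6, "cat": 0.3}
    symbolic_scores = {"cat": 0.8, "wolf": 0.1}
    print(late_fuse(implicit_scores, symbolic_scores))  # {'dog': 0.6, 'cat': 0.8, 'wolf': 0.1}
    print(predict(implicit_scores, symbolic_scores))    # cat
```

Taking the max lets a confident knowledge-graph prediction override a weaker neural one (here "cat") without retraining either component. Other fusion rules, such as a weighted sum, would work equally well in this sketch.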


Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Advanced Graph Neural Networks