Revisiting KRISP: A Lightweight Reproduction and Analysis of Knowledge-Enhanced Vision-Language Models
Souradeep Dutta, Keshav Bulia, Neena S Nair

TL;DR
This paper presents a lightweight reproduction of KRISP, a knowledge-enhanced vision-language model. The reproduction exposes design flaws in the original and shows that effective reasoning is possible with fewer parameters, making the approach suitable for resource-constrained devices.
Contribution
We provide a simplified, resource-efficient version of KRISP, analyze its design flaws, and evaluate its performance and scalability under limited-resource conditions.
Findings
The replicated model achieves about 75% of the original model's performance
Identified and addressed design flaws and pitfalls
Demonstrated effective reasoning on synthetic and real datasets
Abstract
Facebook AI Research introduced KRISP [4], which integrates structured external knowledge into vision-language reasoning pipelines. Despite its effectiveness, the original model was developed for industrial-scale training, is computationally demanding, and is tightly coupled to a large backbone. In this work, we reexamine KRISP from a different angle and offer a lightweight reproduction with significantly fewer parameters. Although our replicated model reaches only about 75% of the original's performance, the replication process uncovers a number of design flaws, real-world pitfalls, and implicit problems that were not fully covered in the original paper. We offer insights into the scalability and efficacy of knowledge-enhanced VQA architectures under resource constraints through systematic ablation studies, which include a proof-of-concept on synthetic VQA data and evaluation on the…
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Advanced Graph Neural Networks
