Detecting Knowledge Boundary of Vision Large Language Models by Sampling-Based Inference

Zhuo Chen; Xinyu Wang; Yong Jiang; Zhen Zhang; Xinyu Geng; Pengjun Xie; Fei Huang; Kewei Tu

arXiv:2502.18023·cs.CL·August 26, 2025

TL;DR

This paper introduces a method to identify the knowledge boundary of Vision Large Language Models (VLLMs), enabling more efficient use of retrieval techniques like RAG by reducing unnecessary retrievals while maintaining or improving performance.

Contribution

The paper proposes a novel fine-tuning approach with two variants to detect VLLM knowledge boundaries, which can be transferred across different models.

Findings

1. Successfully depicts VLLM knowledge boundaries across datasets.

2. Reduces retrieval dependence while maintaining or improving accuracy.

3. Boundary detection generalizes to other VLLMs.

Abstract

Despite the advancements made in Vision Large Language Models (VLLMs), like text Large Language Models (LLMs), they have limitations in addressing questions that require real-time information or are knowledge-intensive. Indiscriminately adopting Retrieval Augmented Generation (RAG) techniques is an effective yet expensive way to enable models to answer queries beyond their knowledge scopes. To mitigate the dependence on retrieval and simultaneously maintain, or even improve, the performance benefits provided by retrieval, we propose a method to detect the knowledge boundary of VLLMs, allowing for more efficient use of techniques like RAG. Specifically, we propose a method with two variants that fine-tune a VLLM on an automatically constructed dataset for boundary identification. Experimental results on various types of Visual Question Answering datasets show that our method successfully…
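
The idea of gating retrieval on a knowledge-boundary decision can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`sampled_accuracy`, `needs_retrieval`, `answer_with_gate`), the exact-match scoring, and the 0.5 threshold are all hypothetical and do not come from the paper. It assumes a boundary label could be derived by sampling several answers from the model and checking how often they match a reference answer.

```python
from typing import Callable, Sequence


def sampled_accuracy(answers: Sequence[str], gold: str) -> float:
    """Fraction of sampled answers that exactly match the gold answer.

    Matching is case-insensitive and ignores surrounding whitespace
    (an assumption made for this sketch).
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    target = gold.strip().lower()
    hits = sum(a.strip().lower() == target for a in answers)
    return hits / len(answers)


def needs_retrieval(answers: Sequence[str], gold: str, threshold: float = 0.5) -> bool:
    """Label a question as outside the boundary when sampled accuracy is low.

    The threshold value is hypothetical, chosen only for illustration.
    """
    return sampled_accuracy(answers, gold) < threshold


def answer_with_gate(
    question: str,
    is_outside_boundary: Callable[[str], bool],
    answer_direct: Callable[[str], str],
    answer_with_rag: Callable[[str], str],
) -> str:
    """Call the retrieval-augmented path only when the detector asks for it."""
    if is_outside_boundary(question):
        return answer_with_rag(question)
    return answer_direct(question)
```

In this sketch, labels produced by `needs_retrieval` would stand in for an automatically constructed training set, and `is_outside_boundary` would stand in for a fine-tuned detector; both roles are assumptions about how the pieces could fit together.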

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling

Methods: Attention Is All You Need · Weight Decay · Dense Connections · Attention Dropout · Linear Layer · Layer Normalization · Byte Pair Encoding · Residual Connection · Linear Warmup With Linear Decay