Invoke Interfaces Only When Needed: Adaptive Invocation for Large Language Models in Question Answering

Jihao Zhao; Chunlai Zhou; Daixuan Li; Shuaishuai Zu; Biao Qin

arXiv:2505.02311 · cs.CL · November 11, 2025

TL;DR

This paper introduces AttenHScore, a real-time hallucination detection metric for small language models in question answering, enabling adaptive invocation of large models to improve accuracy and reduce costs.

Contribution

We propose AttenHScore, a metric for dynamic hallucination detection, together with uncertainty-aware knowledge reorganization, improving real-time invocation of large LMs without extra training.

Findings

01. AttenHScore outperforms baselines in hallucination detection across multiple datasets.

02. The proposed methods reduce reliance on additional model training.

03. The proposed strategies handle complex queries with improved accuracy.

Abstract

The collaborative paradigm of large and small language models (LMs) effectively balances performance and cost, yet its pivotal challenge lies in precisely pinpointing the moment of invocation when hallucinations arise in small LMs. Previous optimization efforts primarily focused on post-processing techniques, which were separate from the reasoning process of LMs, resulting in high computational costs and limited effectiveness. In this paper, we propose a practical invocation evaluation metric called AttenHScore, which calculates the accumulation and propagation of hallucinations during the generation process of small LMs, continuously amplifying potential reasoning errors. By dynamically adjusting the detection threshold, we achieve more accurate real-time invocation of large LMs. Additionally, considering the limited reasoning capacity of small LMs, we leverage uncertainty-aware…
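
To make the invocation idea in the abstract concrete, the following is a minimal sketch of threshold-based routing between a small and a large LM. It is not the paper's AttenHScore: the score here is a stand-in (the running mean of token negative log-probabilities), the threshold is a fixed parameter rather than a dynamically adjusted one, and the names `running_uncertainty`, `answer_adaptively`, `small_lm`, and `large_lm` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def running_uncertainty(token_probs: Sequence[float]) -> List[float]:
    """Running mean of token negative log-probabilities.

    Stand-in uncertainty score for illustration only; NOT AttenHScore.
    """
    scores: List[float] = []
    total = 0.0
    for i, p in enumerate(token_probs, start=1):
        # Clamp to avoid log(0) for degenerate probabilities.
        total += -math.log(max(p, 1e-12))
        scores.append(total / i)
    return scores


def answer_adaptively(
    question: str,
    small_lm: Callable[[str], Tuple[str, Sequence[float]]],
    large_lm: Callable[[str], str],
    threshold: float,
) -> Tuple[str, str]:
    """Answer with the small LM; fall back to the large LM when the
    stand-in score exceeds the threshold at any point during generation.

    `small_lm` is assumed to return (answer, per-token probabilities);
    `large_lm` is assumed to return only an answer. Both are hypothetical.
    """
    answer, token_probs = small_lm(question)
    scores = running_uncertainty(token_probs)
    if scores and max(scores) > threshold:
        return large_lm(question), "large"
    return answer, "small"
```

With this sketch, a confident small-LM answer is returned directly, while a low-confidence one triggers a call to the large LM. The threshold trades cost against accuracy: a lower value invokes the large LM more often.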

Code & Models

Repositories

robot2050/attenhscore
PyTorch · Official

Videos

Invoke Interfaces Only When Needed: Adaptive Invocation for Large Language Models in Question Answering · Underline

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques