Look It Up: Analysing Internal Web Search Capabilities of Modern LLMs

Sahil Kale

arXiv:2511.18931 · cs.CL · November 25, 2025

Open Access

TL;DR

This paper evaluates how well modern large language models utilize integrated web search to improve factual accuracy, revealing strengths in static knowledge and challenges in dynamic, real-time information retrieval.

Contribution

Introduces a benchmark for assessing the necessity and effectiveness of web search in LLMs, highlighting current limitations and potential improvements.

Findings

01. Web access improves static accuracy for GPT-5-mini and Claude Haiku 4.5

02. Models often invoke search on dynamic queries but answer them with low accuracy

03. Overconfidence and retrieval failures limit effectiveness

Abstract

Modern large language models integrate web search to provide real-time answers, yet it remains unclear whether they are efficiently calibrated to use search when it is actually needed. We introduce a benchmark evaluating both the necessity and effectiveness of web access across commercial models with no access to internal states or parameters. The dataset includes a static split of 783 temporally anchored questions answerable from pre-cutoff knowledge, aimed at testing whether models invoke search based on low internal confidence, and a dynamic split of 288 post-cutoff queries designed to test whether models recognise when search is required and retrieve updated information. Web access substantially improves static accuracy for GPT-5-mini and Claude Haiku 4.5, though confidence calibration worsens. On dynamic queries, both models frequently invoke search yet remain below 70 percent…
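The abstract describes three quantities per split: how often a model invokes search, how often it answers correctly, and how well its stated confidence matches its accuracy. A minimal sketch of how such a summary could be computed is given here for illustration. The `Record` fields, the `summarize` function, and the use of expected calibration error (ECE) with equal-width bins are assumptions made for this example; the abstract does not state which calibration metric or data format the paper uses.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Record:
    """One model response (hypothetical schema, not the paper's)."""
    split: str         # "static" or "dynamic"
    used_search: bool  # whether the model invoked web search
    correct: bool      # whether the final answer was judged correct
    confidence: float  # model's self-reported confidence in [0, 1]


def summarize(records: List[Record], n_bins: int = 10) -> Dict[str, Dict[str, float]]:
    """Per-split search rate, accuracy, and expected calibration error."""
    summary: Dict[str, Dict[str, float]] = {}
    for split in sorted({r.split for r in records}):
        rows = [r for r in records if r.split == split]
        n = len(rows)

        # Group responses into equal-width confidence bins.
        bins: List[List[Record]] = [[] for _ in range(n_bins)]
        for r in rows:
            idx = min(int(r.confidence * n_bins), n_bins - 1)
            bins[idx].append(r)

        # ECE: size-weighted gap between accuracy and mean confidence per bin.
        ece = 0.0
        for b in bins:
            if b:
                acc = sum(x.correct for x in b) / len(b)
                conf = sum(x.confidence for x in b) / len(b)
                ece += (len(b) / n) * abs(acc - conf)

        summary[split] = {
            "search_rate": sum(r.used_search for r in rows) / n,
            "accuracy": sum(r.correct for r in rows) / n,
            "ece": ece,
        }
    return summary
```

Running `summarize` once on responses collected with web access and once without it would allow the kind of comparison the abstract reports, where accuracy rises while calibration worsens.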

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Healthcare and Education