Don't Measure Once: Measuring Visibility in AI Search (GEO)
Julius Schulte, Malte Bleeker, Philipp Kaufmann

TL;DR
This paper argues that assessing visibility in AI search requires repeated measurements, because generative engine outputs are probabilistic and vary across runs, whereas classical search engine results are comparatively stable.
Contribution
It proposes treating visibility as a distribution rather than a single-point outcome, and draws on empirical studies to support the case for repeated measurements in AI search evaluation.
Findings
Single measurements are unreliable due to variability in AI search results.
Repeated measurements reveal the distribution of visibility, providing a more accurate assessment.
Variability depends on prompts, time, and model randomness.
Abstract
As large language model-based chat systems become increasingly widely used, generative engine optimization (GEO) has emerged as an important problem for information access and retrieval. In classical search engines, results are comparatively transparent and stable: a single query often provides a representative snapshot of where a page or brand appears relative to competitors. The inherent probabilistic nature of AI search changes this paradigm. Answers can vary across runs, prompts, and time, making one-off observations unreliable. Drawing on empirical studies, our findings underscore the need for repeated measurements to assess a brand's GEO performance and to characterize visibility as a distribution rather than a single-point outcome.
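To make the idea of visibility as a distribution concrete, here is a minimal sketch of a repeated-measurement loop. It is not the authors' method. It assumes visibility is scored as a simple "brand is mentioned in the answer" rate per prompt, and it summarizes that rate with a Wilson score interval. The names `measure_visibility`, `fake_engine`, and the brand names are hypothetical, and `fake_engine` is only a random stand-in for a real generative engine.

```python
import math
import random


def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def measure_visibility(query_engine, brand, prompts, runs_per_prompt=20):
    """Run each prompt several times and summarize how often `brand` appears."""
    summary = {}
    for prompt in prompts:
        # Count the runs whose answer mentions the brand (case-insensitive).
        hits = sum(
            brand.lower() in query_engine(prompt).lower()
            for _ in range(runs_per_prompt)
        )
        summary[prompt] = {
            "mention_rate": hits / runs_per_prompt,
            "ci95": wilson_interval(hits, runs_per_prompt),
        }
    return summary


# Hypothetical stand-in for a generative engine: each call names two of
# three made-up brands at random, so answers differ from run to run.
_rng = random.Random(0)


def fake_engine(prompt):
    brands = ["Acme", "Globex", "Initech"]
    return "You could try " + " or ".join(_rng.sample(brands, 2)) + "."


if __name__ == "__main__":
    prompts = ["best widget brand?", "which widget should I buy?"]
    for prompt, stats in measure_visibility(fake_engine, "Acme", prompts).items():
        print(prompt, stats)
```

A single call to `fake_engine` either mentions the brand or does not, which is the one-off observation the abstract warns against. The repeated runs yield a rate with an uncertainty band per prompt, and comparing those bands across prompts or dates is one simple way to examine the variation across runs, prompts, and time.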
