No Dataset Needed for Downstream Knowledge Benchmarking: Response Dispersion Inversely Correlates with Accuracy on Domain-specific QA

Robert L Simione II

arXiv:2408.13624·cs.CL·August 27, 2024

Open Access

TL;DR

This paper introduces a novel method to evaluate and compare large language models' knowledge in specific domains using response dispersion, eliminating the need for domain-specific QA datasets and grading.

Contribution

The paper proposes response dispersion as a new metric for domain-specific knowledge benchmarking, shows that it correlates inversely with QA accuracy, and validates it as a practical alternative to building and grading QA datasets.

Findings

01

Response dispersion inversely correlates with QA accuracy (average Spearman rank correlation stronger than -0.59).

02

Response dispersion comparison matches QA accuracy comparison 74-89% of the time.

03

A new local embedding method performs nearly as well as API-based embeddings.

Abstract

This research seeks to obviate the need for creating QA datasets and grading (chatbot) LLM responses when comparing LLMs' knowledge in specific topic domains. This is done in an entirely end-user centric way, with no access to the inner workings of the LLM required, so long as it can be prompted and given a random seed to create different generations for the same prompt. For a given topic domain, the paper defines the "response dispersion" of an LLM by repeatedly asking the LLM the same opinion question about that topic domain. Namely, the response dispersion is the count of singular values needed to explain 95% of the variance in the embedding matrix of the LLM's responses. It is found that the response dispersion is inversely correlated with accuracy on relevant QA evaluations (average Spearman rank correlation stronger than -0.59). A use-case analysis shows that when…
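The definition in the abstract can be illustrated with a minimal sketch, which is not the paper's code. It assumes the responses have already been embedded into a matrix with one row per response, that the rows are mean-centered before the decomposition, and that "variance explained" is measured by squared singular values. The function name `response_dispersion` and the toy matrices are hypothetical; the abstract does not specify these details.

```python
import numpy as np


def response_dispersion(embeddings, variance_threshold=0.95):
    """Count the singular values needed to reach the variance threshold.

    embeddings: array of shape (n_responses, embedding_dim), one row per
    LLM response to the same prompt. Centering the rows and using squared
    singular values as "variance" are assumptions of this sketch.
    """
    X = np.asarray(embeddings, dtype=float)
    X = X - X.mean(axis=0, keepdims=True)      # assumption: mean-center rows
    s = np.linalg.svd(X, compute_uv=False)     # singular values, descending
    energy = s ** 2
    total = energy.sum()
    if total == 0.0:
        return 0                               # all responses embed identically
    cumulative = np.cumsum(energy) / total
    # Index of the first cumulative share that reaches the threshold.
    first = int(np.searchsorted(cumulative, variance_threshold))
    return min(first + 1, s.size)


# Toy embeddings (hypothetical, not real LLM output).
rng = np.random.default_rng(0)
# Responses that vary along a single direction: low dispersion.
tight = np.outer(np.linspace(1.0, 2.0, 10), np.array([1.0, 2.0, 3.0, 4.0]))
# Responses scattered in all directions: higher dispersion.
spread = rng.normal(size=(20, 8))

print(response_dispersion(tight))
print(response_dispersion(spread))
```

Under these assumptions, a model whose repeated answers stay close to one another yields a small count, while scattered answers yield a larger one. According to the abstract, the larger count is the one associated with lower QA accuracy.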

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Algorithms · Neural Networks and Applications