From Raw Corpora to Domain Benchmarks: Automated Evaluation of LLM Domain Expertise

Nitin Sharma; Thomas Wolfers; Çağatay Yıldız

arXiv:2506.07658 · cs.CL · March 9, 2026

Open Access

TL;DR

This paper introduces an automated, deterministic pipeline that builds domain-specific benchmarks from raw corpora, without relying on other LLMs or human annotation, to evaluate LLMs' domain knowledge.

Contribution

The authors present a novel deterministic method to generate domain benchmarks directly from raw data, enabling scalable, fair, and up-to-date evaluation of LLMs' domain expertise.
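The deterministic generation from raw data described above can be illustrated with a minimal sketch. It assumes a simple scoring rule that is not taken from the paper: words are ranked by a smoothed log-ratio of their frequency in a domain corpus versus a background corpus, and every later occurrence of a top-ranked word becomes a prediction target whose preceding text serves as the prompt. The function names `domain_keywords` and `prompt_target_pairs`, the parameters `top_k` and `min_count`, and the use of a background corpus are all hypothetical; the paper's own extraction of keywords and related target vocabulary may work differently.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercased alphabetic tokens; a deliberately simple tokenizer for illustration.
    return re.findall(r"[a-z]+", text.lower())


def domain_keywords(domain_docs, background_docs, top_k=5, min_count=2):
    """Rank words by a smoothed log-ratio of domain vs. background frequency.

    Hypothetical scoring rule (an assumption, not the paper's exact method).
    """
    dom = Counter(t for d in domain_docs for t in tokenize(d))
    bg = Counter(t for d in background_docs for t in tokenize(d))
    n_dom, n_bg = sum(dom.values()), sum(bg.values())
    vocab = len(set(dom) | set(bg))
    scores = {}
    for word, count in dom.items():
        if count < min_count:
            continue  # skip rare words to reduce noise
        p_dom = (count + 1) / (n_dom + vocab)   # add-one smoothing
        p_bg = (bg[word] + 1) / (n_bg + vocab)
        scores[word] = math.log(p_dom / p_bg)
    # Sort by score (descending), then alphabetically, so the output is deterministic.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


def prompt_target_pairs(domain_docs, keywords):
    """Turn each keyword occurrence into a (prompt, target) pair.

    The prompt is all text preceding the keyword in its document; occurrences
    at the very start of a document have no context and are skipped.
    """
    kw = set(keywords)
    pairs = []
    for doc in domain_docs:
        for m in re.finditer(r"[A-Za-z]+", doc):
            if m.group().lower() in kw and m.start() > 0:
                pairs.append((doc[:m.start()].rstrip(), m.group()))
    return pairs


if __name__ == "__main__":
    domain = ["The tumor was benign. A biopsy confirmed the tumor.",
              "Biopsy results showed a malignant tumor."]
    background = ["The weather was nice. A dog ran in the park.",
                  "The results of the game were confirmed."]
    kws = domain_keywords(domain, background, top_k=2)
    print(kws)  # ['tumor', 'biopsy']
    for prompt, target in prompt_target_pairs(domain, kws):
        print(repr(prompt), "->", target)
```

Because both steps use only counting, sorting, and string slicing, the same corpus always yields the same benchmark, which is the property the deterministic framing relies on; a real pipeline would need a proper tokenizer and filtering of stopword-like terms.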

Findings

01. Model performance correlates with expert benchmarks.
02. Benchmark enables analysis of knowledge acquisition.
03. Evaluation framework compares base and chat models.

Abstract

Accurate domain-specific benchmarking of LLMs is essential, particularly in domains with direct implications for humans, such as law, healthcare, and education. However, existing benchmarks have been shown to suffer from contamination and are based on multiple-choice questions, a format with inherent biases. To measure domain-specific knowledge in LLMs, we present a deterministic pipeline that transforms raw domain corpora into completion-style benchmarks without relying on other LLMs or costly human annotation. Our approach first extracts domain-specific keywords and related target vocabulary from an input corpus. It then constructs prompt-target pairs in which domain-specific words serve as prediction targets. By measuring LLMs' ability to complete these prompts, we obtain a direct assessment of domain knowledge at low computational cost. Our pipeline avoids benchmark contamination, enables…
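The completion-based measurement described in the abstract can be sketched as follows. The snippet assumes an exact-match metric on the first word of each completion and treats the model as an opaque `complete(prompt) -> str` callable; the metric and the names `first_word`, `completion_accuracy`, and `toy_model` are illustrative assumptions, not the paper's evaluation protocol. A dummy model stands in for a real LLM so the example runs without external dependencies.

```python
import re
from typing import Callable, Iterable, Tuple


def first_word(text: str) -> str:
    """Return the first alphabetic word of a completion, lowercased ("" if none)."""
    match = re.search(r"[A-Za-z]+", text)
    return match.group().lower() if match else ""


def completion_accuracy(
    pairs: Iterable[Tuple[str, str]],
    complete: Callable[[str], str],
) -> float:
    """Fraction of prompts whose completion begins with the target word.

    `complete` is a hypothetical stand-in for any LLM call that returns a
    continuation of the prompt; first-word exact match is an assumed metric.
    """
    pairs = list(pairs)
    if not pairs:
        return 0.0
    hits = sum(first_word(complete(prompt)) == target.lower()
               for prompt, target in pairs)
    return hits / len(pairs)


def toy_model(prompt: str) -> str:
    # Dummy "model" that ignores the prompt and always continues with "tumor".
    return " tumor was found."


if __name__ == "__main__":
    pairs = [
        ("The", "tumor"),
        ("The tumor was benign. A", "biopsy"),
        ("Biopsy results showed a malignant", "tumor"),
    ]
    print(completion_accuracy(pairs, toy_model))
```

Here two of the three targets match, so the toy model scores 2/3. A looser criterion, such as checking whether the target appears among a model's top-k predicted next words, could be substituted inside `completion_accuracy` without changing the rest of the structure.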

Taxonomy

Topics: Text Readability and Simplification · Topic Modeling · Natural Language Processing Techniques