GovRelBench: A Benchmark for Government Domain Relevance

Haiquan Wang; Yi Chen; Shang Zeng; Yun Bian; Zhe Cui

arXiv:2507.21419 · cs.AI · July 30, 2025

TL;DR

GovRelBench is a benchmark for evaluating large language models' core capabilities in the government domain, with a focus on domain relevance. It pairs government domain prompts with GovRelBERT, a ModernBERT-based evaluation tool trained with the SoftGovScore method.

Contribution

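The paper introduces GovRelBench, a government-domain benchmark made up of domain prompts and the evaluation tool GovRelBERT, along with SoftGovScore, a training method that converts hard labels to soft scores so that GovRelBERT can compute a text's government domain relevance score.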

Findings

01 GovRelBench effectively measures domain relevance in government-related tasks.

02 SoftGovScore improves accuracy of relevance scoring.

03 The benchmark and tools are publicly available for research use.

Abstract

Current evaluations of LLMs in the government domain primarily focus on safety considerations in specific scenarios, while the assessment of the models' own core capabilities, particularly domain relevance, remains insufficient. To address this gap, we propose GovRelBench, a benchmark specifically designed for evaluating the core capabilities of LLMs in the government domain. GovRelBench consists of government domain prompts and a dedicated evaluation tool, GovRelBERT. During the training process of GovRelBERT, we introduce the SoftGovScore method: this method trains a model based on the ModernBERT architecture by converting hard labels to soft scores, enabling it to accurately compute the text's government domain relevance score. This work aims to enhance the capability evaluation framework for large models in the government domain, providing an effective tool for relevant research and…
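
The abstract describes SoftGovScore only as converting hard labels to soft scores, so the following is a minimal sketch of that general idea rather than the authors' implementation. The label names, the score values in `SOFT_SCORES`, and the helper functions are all hypothetical, and the mean squared error loss is an assumed choice of regression objective that the abstract does not specify.

```python
from typing import Dict, List

# Hypothetical mapping from hard class labels to soft relevance scores in [0, 1].
# Label names and score values are illustrative assumptions, not from the paper.
SOFT_SCORES: Dict[str, float] = {
    "irrelevant": 0.0,
    "weakly_relevant": 0.5,
    "relevant": 1.0,
}


def to_soft_scores(hard_labels: List[str]) -> List[float]:
    """Convert hard class labels into soft regression targets."""
    return [SOFT_SCORES[label] for label in hard_labels]


def mean_squared_error(predictions: List[float], targets: List[float]) -> float:
    """Assumed regression loss a scorer could minimize against soft targets."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Example: three labeled texts and made-up model outputs for them.
targets = to_soft_scores(["relevant", "irrelevant", "weakly_relevant"])
loss = mean_squared_error([0.9, 0.2, 0.5], targets)
print(targets, loss)
```

In a setup like this, a model predicts a continuous score per text and is penalized by its distance from the soft target, which is one plausible way a classifier-style dataset could yield a graded relevance score.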
