GovRelBench: A Benchmark for Government Domain Relevance
Haiquan Wang, Yi Chen, Shang Zeng, Yun Bian, Zhe Cui

TL;DR
GovRelBench is a new benchmark for evaluating the core capabilities of large language models (LLMs) in the government domain, with a focus on domain relevance. It ships with a dedicated scoring method (SoftGovScore) and evaluation model (GovRelBERT).
Contribution
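The paper introduces GovRelBench, a domain-specific benchmark made up of government domain prompts and a dedicated evaluation model, GovRelBERT. It also introduces SoftGovScore, a training method that converts hard labels into soft scores so that GovRelBERT can score the government domain relevance of text.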
Findings
GovRelBench effectively measures domain relevance in government-related tasks.
SoftGovScore improves the accuracy of relevance scoring.
The benchmark and tools are publicly available for research use.
Abstract
Current evaluations of LLMs in the government domain focus primarily on safety in specific scenarios, while assessment of the models' own core capabilities, particularly domain relevance, remains insufficient. To address this gap, we propose GovRelBench, a benchmark designed specifically to evaluate the core capabilities of LLMs in the government domain. GovRelBench consists of government domain prompts and a dedicated evaluation tool, GovRelBERT. To train GovRelBERT, we introduce the SoftGovScore method, which converts hard labels into soft scores and trains a model based on the ModernBERT architecture on those scores, enabling it to accurately compute a text's government domain relevance score. This work aims to strengthen the capability evaluation framework for LLMs in the government domain, providing an effective tool for relevant research and…
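The abstract states only that SoftGovScore converts hard labels into soft scores and trains a ModernBERT-based model on them. The sketch below illustrates that general idea under explicit assumptions; it is not the authors' implementation. The hard-to-soft mapping is assumed to be plain label smoothing with a hypothetical `epsilon`, and a one-feature logistic regressor over a hypothetical keyword list (`GOV_TERMS`) stands in for the ModernBERT model. The point it shows is that training with binary cross-entropy against soft targets yields a continuous relevance score in (0, 1) rather than a hard class decision.

```python
import math

# Hypothetical keyword list; a toy stand-in for ModernBERT's learned text features.
GOV_TERMS = {"policy", "ministry", "regulation", "permit", "tax"}


def soften_labels(hard_labels, epsilon=0.1):
    """Assumed hard-to-soft mapping (label smoothing): 1 -> 1 - epsilon, 0 -> epsilon."""
    return [y * (1.0 - epsilon) + (1 - y) * epsilon for y in hard_labels]


def featurize(text):
    """One feature: the fraction of tokens that appear in GOV_TERMS."""
    tokens = text.lower().split()
    hits = sum(tok.strip(".,") in GOV_TERMS for tok in tokens)
    return [hits / max(len(tokens), 1)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_bce(score, target):
    """Binary cross-entropy of a predicted score against a soft target in [0, 1]."""
    eps = 1e-12
    return -(target * math.log(score + eps) + (1 - target) * math.log(1 - score + eps))


def train_scorer(features, soft_targets, lr=0.5, epochs=500):
    """SGD on soft_bce for a logistic regressor (toy stand-in for a ModernBERT scoring head)."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(features, soft_targets):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            grad = p - t  # derivative of soft_bce with respect to the logit
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


def relevance_score(w, b, text):
    """Continuous government domain relevance score in (0, 1)."""
    x = featurize(text)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


if __name__ == "__main__":
    # Tiny illustrative dataset with hard labels (1 = government, 0 = not).
    texts = [
        "The ministry issued a new tax policy.",
        "Apply for a building permit under the regulation.",
        "I baked a chocolate cake today.",
        "The football match ended in a draw.",
    ]
    labels = [1, 1, 0, 0]
    targets = soften_labels(labels)
    w, b = train_scorer([featurize(t) for t in texts], targets)
    for query in ["The new tax regulation takes effect.", "We went hiking in the mountains."]:
        print(f"{relevance_score(w, b, query):.3f}  {query}")
```

Label smoothing is only one way to turn hard labels into soft targets; the paper may derive its soft scores differently, and a real system would replace the keyword feature with a fine-tuned encoder.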
