The NordDRG AI Benchmark for Large Language Models

Tapio Pitkäranta

arXiv:2506.13790 · cs.AI · August 21, 2025

TL;DR

This paper introduces NordDRG-AI-Benchmark, an open test bed for evaluating how well large language models understand and emulate hospital DRG grouping logic, a capability that matters for transparency in healthcare funding.

Contribution

It provides the first public, rule-complete benchmark for DRG reasoning, including detailed tables, governance workflows, and exact-match evaluation tasks for LLMs.
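Exact-match evaluation of this kind can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's own scoring code: the task identifiers, the placeholder answers, and the normalization step (trimming whitespace and ignoring case) are all assumptions made for the example.

```python
from typing import Dict


def normalize(answer: str) -> str:
    """Assumed normalization: trim whitespace and ignore case."""
    return answer.strip().upper()


def exact_match_score(predictions: Dict[str, str], gold: Dict[str, str]) -> float:
    """Return the fraction of gold tasks answered exactly right.

    A task with no prediction counts as wrong.
    """
    if not gold:
        raise ValueError("gold answers must not be empty")
    correct = sum(
        1
        for task_id, expected in gold.items()
        if normalize(predictions.get(task_id, "")) == normalize(expected)
    )
    return correct / len(gold)


# Hypothetical task ids and placeholder answers, for illustration only.
gold = {
    "logic-01": "ANSWER_A",
    "logic-02": "ANSWER_B",
    "logic-03": "ANSWER_C",
    "logic-04": "ANSWER_D",
}
predictions = {
    "logic-01": "answer_a ",  # matches after normalization
    "logic-02": "ANSWER_B",   # matches
    "logic-03": "ANSWER_X",   # wrong
    # "logic-04" is missing, so it counts as wrong
}

score = exact_match_score(predictions, gold)
print(f"exact-match score: {score:.2f}")
```

Scoring by exact match keeps results reproducible across runs and models, at the cost of giving no partial credit for nearly correct answers.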

Findings

01 GPT-5 achieves perfect scores on logic tasks

02 GPT-5 partially emulates NordDRG grouper logic

03 Benchmark enables reproducible evaluation of LLMs in healthcare funding

Abstract

Large language models (LLMs) are being piloted for clinical coding and decision support, yet no open benchmark targets the hospital-funding layer where Diagnosis-Related Groups (DRGs) determine reimbursement. In most OECD systems, DRGs route a substantial share of multi-trillion-dollar health spending through governed grouper software, making transparency and auditability first-order concerns. We release NordDRG-AI-Benchmark, the first public, rule-complete test bed for DRG reasoning. The package includes (i) machine-readable NordDRG definition tables (approximately 20 sheets) and (ii) expert manuals and change-log templates that capture governance workflows. It exposes two suites: a 13-task Logic benchmark (code lookup, cross-table inference, grouping features, multilingual terminology, and CC/MCC validity checks) and a 13-task Grouper benchmark that requires full DRG grouper emulation…

Code & Models

Repositories

longshoreforrest/norddrg-ai-benchmark (Official)
