Swiss-Bench SBP-002: A Frontier Model Comparison on Swiss Legal and Regulatory Tasks

Fatih Uenal

arXiv:2603.23646 · cs.CL · March 26, 2026

TL;DR

This paper introduces Swiss-Bench SBP-002, a trilingual benchmark of 395 expert-crafted items that evaluates ten frontier language models on Swiss legal and regulatory tasks across three domains and seven task types. Even the top model answers only 38.2% of items correctly.

Contribution

It presents the first structured, multilingual benchmark for Swiss regulatory compliance tasks and evaluates ten frontier models, providing a new empirical reference for model capabilities in this domain.

Findings

01. Top model achieves only 38.2% correct responses
02. Legal translation and case analysis have higher accuracy (69-72%)
03. Task difficulty varies significantly across types

Abstract

While recent work has benchmarked large language models on Swiss legal translation (Niklaus et al., 2025) and academic legal reasoning from university exams (Fan et al., 2025), no existing benchmark evaluates frontier model performance on applied Swiss regulatory compliance tasks. I introduce Swiss-Bench SBP-002, a trilingual benchmark of 395 expert-crafted items spanning three Swiss regulatory domains (FINMA, Legal-CH, EFK), seven task types, and three languages (German, French, Italian). I evaluate ten frontier models from March 2026 using a structured three-dimension scoring framework, assessed via a blind three-judge LLM panel (GPT-4o, Claude Sonnet 4, Qwen3-235B) with majority-vote aggregation (weighted kappa = 0.605). Reference answers were validated by an independent human legal expert on a 100-item subset (73% rated Correct, 0% Incorrect, perfect Legal Accuracy). Results…
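The judging step described in the abstract (three judges, majority-vote aggregation, weighted kappa as the agreement statistic) can be illustrated with a minimal Python sketch. Several details here are assumptions, not taken from the paper: the three-level label scale (the abstract names only "Correct" and "Incorrect"), the median tie-break for three-way splits, the linear weighting, and averaging kappa over judge pairs. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Assumed ordinal scale; the abstract names only "Correct" and "Incorrect".
LABELS = ["Incorrect", "Partial", "Correct"]


def majority_vote(votes):
    """Return the label most judges chose.

    Assumed tie-break: on a three-way split, use the median label.
    """
    label, count = Counter(votes).most_common(1)[0]
    if count > len(votes) / 2:
        return label
    ranks = sorted(LABELS.index(v) for v in votes)
    return LABELS[ranks[len(ranks) // 2]]


def weighted_kappa(a, b):
    """Linearly weighted Cohen's kappa between two raters (assumed weighting)."""
    k = len(LABELS)
    n = len(a)
    ia = [LABELS.index(x) for x in a]
    ib = [LABELS.index(x) for x in b]

    def weight(i, j):
        # Disagreement weight grows linearly with ordinal distance.
        return abs(i - j) / (k - 1)

    observed = sum(weight(i, j) for i, j in zip(ia, ib)) / n
    pa, pb = Counter(ia), Counter(ib)
    expected = sum(
        weight(i, j) * pa[i] * pb[j] for i in range(k) for j in range(k)
    ) / (n * n)
    if expected == 0:
        # Both raters used one identical label throughout.
        return 1.0
    return 1.0 - observed / expected


def mean_pairwise_kappa(judges):
    """Average weighted kappa over all judge pairs (assumed aggregation)."""
    pairs = list(combinations(judges, 2))
    return sum(weighted_kappa(x, y) for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy ratings for four items; not data from the paper.
    judge_a = ["Correct", "Partial", "Incorrect", "Correct"]
    judge_b = ["Correct", "Correct", "Incorrect", "Partial"]
    judge_c = ["Correct", "Partial", "Partial", "Correct"]
    final = [majority_vote(v) for v in zip(judge_a, judge_b, judge_c)]
    print(final)
    print(round(mean_pairwise_kappa([judge_a, judge_b, judge_c]), 3))
```

A weighted kappa penalizes a Correct/Incorrect disagreement more than a Correct/Partial one, which is why it suits ordinal rating scales. The paper may compute its reported 0.605 differently.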

Taxonomy

Topics: Computational and Text Analysis Methods · Artificial Intelligence in Law · Ethics and Social Impacts of AI