Swiss-Bench SBP-002: A Frontier Model Comparison on Swiss Legal and Regulatory Tasks
Fatih Uenal

TL;DR
This paper introduces Swiss-Bench SBP-002, a trilingual (German, French, Italian) benchmark of 395 expert-crafted items covering three Swiss regulatory domains and seven task types. Ten frontier language models are evaluated on it, and the top model achieves only 38.2% correct responses.
Contribution
The paper presents what it describes as the first structured, multilingual benchmark for applied Swiss regulatory compliance tasks and evaluates ten frontier models on it, providing a new empirical reference for model capabilities in this domain.
Findings
Top model achieves only 38.2% correct responses
Legal translation and case analysis have higher accuracy (69-72%)
Task difficulty varies significantly across types
Abstract
While recent work has benchmarked large language models on Swiss legal translation (Niklaus et al., 2025) and academic legal reasoning from university exams (Fan et al., 2025), no existing benchmark evaluates frontier model performance on applied Swiss regulatory compliance tasks. I introduce Swiss-Bench SBP-002, a trilingual benchmark of 395 expert-crafted items spanning three Swiss regulatory domains (FINMA, Legal-CH, EFK), seven task types, and three languages (German, French, Italian). I evaluate ten frontier models from March 2026 using a structured three-dimension scoring framework, assessed via a blind three-judge LLM panel (GPT-4o, Claude Sonnet 4, Qwen3-235B) with majority-vote aggregation (weighted kappa = 0.605). Reference answers were validated by an independent human legal expert on a 100-item subset (73% rated Correct, 0% Incorrect, perfect Legal Accuracy). Results…
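The judging procedure named in the abstract (three judges, majority-vote aggregation, weighted kappa) can be illustrated with a minimal Python sketch. Everything specific here is an assumption rather than the paper's implementation: the three-level ordinal scale in `LABELS`, the median fallback when all three judges disagree, and the use of linear weights in a pairwise Cohen's kappa. The abstract does not say how the reported kappa of 0.605 was computed across three judges.

```python
from collections import Counter

# Hypothetical ordinal scale; the abstract names "Correct" and "Incorrect",
# the middle level is an assumption made for this sketch.
LABELS = ["Incorrect", "Partial", "Correct"]


def majority_vote(votes):
    """Return the label given by at least two of three judges.

    On a three-way split, fall back to the median label on the ordinal
    scale (an assumed tie-break, not taken from the paper).
    """
    label, count = Counter(votes).most_common(1)[0]
    if count >= 2:
        return label
    ranks = sorted(LABELS.index(v) for v in votes)
    return LABELS[ranks[len(ranks) // 2]]


def weighted_kappa(a, b, k=len(LABELS)):
    """Linear-weighted Cohen's kappa for two raters.

    `a` and `b` are equal-length lists of ordinal ranks in 0..k-1.
    Undefined (division by zero) if expected disagreement is zero,
    e.g. when both raters always give the same single label.
    """
    n = len(a)
    # Observed joint distribution of the two raters' ranks.
    obs = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        obs[x][y] += 1.0 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Linear disagreement weight: 0 on the diagonal, 1 at the extremes.
        return abs(i - j) / (k - 1)

    d_obs = sum(weight(i, j) * obs[i][j] for i in range(k) for j in range(k))
    d_exp = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    return 1.0 - d_obs / d_exp
```

For example, `majority_vote(["Correct", "Correct", "Partial"])` yields "Correct", and two judges who agree on every item get a kappa of 1.0. A pairwise statistic like this would have to be averaged or otherwise combined to summarize a three-judge panel.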
Taxonomy
Topics: Computational and Text Analysis Methods · Artificial Intelligence in Law · Ethics and Social Impacts of AI
