COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act
Philipp Guldimann, Alexander Spiridonov, Robin Staab, Nikola Jovanović, Mark Vero, Velko Vechev, Anna-Maria Gueorguieva, Mislav Balunović, Nikola Konstantinov, Pavol Bielik, Petar Tsankov, Martin Vechev

TL;DR
This paper introduces COMPL-AI, a framework that translates the EU AI Act into measurable technical requirements, together with an open-source benchmarking suite for LLMs. The evaluation reveals gaps in current models and underscores the need for regulation-aligned evaluation.
Contribution
It offers the first technical interpretation of the EU AI Act focused on LLMs and develops a benchmarking suite to assess compliance and performance.
Findings
An evaluation of 12 LLMs reveals robustness and fairness gaps
The evaluation also highlights shortcomings in existing benchmarks
The authors encourage the development of regulation-aligned benchmarks
Abstract
The EU's Artificial Intelligence Act (AI Act) is a significant step towards responsible AI development, but lacks clear technical interpretation, making it difficult to assess models' compliance. This work presents COMPL-AI, a comprehensive framework consisting of (i) the first technical interpretation of the EU AI Act, translating its broad regulatory requirements into measurable technical requirements, with the focus on large language models (LLMs), and (ii) an open-source Act-centered benchmarking suite, based on thorough surveying and implementation of state-of-the-art LLM benchmarks. By evaluating 12 prominent LLMs in the context of COMPL-AI, we reveal shortcomings in existing models and benchmarks, particularly in areas like robustness, safety, diversity, and fairness. This work highlights the need for a shift in focus towards these aspects, encouraging balanced development of…
Taxonomy
Topics: Law, AI, and Intellectual Property
Methods: Focus
