Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability

Haonan Li; Xudong Han; Zenan Zhai; Honglin Mu; Hao Wang; Zhenxuan Zhang; Yilin Geng; Shom Lin; Renxi Wang; Artem Shelmanov; Xiangyu Qi; Yuxia Wang; Donghai Hong; Youliang Yuan; Meng Chen; Haoqin Tu; Fajri Koto; Tatsuki Kuribayashi; Cong Zeng; Rishabh Bhardwaj; Bingchen Zhao; Yawen Duan; Yi Liu; Emad A. Alghamdi; Yaodong Yang; Yinpeng Dong; Soujanya Poria; Pengfei Liu; Zhengzhong Liu; Xuguang Ren; Eduard Hovy; Iryna Gurevych; Preslav Nakov; Monojit Choudhury; Timothy Baldwin

arXiv:2412.18551 · cs.CL · December 25, 2024


TL;DR

Libra-Leaderboard introduces a balanced evaluation framework for LLMs that jointly assesses performance and safety, promoting models that optimize both aspects rather than excelling in one at the expense of the other.

Contribution

It presents a novel balanced ranking method using a distance-to-optimal-score approach and a dynamic leaderboard to encourage responsible AI development.

Findings

01

Evaluates 26 mainstream LLMs from 14 leading organizations, revealing critical safety challenges even in state-of-the-art models.

02

The balanced ranking incentivizes models to improve both safety and capability.

03

The framework promotes responsible AI through joint optimization.

Abstract

We introduce Libra-Leaderboard, a comprehensive framework that ranks LLMs through a balanced evaluation of performance and safety. By combining a dynamic leaderboard with an interactive LLM arena, Libra-Leaderboard encourages the joint optimization of capability and safety. Unlike traditional approaches that average performance and safety metrics, Libra-Leaderboard computes overall rankings with a distance-to-optimal-score method. This approach rewards models for achieving balance rather than excelling in one dimension at the expense of the other. In its first release, Libra-Leaderboard evaluates 26 mainstream LLMs from 14 leading organizations and identifies critical safety challenges even in state-of-the-art models.
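The abstract contrasts simple averaging with a distance-to-optimal-score ranking. The following minimal sketch illustrates that difference under stated assumptions: capability and safety are each normalized to [0, 1], the optimal point is (1, 1), and the distance is Euclidean, divided by sqrt(2) so the final score also stays in [0, 1]. These choices, along with the names `ModelScores`, `average_score`, and `distance_to_optimal_score` and the two example models, are hypothetical. They are not taken from the Libra-Leaderboard codebase, and the leaderboard's exact formula may differ.

```python
import math
from dataclasses import dataclass


@dataclass
class ModelScores:
    name: str
    capability: float  # assumption: normalized to [0, 1]
    safety: float      # assumption: normalized to [0, 1]


def average_score(m: ModelScores) -> float:
    """Baseline: arithmetic mean of the two dimensions."""
    return (m.capability + m.safety) / 2


def distance_to_optimal_score(m: ModelScores) -> float:
    """Hypothetical balanced score: 1 minus the Euclidean distance from
    (capability, safety) to the assumed optimum (1, 1), divided by
    sqrt(2) (the largest possible distance) so the result lies in [0, 1]."""
    dist = math.hypot(1.0 - m.capability, 1.0 - m.safety)
    return 1.0 - dist / math.sqrt(2)


# Two hypothetical models with the same average but different balance.
models = [
    ModelScores("balanced", capability=0.7, safety=0.7),
    ModelScores("lopsided", capability=0.95, safety=0.45),
]

if __name__ == "__main__":
    for rule in (average_score, distance_to_optimal_score):
        ranked = sorted(models, key=rule, reverse=True)
        print(rule.__name__, [(m.name, round(rule(m), 3)) for m in ranked])
```

Under the simple average, both hypothetical models score about 0.7 and cannot be separated. The distance-based score places the balanced model (about 0.70) above the lopsided one (about 0.61), because a weak dimension moves a model farther from the optimum than an equally strong dimension moves it closer. This is the incentive the abstract describes.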



Taxonomy

Topics: Ethics and Social Impacts of AI