Global Challenge for Safe and Secure LLMs Track 1

Xiaojun Jia; Yihao Huang; Yang Liu; Peng Yan Tan; Weng Kuan Yau; Mun-Thye Mak; Xin Ming Sim; Wee Siong Ng; See Kiong Ng; Hanqing Liu; Lifeng Zhou; Huanqian Yan; Xiaobing Sun; Wei Liu; Long Wang; Yiming Qian; Yong Liu; Junxiao Yang; Zhexin Zhang; Leqi Lei; Renmiao Chen; Yida Lu; Shiyao Cui; Zizhou Wang; Shaohua Li; Yan Wang; Rick Siow Mong Goh; Liangli Zhen; Yingjie Zhang; Zhe Zhao

arXiv:2411.14502 · cs.CR · November 25, 2024



Open Access

TL;DR

This paper presents a global challenge for developing automated methods that expose the vulnerability of large language models to jailbreaking and adversarial attacks, with the aim of improving the robustness these models need for safe deployment.

Contribution

The paper introduces a competition framework focused on probing LLM vulnerabilities, intended to foster advances in defense mechanisms against malicious exploitation.

Findings

01. Development of automated probing techniques for LLM vulnerabilities

02. Insights into common safety protocol bypass methods

03. Enhanced understanding of LLM robustness challenges

Abstract

This paper introduces the Global Challenge for Safe and Secure Large Language Models (LLMs), a pioneering initiative organized by AI Singapore (AISG) and the CyberSG R&D Programme Office (CRPO) to foster the development of advanced defense mechanisms against automated jailbreaking attacks. With the increasing integration of LLMs in critical sectors such as healthcare, finance, and public administration, ensuring these models are resilient to adversarial attacks is vital for preventing misuse and upholding ethical standards. This competition focused on two distinct tracks designed to evaluate and enhance the robustness of LLM security frameworks. Track 1 tasked participants with developing automated methods to probe LLM vulnerabilities by eliciting undesirable responses, effectively testing the limits of existing safety protocols within LLMs. Participants were challenged to devise…


Taxonomy

Topics: Law, AI, and Intellectual Property