HarmLevelBench: Evaluating Harm-Level Compliance and the Impact of Quantization on Model Alignment

Yannis Belkhiter; Giulio Zizzo; Sergio Maffeis

arXiv:2411.06835·cs.CL·November 12, 2024

TL;DR

This paper introduces a new harm-level dataset and benchmark for evaluating LLM vulnerabilities, analyzes the impact of quantization on model robustness, and provides insights into safeguarding against harmful outputs.

Contribution

It presents a novel harm-level assessment dataset, benchmarks jailbreaking attacks on Vicuna 13B, and investigates how quantization affects model alignment and robustness.

Findings

1. Quantization techniques influence model robustness and vulnerability.

2. Harmful input queries increase jailbreaking complexity.

3. Trade-offs exist between robustness and vulnerability with quantization.

Abstract

With the introduction of the transformer architecture, LLMs have revolutionized the NLP field with ever more powerful models. Nevertheless, their development has brought several challenges. The exponential growth in the computational power and reasoning capabilities of language models has heightened concerns about their security, and ensuring their safety has become a crucial research focus. This paper aims to address gaps in the current literature on jailbreaking techniques and the evaluation of LLM vulnerabilities. Our contributions include a novel dataset designed to assess the harmfulness of model outputs across multiple harm levels, together with a fine-grained harm-level analysis. Using this framework, we provide a comprehensive benchmark of state-of-the-art jailbreaking attacks, specifically targeting the Vicuna 13B v1.5 model.…

Taxonomy

Topics: HIV, Drug Use, Sexual Risk

Methods: Focus