Reshaping Representation Space to Balance the Safety and Over-rejection in Large Audio Language Models

Hao Yang; Lizhen Qu; Ehsan Shareghi; Gholamreza Haffari

arXiv:2505.19670·cs.CL·May 27, 2025

TL;DR

This paper introduces an unsupervised safety fine-tuning method for Large Audio Language Models (LALMs) that improves safety alignment while increasing over-rejection only minimally, so helpfulness is largely preserved.

Contribution

It proposes a novel unsupervised safety-fine-tuning strategy that reshapes the model's representation space to balance safety and over-rejection in LALMs.

Findings

01 Significant safety improvements across three generations of Qwen LALMs.

02 The over-rejection rate increases by only 0.88% on average.

03 Safety is enhanced effectively under multiple input modalities.

Abstract

Large Audio Language Models (LALMs) have extended the capabilities of Large Language Models (LLMs) by enabling audio-based human interactions. However, recent research has revealed that LALMs remain vulnerable to harmful queries due to insufficient safety alignment. Despite advances in defence measures for text and vision LLMs, effective safety-alignment strategies and audio-safety datasets specifically targeting LALMs are notably absent. Meanwhile, defence measures based on Supervised Fine-tuning (SFT) struggle to improve safety while avoiding over-rejection, which significantly compromises helpfulness. In this work, we propose an unsupervised safety fine-tuning strategy as a remedy: it reshapes the model's representation space to enhance the safety alignment of existing LALMs while balancing the risk of over-rejection. Our experiments, conducted across three generations of Qwen LALMs,…

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing