LoRA-Guard: Parameter-Efficient Guardrail Adaptation for Content Moderation of Large Language Models

Hayder Elesedy; Pedro M. Esperança; Silviu Vlad Oprea; Mete Ozay

arXiv:2407.02987·cs.LG·December 19, 2024

Open Access

TL;DR

LoRA-Guard is a parameter-efficient method that enables on-device content moderation for large language models by sharing knowledge between the LLM and its guardrail model, at minimal resource overhead.

Contribution

It introduces a novel low-rank adapter-based approach for guardrail adaptation that significantly reduces parameter overhead while maintaining accuracy.
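To make the parameter saving concrete, here is a rough per-layer count, assuming the standard LoRA formulation in which an update to a `d_out × d_in` weight matrix is factored into two matrices of rank `r`. The dimensions and rank below are hypothetical and chosen only for illustration; they are not the configuration used in the paper.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Parameters in a dense d_out x d_in weight matrix."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in a rank-`rank` LoRA update: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


# Hypothetical layer size and adapter rank (assumptions, not from the paper).
d_in = d_out = 4096
rank = 8

full = full_params(d_in, d_out)
lora = lora_params(d_in, d_out, rank)
print(f"dense: {full:,}  lora: {lora:,}  ratio: {full / lora:.0f}x")
```

With these illustrative numbers, the adapter holds 256 times fewer parameters than the dense matrix it modifies. The overall saving relative to a separate guardrail model depends on the models being compared and on how many layers carry adapters.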

Findings

01. Outperforms existing methods with 100-1000x fewer parameters

02. Maintains accuracy on content moderation tasks

03. Enables on-device deployment of LLM guardrails

Abstract

Guardrails have emerged as an alternative to safety alignment for content moderation of large language models (LLMs). Existing model-based guardrails have not been designed for resource-constrained portable devices, such as mobile phones, a growing number of which run LLM-based applications locally. We introduce LoRA-Guard, a parameter-efficient guardrail adaptation method that relies on knowledge sharing between LLMs and guardrail models. LoRA-Guard extracts language features from the LLMs and adapts them for the content moderation task using low-rank adapters, while a dual-path design prevents any performance degradation on the generative task. We show that LoRA-Guard outperforms existing approaches with 100-1000x lower parameter overhead while maintaining accuracy, enabling on-device content moderation.
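The dual-path idea can be sketched in a few lines. The abstract does not specify where the adapters sit or what the moderation head looks like, so everything below is an assumption for illustration: a single linear layer with frozen weights `W`, a low-rank update `B @ A` that is applied only on the guard path, and a hypothetical two-label classification head. The class name `DualPathLayer` and the helpers are invented for this sketch.

```python
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix standing in for learned weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class DualPathLayer:
    """One linear layer shared by the generative path and the guard path."""

    def __init__(self, d_in, d_out, rank, rng):
        self.W = rand_matrix(d_out, d_in, rng)  # frozen base-model weights
        self.A = rand_matrix(rank, d_in, rng)   # adapter down-projection
        # Adapter up-projection. LoRA usually initialises B to zero before
        # training; it is random here so the two paths visibly differ.
        self.B = rand_matrix(d_out, rank, rng)

    def forward(self, x, guard=False):
        h = matvec(self.W, x)
        if guard:
            # Guard path: add the low-rank update B(Ax) to the frozen output.
            delta = matvec(self.B, matvec(self.A, x))
            h = [hi + di for hi, di in zip(h, delta)]
        # Generative path: adapters are skipped, so W x is returned as-is.
        return h


rng = random.Random(0)
layer = DualPathLayer(d_in=8, d_out=8, rank=2, rng=rng)
head = rand_matrix(2, 8, rng)  # hypothetical safe/unsafe classification head
x = [rng.gauss(0.0, 1.0) for _ in range(8)]

generative_features = layer.forward(x)         # base model, untouched
guard_features = layer.forward(x, guard=True)  # base model plus adapter
guard_logits = matvec(head, guard_features)

print(generative_features == matvec(layer.W, x))
print(len(guard_logits))
```

Because the generative path never touches `A` or `B`, its output is identical to that of the unmodified base layer, which is the property the abstract describes as preventing degradation on the generative task. Only the adapter matrices and the head would need to be trained and stored for moderation.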

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques