Llama Guard 3-1B-INT4: Compact and Efficient Safeguard for Human-AI Conversations

Igor Fedorov; Kate Plawiak; Lemeng Wu; Tarek Elgamal; Naveen Suda; Eric Smith; Hongyuan Zhan; Jianfeng Chi; Yuriy Hulovatyy; Kimish Patel; Zechun Liu; Changsheng Zhao; Yangyang Shi; Tijmen Blankevoort; Mahesh Pasupuleti; Bilge Soran; Zacharie Delpierre Coudert; Rachad Alao; Raghuraman Krishnamoorthi; Vikas Chandra

arXiv:2411.17713·cs.DC·November 28, 2024

TL;DR

This paper introduces Llama Guard 3-1B-INT4, a compact, efficient safety moderation model for human-AI conversations that runs on resource-constrained devices and matches or exceeds its larger counterpart on safety moderation while being significantly smaller.

Contribution

The paper presents a new compact safety moderation model, Llama Guard 3-1B-INT4, which is open-sourced, can be deployed in real time on mobile devices, and delivers safety performance comparable to that of larger models.

Findings

01. Achieves a throughput of at least 30 tokens per second on a commodity Android mobile CPU

02. Reaches a time-to-first-token of 2.5 seconds or less on the same Android CPU

03. Attains safety moderation scores comparable or superior to those of its larger counterpart, Llama Guard 3-1B
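As a rough illustration of what the throughput and time-to-first-token figures mean for a single moderation call, the sketch below bounds end-to-end generation latency as TTFT plus the remaining output tokens divided by decode throughput. It assumes a steady decode speed and ignores any work outside generation; the function name and the verdict lengths are hypothetical, and the real verdict length depends on the model's output format and tokenizer.

```python
def estimate_verdict_latency_s(n_output_tokens: int,
                               ttft_s: float = 2.5,
                               decode_tokens_per_s: float = 30.0) -> float:
    """Upper-bound estimate of generation latency for one moderation verdict.

    Hypothetical model: the first token arrives after ttft_s seconds and each
    later token is decoded at a steady decode_tokens_per_s. The defaults are
    the worst-case reported figures (TTFT <= 2.5 s, throughput >= 30 tokens/s),
    so the result is an upper bound under these assumptions.
    """
    if n_output_tokens < 1:
        raise ValueError("a verdict contains at least one token")
    # The first token is covered by TTFT; the remaining tokens are decode steps.
    return ttft_s + (n_output_tokens - 1) / decode_tokens_per_s


# Hypothetical verdict lengths, chosen only for illustration.
for n in (1, 5, 20):
    print(f"{n:>2} tokens -> <= {estimate_verdict_latency_s(n):.2f} s")
```

With these worst-case figures, a 5-token verdict finishes in at most about 2.63 s, so for short verdicts the time-to-first-token, not decode speed, dominates the latency budget.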

Abstract

This paper presents Llama Guard 3-1B-INT4, a compact and efficient Llama Guard model that was open-sourced to the community at Meta Connect 2024. We demonstrate that Llama Guard 3-1B-INT4 can be deployed on resource-constrained devices, achieving a throughput of at least 30 tokens per second and a time-to-first-token of 2.5 seconds or less on a commodity Android mobile CPU. Notably, our experiments show that Llama Guard 3-1B-INT4 attains safety moderation scores comparable or superior to those of its larger counterpart, Llama Guard 3-1B, despite being approximately 7 times smaller (440 MB).
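The size claim can be turned around to estimate the footprint of the larger model. This is back-of-the-envelope arithmetic under two stated assumptions: "approximately 7 times" is treated as exactly 7, and MB is read as decimal megabytes. The helper names are hypothetical.

```python
QUANTIZED_SIZE_MB = 440.0  # Llama Guard 3-1B-INT4 size stated in the abstract
REPORTED_REDUCTION = 7.0   # "approximately 7 times smaller", assumed exact here


def implied_baseline_size_mb(small_mb: float, reduction: float) -> float:
    """Back out the approximate size of the larger model from a size ratio."""
    return small_mb * reduction


def reduction_factor(baseline_mb: float, small_mb: float) -> float:
    """How many times smaller the compressed model is than the baseline."""
    return baseline_mb / small_mb


baseline_mb = implied_baseline_size_mb(QUANTIZED_SIZE_MB, REPORTED_REDUCTION)
print(f"Implied Llama Guard 3-1B size: ~{baseline_mb:.0f} MB (~{baseline_mb / 1000:.1f} GB)")
# Sanity check: recover the ratio from the two sizes.
print(f"Reduction: {reduction_factor(baseline_mb, QUANTIZED_SIZE_MB):.1f}x")
```

This yields an implied size of roughly 3.1 GB for Llama Guard 3-1B; because the stated ratio is approximate, treat it as an order-of-magnitude check rather than the model's published size.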

Taxonomy

Topics: Autonomous Vehicle Technology and Safety

Methods: LLaMA