Granite Guardian
Inkit Padhi, Manish Nagireddy, Giandomenico Cornacchia, Subhajit Chaudhury, Tejaswini Pedapati, Pierre Dognin, Keerthiram Murugesan, Erik Miehling, Martín Santillán Cooper, Kieran Fraser, Giulio Zizzo, Muhammad Zaid Hameed, Mark Purcell, Michael Desmond, Qian Pan

TL;DR
Granite Guardian models are a comprehensive, open-source suite designed to detect a wide range of risks in large language models, including biases, harmful content, and hallucinations, with high accuracy and broad coverage.
Contribution
This work introduces the Granite Guardian models, a novel set of safeguards trained on diverse data to address overlooked risks like jailbreaks and RAG-specific issues in LLMs.
Findings
Achieved AUC scores of 0.871 on harmful content benchmarks and 0.854 on RAG-hallucination-related benchmarks.
Provides broad risk coverage including bias, violence, and hallucinations.
Open-source release promotes responsible AI development.
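For readers unfamiliar with the metric, the AUC figures reported above measure how reliably a detector's risk scores rank risky examples above benign ones. The following is a minimal sketch of that computation, not the authors' evaluation code: the function name `roc_auc` and the labels and scores are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUC: the fraction of positive/negative pairs
    in which the positive example receives the higher score. Ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data (not from the paper): 1 = risky, 0 = benign,
# with made-up risk scores from an imagined detector.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 1.0 would mean every risky example outscores every benign one, while 0.5 corresponds to random ranking. The quadratic pairwise loop is chosen for readability; a sort-based implementation would be preferable for large evaluation sets.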
Abstract
We introduce the Granite Guardian models, a suite of safeguards designed to provide risk detection for prompts and responses, enabling safe and responsible use in combination with any large language model (LLM). These models offer comprehensive coverage across multiple risk dimensions, including social bias, profanity, violence, sexual content, unethical behavior, jailbreaking, and hallucination-related risks such as context relevance, groundedness, and answer relevance for retrieval-augmented generation (RAG). Trained on a unique dataset combining human annotations from diverse sources and synthetic data, Granite Guardian models address risks typically overlooked by traditional risk detection models, such as jailbreaks and RAG-specific issues. With AUC scores of 0.871 and 0.854 on harmful content and RAG-hallucination-related benchmarks respectively, Granite Guardian is the most…
Code & Models
- ibm-research/granite-guardian-3.2-3b-a800m-GGUF (Hugging Face model; 132 downloads, 3 likes)
- ibm-granite/granite-guardian-3.3-8b (Hugging Face model; 9.1k downloads, 32 likes)
- ibm-granite/granite-guardian-3.0-8b (Hugging Face model; 521 downloads, 38 likes)
- ibm-granite/granite-guardian-3.0-2b (Hugging Face model; 1.5k downloads, 21 likes)
- ibm-granite/granite-guardian-3.1-2b (Hugging Face model; 2.3k downloads, 15 likes)
- ibm-granite/granite-guardian-3.1-8b (Hugging Face model; 707 downloads, 14 likes)
- ktoprakucar/granite-guardian-3.1-2b-Q8-GGUF (Hugging Face model; 13 downloads, 2 likes)
- ibm-granite/granite-guardian-3.2-5b (Hugging Face model; 33k downloads, 12 likes)
- cgus/granite-guardian-3.1-2b-exl2 (Hugging Face model; 2 downloads)
- ibm-granite/granite-guardian-3.2-3b-a800m (Hugging Face model; 4.2k downloads, 8 likes)
Taxonomy
Topics: Geotechnical and Geomechanical Engineering
