Customize Multi-modal RAI Guardrails with Precedent-based predictions

Cheng-Fu Yang; Thanh Tran; Christos Christodoulopoulos; Weitong Ruan; Rahul Gupta; Kai-Wei Chang

arXiv:2507.20503·cs.LG·July 29, 2025

TL;DR

This paper introduces a flexible multi-modal guardrail system that uses precedent-based reasoning to filter image content according to user policies, overcoming limitations of existing methods in adaptability and scalability.

Contribution

The paper proposes a novel precedent-based approach with a critique-revise mechanism for scalable, adaptable content filtering in multi-modal guardrails.
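
To make the idea of precedent-based prediction concrete, here is a minimal sketch, not the authors' implementation. It assumes precedents are stored as past cases with an embedding, the policy they were judged under, a short rationale, and a verdict, and that a new image is judged by majority vote over its most similar precedents for the same policy. The `Precedent` class, the `retrieve` and `predict` functions, the cosine-similarity retrieval, and the voting rule are all hypothetical choices made for illustration; the critique-revise mechanism is not modeled.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Precedent:
    # All fields are illustrative assumptions, not taken from the paper.
    embedding: list[float]  # stand-in for an image embedding
    policy: str             # user-defined policy the case was judged under
    rationale: str          # short explanation attached to the verdict
    violates: bool          # verdict: True if the image violated the policy


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: list[float], precedents: list[Precedent],
             policy: str, k: int = 2) -> list[Precedent]:
    """Return the k precedents for `policy` most similar to the query."""
    candidates = [p for p in precedents if p.policy == policy]
    candidates.sort(key=lambda p: cosine(query, p.embedding), reverse=True)
    return candidates[:k]


def predict(query: list[float], precedents: list[Precedent],
            policy: str, k: int = 2) -> bool:
    """Majority vote over retrieved precedents; False if none exist."""
    top = retrieve(query, precedents, policy, k)
    if not top:
        return False
    return 2 * sum(p.violates for p in top) > len(top)


precedent_store = [
    Precedent([1.0, 0.0], "no-weapons", "visible firearm", True),
    Precedent([0.9, 0.1], "no-weapons", "knife held as a threat", True),
    Precedent([0.0, 1.0], "no-weapons", "kitchen scene, no weapon", False),
    Precedent([1.0, 0.0], "no-alcohol", "no drinks shown", False),
]

print(predict([1.0, 0.05], precedent_store, "no-weapons"))      # True
print(predict([0.0, 1.0], precedent_store, "no-weapons", k=1))  # False
```

In this sketch only the few retrieved precedents are consulted at prediction time, so adding a new policy means adding cases to the store rather than retraining a model or placing every policy in one prompt.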

Findings

01. Outperforms previous methods in few-shot and full-dataset scenarios
02. Shows superior generalization to new policies
03. Effective in real-world customizable content filtering

Abstract

A multi-modal guardrail must effectively filter image content based on user-defined policies, identifying material that may be hateful, reinforce harmful stereotypes, contain explicit material, or spread misinformation. Deploying such guardrails in real-world applications, however, poses significant challenges. Users often require varied and highly customizable policies and typically cannot provide abundant examples for each custom policy. Consequently, an ideal guardrail should be scalable to multiple policies and adaptable to evolving user standards with minimal retraining. Existing fine-tuning methods typically condition predictions on pre-defined policies, restricting their generalizability to new policies or necessitating extensive retraining to adapt. Conversely, training-free methods struggle with limited context lengths, making it difficult to incorporate all the policies…
