PRISM: A Design Framework for Open-Source Foundation Model Safety

Terrence Neumann; Bryan Jones

arXiv:2406.10415 · cs.CY · June 18, 2024

TL;DR

PRISM is a modular safety framework for open-source foundation models. It enhances safety through independent moderation functions and community engagement while adding minimal computational cost.

Contribution

The paper introduces PRISM, a novel safety framework for open-source models emphasizing modular, independent safety measures over brittle reinforcement learning approaches.
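
To make the idea of modular, independent safety measures concrete, here is a minimal sketch of moderation functions that run outside the generative model. It is not the paper's implementation. The names `Verdict`, `keyword_check`, and `moderated_generate` are hypothetical, the keyword matcher is a stand-in for whatever classifier a deployment would actually use, and checking both the prompt and the output is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Verdict:
    """Result of one moderation pass (hypothetical type for this sketch)."""
    allowed: bool
    reason: str = ""


# A moderation function inspects a piece of text and returns a Verdict.
ModerationFn = Callable[[str], Verdict]


def keyword_check(blocked: List[str]) -> ModerationFn:
    """Build a toy moderation function that rejects text containing a blocked term."""
    def check(text: str) -> Verdict:
        lowered = text.lower()
        for term in blocked:
            if term.lower() in lowered:
                return Verdict(False, f"matched blocked term: {term}")
        return Verdict(True)
    return check


def moderated_generate(
    prompt: str,
    model: Callable[[str], str],
    prompt_checks: List[ModerationFn],
    output_checks: List[ModerationFn],
) -> Tuple[Verdict, Optional[str]]:
    """Run the checks around the model call without modifying the model itself."""
    for check in prompt_checks:
        verdict = check(prompt)
        if not verdict.allowed:
            # Reject before spending any compute on generation.
            return Verdict(False, "prompt: " + verdict.reason), None

    output = model(prompt)

    for check in output_checks:
        verdict = check(output)
        if not verdict.allowed:
            return Verdict(False, "output: " + verdict.reason), None

    return Verdict(True), output
```

Because each check is an ordinary function held in a list, one can be added, replaced, or removed without retraining or fine-tuning the underlying model, which is the property the modular design is meant to provide.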

Findings

1. PRISM effectively identifies violations of acceptable use policies (AUPs).
2. Modular safety functions improve adaptability and resilience.
3. Community engagement fosters consensus on safety standards.

Abstract

The rapid advancement of open-source foundation models has brought transparency and accessibility to this groundbreaking technology. However, this openness has also enabled the development of highly capable, unsafe models, as exemplified by recent instances such as WormGPT and FraudGPT, which are specifically designed to facilitate criminal activity. As the capabilities of open foundation models continue to grow, potentially outpacing those of closed-source models, the risk of misuse by bad actors poses an increasingly serious threat to society. This paper addresses the critical question of how open foundation model developers should approach model safety in light of these challenges. Our analysis reveals that open-source foundation model companies often provide less restrictive acceptable use policies (AUPs) than their closed-source counterparts, likely due to the inherent…
