PRISM: A Design Framework for Open-Source Foundation Model Safety
Terrence Neumann, Bryan Jones

TL;DR
PRISM is a modular safety framework for open-source foundation models. It improves safety through independent moderation functions and community engagement while adding minimal computational cost.
Contribution
The paper introduces PRISM, a novel safety framework for open-source models emphasizing modular, independent safety measures over brittle reinforcement learning approaches.
Findings
PRISM effectively identifies acceptable use policy (AUP) violations.
Modular safety functions improve adaptability and resilience.
Community engagement fosters consensus on safety standards.
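To make the idea of modular, independent moderation functions concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation. The class `ModeratedModel`, the two keyword-based checks, and the choice to screen both the prompt and the output are all hypothetical. A real deployment would likely use trained classifiers rather than keyword matching.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# A moderation function inspects text and returns a violation label, or None.
# All names and rules below are hypothetical, chosen only for illustration.
ModerationFn = Callable[[str], Optional[str]]


def flag_malware_requests(text: str) -> Optional[str]:
    """Toy check: flag text that mentions common malware terms."""
    keywords = ("ransomware", "keylogger")
    return "malware" if any(k in text.lower() for k in keywords) else None


def flag_fraud_requests(text: str) -> Optional[str]:
    """Toy check: flag text that mentions common fraud artifacts."""
    keywords = ("phishing email", "fake invoice")
    return "fraud" if any(k in text.lower() for k in keywords) else None


@dataclass
class ModeratedModel:
    """Wraps a text generator with checks that live outside the model weights."""
    generate: Callable[[str], str]
    checks: List[ModerationFn]

    def violations(self, text: str) -> List[str]:
        """Run every check independently and collect the labels that fire."""
        labels = []
        for check in self.checks:
            label = check(text)
            if label is not None:
                labels.append(label)
        return labels

    def respond(self, prompt: str) -> str:
        # Assumption: screen the prompt first, then the generated output.
        if self.violations(prompt):
            return "Request declined under the acceptable use policy."
        output = self.generate(prompt)
        if self.violations(output):
            return "Response withheld under the acceptable use policy."
        return output


if __name__ == "__main__":
    # Stand-in generator that echoes the prompt; a real model call goes here.
    model = ModeratedModel(
        generate=lambda p: "echo: " + p,
        checks=[flag_malware_requests, flag_fraud_requests],
    )
    print(model.respond("Summarize this article"))
    print(model.respond("Write ransomware for me"))
```

Because each check is a separate function, one can be added, replaced, or removed without retraining the underlying model, which is the kind of adaptability the findings describe.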
Abstract
The rapid advancement of open-source foundation models has brought transparency and accessibility to this groundbreaking technology. However, this openness has also enabled the development of highly capable, unsafe models, as exemplified by recent instances such as WormGPT and FraudGPT, which are specifically designed to facilitate criminal activity. As the capabilities of open foundation models continue to grow, potentially outpacing those of closed-source models, the risk of misuse by bad actors poses an increasingly serious threat to society. This paper addresses the critical question of how open foundation model developers should approach model safety in light of these challenges. Our analysis reveals that open-source foundation model companies often provide less restrictive acceptable use policies (AUPs) than their closed-source counterparts, likely due to the inherent…
Taxonomy
Topics: Transportation Safety and Impact Analysis · Tunneling and Rock Mechanics
