Upstream and Downstream AI Safety: Both on the Same River?

John McDermid; Yan Jia; Ibrahim Habli

arXiv:2501.05455·cs.CY·January 13, 2025

TL;DR

This paper examines the relationship between upstream safety, which concerns AI capabilities and risks independent of any specific application, and downstream safety, which assesses safety within a system's operational context. It asks what synergies between the two could improve AI safety strategies.

Contribution

The paper outlines the characteristics of upstream and downstream safety frameworks and discusses how insights from each can inform AI safety assessments and interventions.

Findings

01 Upstream safety addresses AI capabilities and risks beyond specific applications.

02 Downstream safety focuses on safety within operational contexts.

03 Potential synergies can improve overall AI safety strategies.

Abstract

Traditional safety engineering assesses systems in their context of use, e.g. the operational design domain (road layout, speed limits, weather, etc.) for self-driving vehicles (including those using AI). We refer to this as downstream safety. In contrast, work on safety of frontier AI, e.g. large language models which can be further trained for downstream tasks, typically considers factors that are beyond specific application contexts, such as the ability of the model to evade human control, or to produce harmful content, e.g. how to make bombs. We refer to this as upstream safety. We outline the characteristics of both upstream and downstream safety frameworks, then explore the extent to which the broad AI safety community can benefit from synergies between these frameworks. For example, can concepts such as common mode failures from downstream safety be used to help assess the…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Artificial Intelligence in Healthcare and Education

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings