Human-AI Safety: A Descendant of Generative AI and Control Systems Safety

Andrea Bajcsy; Jaime F. Fisac

arXiv:2405.09794·cs.AI·June 25, 2024·1 cites

Open Access

TL;DR

This paper explores the intersection of AI safety and control systems, emphasizing the importance of understanding human-AI feedback loops for ensuring safety in advanced AI systems.

Contribution

It introduces a unifying formalism for modeling dynamic human-AI interactions and outlines a technical roadmap for next-generation human-centered AI safety.

Findings

01

Highlighting the entanglement of AI outputs with human responses over time

02

Identifying open challenges in current AI safety approaches

03

Proposing a formal framework for analyzing human-AI safety interactions

Abstract

Artificial intelligence (AI) is interacting with people at an unprecedented scale, offering new avenues for immense positive impact, but also raising widespread concerns around the potential for individual and societal harm. Today, the predominant paradigm for human-AI safety focuses on fine-tuning the generative model's outputs to better agree with human-provided examples or feedback. In reality, however, the consequences of an AI model's outputs cannot be determined in isolation: they are tightly entangled with the responses and behavior of human users over time. In this paper, we distill key complementary lessons from AI safety and control systems safety, highlighting open challenges as well as key synergies between both fields. We then argue that meaningful safety assurances for advanced AI technologies require reasoning about how the feedback loop formed by AI outputs and human…

Taxonomy

Topics: Ethics and Social Impacts of AI