Robotics-Inspired Guardrails for Foundation Models in Socially Sensitive Domains
Rebecca Ramnauth, Drazen Brscic, and Brian Scassellati

TL;DR
This paper introduces a robotics-inspired framework for enforcing behavioral safety in foundation models during interactions in sensitive social domains, aiming for enforceable guarantees rather than the empirical risk reduction that existing methods provide.
Contribution
The paper reframes guardrails as runtime behavioral control over interaction trajectories, using formal constraints drawn from robotics, and applies this through the Grounded Observer framework in real-world social settings.
Findings
The framework enables real-time interventions to prevent undesirable behaviors.
It was applied in three real-world deployments: small talk, in-home autism therapy, and behavioral de-escalation.
It mitigates drift into unsafe interaction regimes while adapting to context.
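The difference between checking individual outputs and controlling an interaction trajectory can be illustrated with a minimal Python sketch. This is not the Grounded Observer implementation, which this page does not describe. The class `TrajectoryMonitor`, its parameters (`per_turn_limit`, `window`, `window_budget`, `fallback`), and the per-turn risk scorer `risk_of` are all hypothetical names chosen for illustration. The sketch assumes risk can be scored per turn in [0, 1] and that drift can be approximated by cumulative risk over a sliding window.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque


@dataclass
class TrajectoryMonitor:
    """Hypothetical runtime monitor (illustrative only, not the paper's API)."""

    risk_of: Callable[[str], float]  # assumed per-turn risk scorer in [0, 1]
    per_turn_limit: float = 0.8      # what a per-output filter alone would check
    window: int = 3                  # number of recent turns considered
    window_budget: float = 1.5       # cumulative risk allowed across the window
    fallback: str = "Let's pause and check in."  # assumed safe intervention
    _recent: Deque[float] = field(default_factory=deque, init=False, repr=False)

    def __post_init__(self) -> None:
        # Bounded history: the oldest score drops out once the window is full.
        self._recent = deque(maxlen=self.window)

    def step(self, candidate: str) -> str:
        """Return the candidate output, or the fallback if a constraint would be violated."""
        risk = self.risk_of(candidate)
        # Scores that would remain in the window after appending this turn.
        history = list(self._recent)
        tail = history[1:] if len(history) == self.window else history
        projected = sum(tail) + risk

        if risk > self.per_turn_limit or projected > self.window_budget:
            # Intervene before the output is emitted; the fallback is assumed risk-free.
            self._recent.append(0.0)
            return self.fallback

        self._recent.append(risk)
        return candidate


# Toy scorer: every turn is moderately risky but passes a per-output check.
monitor = TrajectoryMonitor(risk_of=lambda text: 0.6)
replies = [monitor.step(turn) for turn in ["a", "b", "c", "d"]]
```

In this toy run each turn scores 0.6, below the per-turn limit, so an output-level filter would accept all four. The windowed budget is exceeded on the third turn, so the monitor substitutes the fallback there and lets the fourth turn through once the cumulative risk has dropped.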
Abstract
Foundation models are increasingly deployed in socially sensitive domains such as education, mental health, and caregiving, where failures are often cumulative and context-dependent. Existing guardrail approaches -- ranging from training-time alignment to prompting, decoding constraints, and post-hoc moderation -- primarily provide empirical risk reduction rather than enforceable behavioral guarantees, and largely treat safety as a property of individual outputs rather than interaction trajectories. We reframe guardrails as a problem of runtime behavioral control over interaction trajectories, drawing on robotics to introduce formal constructs for constraint enforcement in uncertain, closed-loop systems. We instantiate these ideas in the Grounded Observer framework and apply it across three real-world deployments: small talk, in-home autism therapy, and behavioral de-escalation in…
