Interactionless Inverse Reinforcement Learning: A Data-Centric Framework for Durable Alignment

Elias Malomgré; Pieter Simoens

arXiv:2602.14844 · cs.LG · March 26, 2026

TL;DR

This paper introduces a data-centric framework called Interactionless Inverse Reinforcement Learning for creating durable, editable, and reusable reward artifacts that improve AI alignment by separating reward learning from policy optimization.

Contribution

The paper proposes a framework that decouples reward-artifact creation from policy training, so that learned rewards can be audited, edited, and reused across models or deployments. It pairs this with a human-in-the-loop lifecycle, the Alignment Flywheel, for iteratively improving those artifacts.

Findings

01 The framework produces inspectable reward artifacts.

02 It enables iterative auditing and patching of rewards.

03 It turns alignment from a disposable process into a durable asset.

Abstract

AI alignment is growing in importance, yet many current approaches learn safety behavior by directly modifying policy parameters, entangling normative constraints with the underlying policy. This often yields opaque, difficult-to-edit alignment artifacts and reduces their reuse across models or deployments, a failure mode we term Alignment Waste. We propose Interactionless Inverse Reinforcement Learning, a framework for learning inspectable, editable, and reusable reward artifacts separately from policy optimization. We further introduce the Alignment Flywheel, a human-in-the-loop lifecycle for iteratively auditing, patching, and hardening these artifacts through automated evaluation and refinement. Together, these ideas recast alignment from a disposable training expense into a durable, verifiable engineering asset.
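
To make the separation described in the abstract concrete, here is a minimal sketch of a reward artifact that lives outside any policy. It is an illustration under stated assumptions, not the authors' method: the `RewardArtifact` class, the linear feature weighting, the feature names, and the `pick_best` stand-in for policy optimization are all hypothetical. The sketch only shows how a standalone reward can be inspected, patched after an audit, serialized, and reused.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical illustration: every name here is an assumption,
# not the paper's implementation.


@dataclass
class RewardArtifact:
    """A standalone, human-readable reward: a weighted sum of named features."""
    weights: Dict[str, float]
    version: int = 1
    changelog: List[str] = field(default_factory=list)

    def score(self, features: Dict[str, float]) -> float:
        # Features absent from the input contribute zero.
        return sum(w * features.get(name, 0.0) for name, w in self.weights.items())

    def patch(self, name: str, weight: float, reason: str) -> "RewardArtifact":
        # Return a new version; the old artifact is left untouched for auditing.
        new_weights = dict(self.weights)
        new_weights[name] = weight
        entry = f"v{self.version + 1}: {name}={weight} ({reason})"
        return RewardArtifact(new_weights, self.version + 1, self.changelog + [entry])

    def to_json(self) -> str:
        # Plain JSON keeps the artifact inspectable and portable.
        return json.dumps(
            {"weights": self.weights, "version": self.version, "changelog": self.changelog},
            sort_keys=True,
        )

    @staticmethod
    def from_json(text: str) -> "RewardArtifact":
        data = json.loads(text)
        return RewardArtifact(data["weights"], data["version"], data["changelog"])


def pick_best(artifact: RewardArtifact, candidates: List[dict]) -> str:
    """Toy stand-in for policy optimization: pick the highest-scoring candidate."""
    return max(candidates, key=lambda c: artifact.score(c["features"]))["name"]


# Made-up candidate behaviors with made-up feature values.
candidates = [
    {"name": "helpful_but_risky", "features": {"helpfulness": 0.9, "harm": 0.5}},
    {"name": "safe_and_useful", "features": {"helpfulness": 0.7, "harm": 0.0}},
]

v1 = RewardArtifact({"helpfulness": 1.0, "harm": -0.2})
choice_v1 = pick_best(v1, candidates)

# An audit finds harm under-penalized; patch the artifact, not the policy.
v2 = v1.patch("harm", -1.0, "audit: harm under-penalized")
choice_v2 = pick_best(v2, candidates)

# The patched artifact survives a round trip through JSON and can be reused.
restored = RewardArtifact.from_json(v2.to_json())
```

In this toy setup the patch changes which candidate is preferred without touching the selection code, and the changelog records why. A real reward artifact would likely be far richer than a linear model; the point of the sketch is only the lifecycle of inspect, patch, and reuse.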

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI