Interactionless Inverse Reinforcement Learning: A Data-Centric Framework for Durable Alignment
Elias Malomgré, Pieter Simoens

TL;DR
This paper introduces a data-centric framework called Interactionless Inverse Reinforcement Learning for creating durable, editable, and reusable reward artifacts that improve AI alignment by separating reward learning from policy optimization.
Contribution
The paper proposes a framework that decouples reward artifact creation from policy training, so that rewards can be audited, edited, and reused independently of any one policy, and pairs it with a human-in-the-loop lifecycle for continuous improvement.
Findings
The framework produces inspectable reward artifacts.
It enables iterative auditing and patching of rewards.
It recasts alignment from a disposable training expense into a durable asset.
Abstract
AI alignment is growing in importance, yet many current approaches learn safety behavior by directly modifying policy parameters, entangling normative constraints with the underlying policy. This often yields opaque, difficult-to-edit alignment artifacts and reduces their reuse across models or deployments, a failure mode we term Alignment Waste. We propose Interactionless Inverse Reinforcement Learning, a framework for learning inspectable, editable, and reusable reward artifacts separately from policy optimization. We further introduce the Alignment Flywheel, a human-in-the-loop lifecycle for iteratively auditing, patching, and hardening these artifacts through automated evaluation and refinement. Together, these ideas recast alignment from a disposable training expense into a durable, verifiable engineering asset.
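The abstract does not specify how a reward artifact is represented, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes, purely for illustration, that the artifact is a linear reward over named features whose weights are already given (the learning step is omitted). The names `RewardArtifact`, `audit`, `patch`, and `best_action` are hypothetical. The sketch shows an artifact that can be inspected, audited against preference cases, patched with a recorded reason, and reused to rank candidate actions without retraining anything.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Features = Dict[str, float]


@dataclass
class RewardArtifact:
    """Hypothetical reward artifact: a linear reward over named features."""
    weights: Dict[str, float]
    version: int = 1
    changelog: List[str] = field(default_factory=list)

    def score(self, features: Features) -> float:
        # Inspectable by construction: a weighted sum of named features.
        return sum(w * features.get(name, 0.0) for name, w in self.weights.items())

    def patch(self, name: str, weight: float, reason: str) -> "RewardArtifact":
        # Editing yields a new version and records why the change was made.
        new_weights = {**self.weights, name: weight}
        entry = f"v{self.version + 1}: {name}={weight} ({reason})"
        return RewardArtifact(new_weights, self.version + 1, self.changelog + [entry])


def audit(artifact: RewardArtifact, cases: List[Tuple[Features, Features]]) -> List[int]:
    """Return indices of (preferred, rejected) cases the artifact ranks wrongly."""
    return [
        i for i, (preferred, rejected) in enumerate(cases)
        if artifact.score(preferred) <= artifact.score(rejected)
    ]


def best_action(artifact: RewardArtifact, candidates: Dict[str, Features]) -> str:
    """Reuse the artifact to pick the highest-scoring candidate action."""
    return max(candidates, key=lambda name: artifact.score(candidates[name]))


# One audit-and-patch iteration (all values are made up for illustration).
artifact = RewardArtifact({"helpful": 1.0})
cases = [({"helpful": 0.5, "unsafe": 0.0}, {"helpful": 0.9, "unsafe": 1.0})]
failures_before = audit(artifact, cases)
patched = artifact.patch("unsafe", -2.0, "audit case 0 ranked an unsafe answer higher")
failures_after = audit(patched, cases)
candidates = {"a": {"helpful": 0.9, "unsafe": 1.0}, "b": {"helpful": 0.5}}
choice = best_action(patched, candidates)
```

Because the artifact is a plain data object that no policy owns, the same patched version could in principle be handed to any number of policies or deployments, which is the kind of reuse the abstract argues for.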
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI
