OSIL: Learning Offline Safe Imitation Policies with Safety Inferred from Non-preferred Trajectories

Returaj Burnwal; Nirav Pravinbhai Bhatt; Balaraman Ravindran

arXiv:2602.11018·cs.LG·February 12, 2026

Open Access

TL;DR

OSIL is an offline imitation learning algorithm that infers safety constraints from non-preferred demonstrations to learn safe, reward-maximizing policies without explicit safety annotations.

Contribution

The paper introduces OSIL, a novel offline safe imitation learning method that infers safety from non-preferred trajectories without requiring explicit safety costs.

Findings

01. OSIL learns safer policies satisfying cost constraints.

02. OSIL outperforms baseline methods in safety and reward metrics.

03. The approach effectively infers safety from non-preferred demonstrations.

Abstract

This work addresses the problem of offline safe imitation learning (IL), where the goal is to learn safe and reward-maximizing policies from demonstrations that do not have per-timestep safety cost or reward information. In many real-world domains, online learning in the environment can be risky, and specifying accurate safety costs can be difficult. However, it is often feasible to collect trajectories that reflect undesirable or unsafe behavior, implicitly conveying what the agent should avoid. We refer to these as non-preferred trajectories. We propose a novel offline safe IL algorithm, OSIL, that infers safety from non-preferred demonstrations. We formulate safe policy learning as a Constrained Markov Decision Process (CMDP). Instead of relying on explicit safety cost and reward annotations, OSIL reformulates the CMDP problem by deriving a lower bound on the reward-maximizing objective…
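
For orientation, the standard discounted CMDP objective that a formulation of this kind typically starts from can be written as below. This is the generic textbook form, not necessarily the paper's own notation: the reward function $r$, cost function $c$, discount factor $\gamma$, and cost threshold $\kappa$ are symbols assumed here for illustration.

```latex
\max_{\pi} \; \mathbb{E}_{\tau \sim \pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big]
\quad \text{subject to} \quad
\mathbb{E}_{\tau \sim \pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\Big] \le \kappa
```

In the offline setting the abstract describes, neither $r$ nor $c$ is observed per timestep, which is why safety has to be inferred from the non-preferred trajectories rather than read off from annotations.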

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Robot Manipulation and Learning