Efficient Reward Identification In Max Entropy Reinforcement Learning with Sparsity and Rank Priors

Mohamad Louai Shehab; Alperen Tercan; Necmiye Ozay

arXiv:2508.07400 · cs.LG · August 12, 2025

TL;DR

This paper introduces efficient algorithms for recovering time-varying reward functions in max entropy reinforcement learning by leveraging sparsity and rank priors, improving accuracy and computational feasibility.

Contribution

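It formulates reward identification under the two priors as a sparsification problem (rewards that change infrequently) and a rank minimization problem (rewards built from a few features), and provides polynomial-time algorithms with practical applications.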
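To make the two priors concrete, the sketch below scores a candidate reward under each objective. It assumes rewards are stored as a NumPy array of shape (T, S, A) for horizon T, S states and A actions. The sparsity objective counts the time steps at which the reward table changes; the rank objective is the rank of the T x (S*A) matrix of stacked rewards, which equals the smallest number of shared feature functions whose time-varying linear combinations reproduce every r_t. The function names and example rewards are hypothetical, and the helpers only evaluate a given reward; they do not solve the constrained problems or reproduce the paper's algorithms.

```python
import numpy as np

def num_reward_switches(r, tol=1e-9):
    """Sparsity objective: number of t with r_{t+1} != r_t, for r of shape (T, S, A)."""
    flat = r.reshape(r.shape[0], -1)
    changes = np.abs(np.diff(flat, axis=0)).max(axis=1)  # largest entry-wise change per step
    return int((changes > tol).sum())

def num_features_needed(r):
    """Rank objective: rank of the T x (S*A) stacked-reward matrix, i.e. the fewest
    shared features phi_k such that every r_t is a linear combination of them."""
    return int(np.linalg.matrix_rank(r.reshape(r.shape[0], -1)))

rng = np.random.default_rng(1)
T, S, A = 6, 4, 3

# Hypothetical piecewise-constant reward that switches once, between t = 2 and t = 3.
r_a, r_b = rng.normal(size=(2, 1, S, A))
r_pc = np.concatenate([np.repeat(r_a, 3, axis=0), np.repeat(r_b, 3, axis=0)])
print(num_reward_switches(r_pc))  # 1

# Hypothetical low-rank reward: time-varying weights on two shared features.
phi = rng.normal(size=(2, S * A))
w = rng.normal(size=(T, 2))
r_lr = (w @ phi).reshape(T, S, A)
print(num_features_needed(r_lr))  # 2
```

Both counts are discrete, so minimizing them directly over the set of rewards consistent with the data is combinatorial; this summary does not say how the paper handles the rank objective beyond the formulation itself.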

Findings

1. Algorithms accurately recover rewards from demonstrations.
2. The proposed methods outperform baseline approaches in reward recovery.
3. Reconstructed rewards generalize well to new policies.

Abstract

In this paper, we consider the problem of recovering time-varying reward functions from either optimal policies or demonstrations coming from a max entropy reinforcement learning problem. This problem is highly ill-posed without additional assumptions on the underlying rewards. However, in many applications, the rewards are indeed parsimonious, and some prior information is available. We consider two such priors on the rewards: 1) rewards are mostly constant and they change infrequently, 2) rewards can be represented by a linear combination of a small number of feature functions. We first show that the reward identification problem with the former prior can be recast as a sparsification problem subject to linear constraints. Moreover, we give a polynomial-time algorithm that solves this sparsification problem exactly. Then, we show that identifying rewards representable with the minimum…
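The ill-posedness and the linear constraints mentioned in the abstract can be seen in a standard finite-horizon max-entropy setting, assumed here for illustration (known transitions P, temperature 1, no discounting, zero terminal value). The soft-optimal policy satisfies log pi_t(a|s) = r_t(s,a) + sum_{s'} P(s'|s,a) V_{t+1}(s') - V_t(s). Hence, for any choice of V_0, ..., V_{T-1} with V_T = 0, the reward r_t(s,a) = log pi_t(a|s) + V_t(s) - sum_{s'} P(s'|s,a) V_{t+1}(s') induces exactly the same policy. The sketch checks this numerically; the function names and the random MDP are hypothetical and are not the paper's code.

```python
import numpy as np

def soft_backward_induction(r, P):
    """Max-entropy (soft) backward induction for a finite-horizon MDP.

    r: rewards, shape (T, S, A); P: transitions P[s, a, s'], shape (S, A, S).
    Returns log pi_t(a|s) with shape (T, S, A); assumes V_T = 0 and temperature 1.
    """
    T, S, A = r.shape
    V_next = np.zeros(S)
    log_pi = np.zeros((T, S, A))
    for t in reversed(range(T)):
        Q = r[t] + P @ V_next               # Q_t(s,a) = r_t(s,a) + E[V_{t+1}(s')]
        V = np.logaddexp.reduce(Q, axis=1)  # soft maximum over actions
        log_pi[t] = Q - V[:, None]          # softmax policy in log space
        V_next = V
    return log_pi

def reward_from_policy(log_pi, P, V):
    """Reward consistent with log_pi for an arbitrary V of shape (T+1, S) with V[T] = 0."""
    T = log_pi.shape[0]
    r = np.empty_like(log_pi)
    for t in range(T):
        r[t] = log_pi[t] + V[t][:, None] - P @ V[t + 1]
    return r

# Hypothetical random MDP used only for the check.
rng = np.random.default_rng(0)
T, S, A = 4, 3, 2
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)  # normalize so each P[s, a, :] is a distribution
r_true = rng.normal(size=(T, S, A))
log_pi = soft_backward_induction(r_true, P)

V_arbitrary = rng.normal(size=(T + 1, S))
V_arbitrary[T] = 0.0  # terminal value must stay zero
r_alt = reward_from_policy(log_pi, P, V_arbitrary)

print(np.allclose(soft_backward_induction(r_alt, P), log_pi))  # True: same policy
print(np.allclose(r_alt, r_true))                              # False: different reward
```

Every choice of V gives a reward in an affine family that reproduces the policy. Roughly speaking, a prior such as infrequent changes then selects one member of this family, which is the kind of sparsification under linear constraints the abstract describes.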

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Machine Learning and Algorithms