Policy gradient methods for ordinal policies

Simón Weinberger (ERIC); Jairo Cugliari (ERIC)

arXiv:2506.18614·cs.LG·June 24, 2025

TL;DR

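This paper introduces an ordinal policy parametrization for reinforcement learning that captures the order relationship between actions, which the standard softmax parametrization does not. Experiments show it is effective on a real-world industrial problem and competitive on continuous action tasks after the action space is discretized.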

Contribution

The paper proposes a policy parametrization based on ordinal regression models, adapted to the reinforcement learning setting, to address the softmax parametrization's failure to capture order among discrete actions.

Findings

01. Effective in real industrial applications

02. Performs competitively in continuous action tasks

03. Addresses practical challenges in ordinal policy modeling

Abstract

In reinforcement learning, the softmax parametrization is the standard approach for policies over discrete action spaces. However, it fails to capture the order relationship between actions. Motivated by a real-world industrial problem, we propose a novel policy parametrization based on ordinal regression models adapted to the reinforcement learning setting. Our approach addresses practical challenges, and numerical experiments demonstrate its effectiveness in real applications and in continuous action tasks, where discretizing the action space and applying the ordinal policy yields competitive performance.
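
The abstract does not spell out the parametrization itself. As a rough illustration only, the sketch below assumes a cumulative-logit (proportional odds) ordinal regression model, which may differ from the authors' formulation. The names `ordinal_policy_probs`, `score`, `raw_cutpoints`, and `discretize_actions` are hypothetical and do not come from the paper. The sketch shows how a single latent score and a set of ordered thresholds can define probabilities over ordered actions, and how a continuous action interval might be discretized into such ordered actions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def ordinal_policy_probs(score, raw_cutpoints):
    """Action probabilities under an assumed cumulative-logit model.

    score: scalar latent score for the current state (hypothetical; in
        practice it could be the output of a neural network).
    raw_cutpoints: K-1 unconstrained parameters. The first is the lowest
        threshold; the rest pass through softplus so that the resulting
        thresholds are strictly increasing.
    Returns an array of K probabilities, one per ordered action.
    """
    raw = np.asarray(raw_cutpoints, dtype=float)
    increments = np.concatenate(([raw[0]], np.log1p(np.exp(raw[1:]))))
    thresholds = np.cumsum(increments)
    # P(action <= k) for k = 0..K-2; it decreases as the score grows.
    cdf = sigmoid(thresholds - score)
    cdf = np.concatenate(([0.0], cdf, [1.0]))
    # Differences of the cumulative probabilities give P(action = k).
    return np.diff(cdf)


def discretize_actions(low, high, n_bins):
    """Evenly spaced action values covering [low, high] (an assumption)."""
    return np.linspace(low, high, n_bins)


# Usage: sample one of 5 ordered actions on the interval [-1, 1].
rng = np.random.default_rng(0)
action_values = discretize_actions(-1.0, 1.0, 5)
probs = ordinal_policy_probs(score=0.3, raw_cutpoints=[-2.0, 0.0, 0.0, 0.0])
action = action_values[rng.choice(len(probs), p=probs)]
```

Under this assumed model, raising the score shifts probability mass toward higher-ranked actions as a whole, whereas a softmax policy assigns an independent logit to each action and has no built-in notion of adjacency.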
