Policy gradient methods for ordinal policies
Simón Weinberger (ERIC), Jairo Cugliari (ERIC)

TL;DR
This paper introduces a new ordinal policy parametrization for reinforcement learning that captures the ordering of discrete actions, and demonstrates its effectiveness on a real-world industrial problem and on continuous-action tasks.
Contribution
It proposes a novel policy parametrization based on ordinal regression models and adapted to reinforcement learning, addressing the inability of the standard softmax parametrization to capture the order relationship between actions.
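To make the contrast with softmax concrete, here is a minimal sketch of one standard ordinal regression model, the cumulative-logit (proportional-odds) model, used as a policy over K ordered actions. This is an assumption for illustration; the paper's exact parametrization may differ. A scalar score `eta = f(s)` is compared against increasing thresholds c_0 < ... < c_{K-2}, so that P(A <= k | s) = sigmoid(c_k - eta). The helper names `ordered_thresholds` and `ordinal_policy`, and the softplus trick used to keep the thresholds ordered, are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ordered_thresholds(first, raw_gaps):
    # Hypothetical helper: c_0 = first, c_k = c_{k-1} + softplus(raw_gaps[k-1]),
    # so the thresholds are strictly increasing for any real raw_gaps.
    gaps = np.log1p(np.exp(raw_gaps))
    return first + np.concatenate(([0.0], np.cumsum(gaps)))

def ordinal_policy(eta, thresholds):
    # Cumulative probabilities P(A <= k | s) = sigmoid(c_k - eta) for k = 0..K-2,
    # with P(A <= K-1 | s) = 1 appended for the last action.
    cdf = np.append(sigmoid(thresholds - eta), 1.0)
    # pi(k | s) = P(A <= k | s) - P(A <= k-1 | s), taking P(A <= -1 | s) = 0.
    return np.diff(cdf, prepend=0.0)

# Usage: 4 thresholds define a policy over 5 ordered actions.
c = ordered_thresholds(-1.0, np.zeros(3))
probs_low = ordinal_policy(0.0, c)
probs_high = ordinal_policy(2.0, c)
```

Unlike a softmax, where each action has an independent logit and relabelling the actions changes nothing, a single score drives all actions here: raising `eta` lowers every cumulative probability, shifting mass toward higher-indexed actions, so neighbouring actions share probability mass.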
Findings
Effective in real industrial applications
Performs competitively in continuous-action tasks once the action space is discretized
Addresses practical challenges in ordinal policy modeling
Abstract
In reinforcement learning, the softmax parametrization is the standard approach for policies over discrete action spaces. However, it fails to capture the order relationship between actions. Motivated by a real-world industrial problem, we propose a novel policy parametrization based on ordinal regression models adapted to the reinforcement learning setting. Our approach addresses practical challenges, and numerical experiments demonstrate its effectiveness in real applications and in continuous-action tasks, where discretizing the action space and applying the ordinal policy yields competitive performance.
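The abstract combines two steps: discretizing a continuous action range into ordered bins, and training an ordinal policy over those bins with a policy gradient. A minimal sketch follows, again assuming a cumulative-logit policy with score `eta = w @ s` (a linear score chosen only for brevity). It uses the score-function (REINFORCE) estimator with the analytic gradient of log pi(k | s) with respect to `eta`, using d sigmoid(c_k - eta) / d eta = -F_k (1 - F_k). The thresholds are held fixed as a simplification. All names (`action_grid`, `grad_log_prob`, `reinforce_step`, the reward function) are hypothetical and not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cumulative_probs(eta, thresholds):
    # F_k = P(A <= k | s), padded with F_{-1} = 0 in front and F_{K-1} = 1 at the end.
    return np.concatenate(([0.0], sigmoid(thresholds - eta), [1.0]))

def ordinal_probs(eta, thresholds):
    # pi(k | s) = F_k - F_{k-1}
    return np.diff(cumulative_probs(eta, thresholds))

def grad_log_prob(k, eta, thresholds):
    # d/d eta of log pi(k | s); dF/d eta = -F (1 - F), which is 0 at the padded ends.
    F = cumulative_probs(eta, thresholds)
    dF = -F * (1.0 - F)
    probs = np.diff(F)
    return (dF[k + 1] - dF[k]) / probs[k]

def action_grid(low, high, n_bins):
    # Ordered bin centres: action index k maps to a continuous action value.
    edges = np.linspace(low, high, n_bins + 1)
    return 0.5 * (edges[:-1] + edges[1:])

def reinforce_step(w, s, thresholds, grid, reward_fn, rng, lr=0.1):
    # One single-sample REINFORCE update of the linear score weights w.
    eta = float(w @ s)
    probs = ordinal_probs(eta, thresholds)
    k = rng.choice(len(probs), p=probs)
    r = reward_fn(grid[k])
    # Score-function estimate: r * (d log pi / d eta) * (d eta / d w), with d eta / d w = s.
    w_new = w + lr * r * grad_log_prob(k, eta, thresholds) * s
    return w_new, grid[k], r

# Usage: 4 ordered bins on [-1, 1], reward peaked at 0.5.
c = np.array([-1.0, 0.0, 1.0])
grid = action_grid(-1.0, 1.0, 4)
rng = np.random.default_rng(0)
w, s = np.zeros(2), np.array([1.0, 0.5])
w_new, action, reward = reinforce_step(w, s, c, grid, lambda a: -(a - 0.5) ** 2, rng)
```

Because the bins are ordered, the single score `eta` moves probability toward larger or smaller action values as a whole, which is the property a softmax over the same bins would not encode.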
