Offline RL for Adaptive Policy Retrieval in Prior Authorization
Ruslan Sharifullin, Maxim Gorshkov, Hannah Clay

TL;DR
This paper models policy retrieval in prior authorization (PA) as an offline reinforcement learning problem and learns adaptive retrieval strategies that balance decision accuracy against retrieval cost, outperforming fixed retrieval strategies.
Contribution
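It introduces an offline RL framework for adaptive policy retrieval, trained with Conservative Q-Learning (CQL), Implicit Q-Learning (IQL), and Direct Preference Optimization (DPO), and demonstrates improved decision accuracy and retrieval efficiency over fixed retrieval methods.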
Findings
CQL achieves 92% decision accuracy with exhaustive retrieval.
IQL matches this accuracy with 44% fewer retrieval steps.
DPO matches CQL's accuracy while reducing retrieval steps by 47%.
Abstract
Prior authorization (PA) requires interpretation of complex and fragmented coverage policies, yet existing retrieval-augmented systems rely on static top-k strategies with fixed numbers of retrieved sections. Such fixed retrieval can be inefficient and can gather irrelevant or insufficient information. We model policy retrieval for PA as a sequential decision-making problem, formulating adaptive retrieval as a Markov Decision Process (MDP). In our system, an agent iteratively selects policy chunks from a top-k candidate set or chooses to stop and issue a decision. The reward balances decision correctness against retrieval cost, capturing the trade-off between accuracy and efficiency. We train policies using Conservative Q-Learning (CQL), Implicit Q-Learning (IQL), and Direct Preference Optimization (DPO) in an offline RL setting on logged trajectories generated from baseline retrieval…
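The retrieval MDP described in the abstract can be illustrated with a small environment. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `RetrievalState`, `step`, `run_episode`, `STOP`, and `toy_decide` are hypothetical, the decision model is a keyword stub standing in for the real decision step, and the reward (+1 for a correct decision, 0 otherwise, minus a per-retrieval cost `step_cost`) is an assumed form of the correctness-versus-cost trade-off rather than the paper's exact reward.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

STOP = -1  # hypothetical sentinel action: stop retrieving and issue a PA decision


@dataclass
class RetrievalState:
    candidates: List[str]                              # top-k candidate policy chunks
    selected: List[int] = field(default_factory=list)  # chunk indices retrieved so far
    done: bool = False                                 # True once STOP has been taken


def legal_actions(state: RetrievalState) -> List[int]:
    """Any candidate chunk not yet retrieved, plus STOP."""
    return [i for i in range(len(state.candidates)) if i not in state.selected] + [STOP]


def step(state: RetrievalState, action: int,
         decide: Callable[[Sequence[str]], str], gold: str,
         step_cost: float = 0.05) -> Tuple[RetrievalState, float]:
    """Apply one action and return (next_state, reward).

    Assumed reward shape (not taken from the paper): each retrieval costs
    -step_cost; STOP ends the episode with +1 for a correct decision, else 0.
    """
    if state.done:
        raise ValueError("episode already finished")
    if action not in legal_actions(state):
        raise ValueError(f"illegal action {action}")
    if action == STOP:
        retrieved = [state.candidates[i] for i in state.selected]
        reward = 1.0 if decide(retrieved) == gold else 0.0
        return RetrievalState(state.candidates, list(state.selected), done=True), reward
    # Retrieve one more chunk and pay the retrieval cost.
    return RetrievalState(state.candidates, state.selected + [action]), -step_cost


def run_episode(candidates: Sequence[str], actions: Sequence[int],
                decide: Callable[[Sequence[str]], str], gold: str,
                step_cost: float = 0.05) -> float:
    """Replay a fixed action sequence (e.g. a logged trajectory) and sum its rewards."""
    state = RetrievalState(list(candidates))
    total = 0.0
    for a in actions:
        state, r = step(state, a, decide, gold, step_cost)
        total += r
    return total


def toy_decide(chunks: Sequence[str]) -> str:
    """Hypothetical stand-in for the decision model: approve if any chunk says 'covered'."""
    return "approve" if any("covered" in c for c in chunks) else "deny"


chunks = ["general exclusions", "MRI is covered after 6 weeks of therapy", "billing codes"]
adaptive = run_episode(chunks, [1, STOP], toy_decide, "approve")
exhaustive = run_episode(chunks, [0, 1, 2, STOP], toy_decide, "approve")
print(round(adaptive, 2), round(exhaustive, 2))  # 0.95 0.85
```

Both trajectories reach the correct decision, but the shorter one earns more reward under this assumed cost, which is the accuracy/efficiency trade-off the abstract describes. Action sequences like these, logged from baseline retrieval, are the kind of data an offline learner such as CQL, IQL, or DPO would be trained on.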
