Lever: Inference-Time Policy Reuse under Support Constraints

Ihor Vitenko; Noha Ibrahim; Sihem Amer-Yahia

arXiv:2604.20174 · cs.LG · April 29, 2026

TL;DR

This paper introduces Lever, a framework for inference-time policy reuse in reinforcement learning that composes new policies offline from a library of pre-trained policies, using behavioral embeddings to evaluate candidates and Q-value composition to combine them.

Contribution

Lever is the first end-to-end method for offline policy composition that balances performance and computational cost in support-limited regimes.

Findings

01. Inference-time composition can match or exceed training-from-scratch performance.
02. Performance depends critically on the coverage of available transitions.
03. Long-horizon dependencies pose fundamental limitations for offline reuse.

Abstract

Reinforcement learning (RL) policies are typically trained for fixed objectives, making reuse difficult when task requirements change. We study inference-time policy reuse: given a library of pre-trained policies and a new composite objective, can a high-quality policy be constructed entirely offline, without additional environment interaction? We introduce Lever (Leveraging Efficient Vector Embeddings for Reusable policies), an end-to-end framework that retrieves relevant policies, evaluates them using behavioral embeddings, and composes new policies via offline Q-value composition. We focus on the support-limited regime, where no value propagation is possible, and show that the effectiveness of reuse depends critically on the coverage of available transitions. To balance performance and computational cost, Lever proposes composition strategies that control the exploration of candidate…
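The abstract describes a retrieve-then-compose pipeline. As a rough illustration only, the Python sketch below ranks library policies by cosine similarity between behavioral embeddings, keeps the top k, and then picks an action by a weighted sum of their Q-values (additive Q-value composition). Everything here is an assumption made for illustration and is not the Lever implementation: `PolicyEntry`, `retrieve`, and `composed_action` are hypothetical names, the Q-functions are toy lookup tables, the weights stand in for the composite objective, using embeddings to rank candidates may differ from how Lever retrieves and evaluates them, and skipping any action that a retrieved table lacks is only a crude stand-in for a support constraint.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class PolicyEntry:
    # Hypothetical library record: a name, a behavioral embedding, and a
    # tabular Q-function keyed by (state, action).
    name: str
    embedding: list[float]
    q: dict[tuple[str, str], float]


def cosine(u: list[float], v: list[float]) -> float:
    # Cosine similarity between two embeddings (0.0 if either is all zeros).
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def retrieve(query: list[float], library: list[PolicyEntry], k: int) -> list[PolicyEntry]:
    # Rank policies by embedding similarity to the query and keep the top k.
    return sorted(library, key=lambda p: cosine(query, p.embedding), reverse=True)[:k]


def composed_action(
    policies: list[PolicyEntry], weights: list[float], state: str, actions: list[str]
) -> str | None:
    # Additive Q-value composition: score each action by a weighted sum of the
    # retrieved Q-values and return the highest-scoring one. Actions missing
    # from any retrieved table are treated as out-of-support and skipped
    # (an assumption, not necessarily Lever's rule). Returns None if no
    # action is in support.
    best_action, best_value = None, -math.inf
    for a in actions:
        if not all((state, a) in p.q for p in policies):
            continue
        value = sum(w * p.q[(state, a)] for w, p in zip(weights, policies))
        if value > best_value:
            best_action, best_value = a, value
    return best_action


# Toy usage with made-up policies and values.
library = [
    PolicyEntry("reach", [1.0, 0.0], {("s0", "left"): 1.0, ("s0", "right"): 0.2}),
    PolicyEntry("avoid", [0.0, 1.0], {("s0", "left"): -0.5, ("s0", "right"): 0.4}),
    PolicyEntry("idle", [-1.0, 0.0], {("s0", "left"): 0.0}),
]
picked = retrieve([0.8, 0.6], library, k=2)
print([p.name for p in picked])  # → ['reach', 'avoid']
print(composed_action(picked, [0.5, 0.5], "s0", ["left", "right"]))  # → right
```

With "reach" and "avoid" weighted equally, "left" scores 0.25 and "right" scores 0.3, so the composed choice differs from what either policy's Q-values favor on their own ("reach" prefers "left"). Replacing the weighted sum with a max over the retrieved Q-values is a common alternative composition rule.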