POP: Prior-Fitted First-Order Optimization Policies
Jan Kobiolka, Christian Frey, Gresa Shala, Arlind Kadra, Erind Bedalli, Josif Grabocka

TL;DR
POP is a meta-learned RL policy that predicts adaptive learning rates for gradient descent, improving optimization performance and generalization across diverse functions without task-specific tuning.
Contribution
We introduce POP, a novel RL-based optimizer with a new reward formulation, function-scaling strategy, and prior sampling method for synthetic problems.
Findings
POP significantly outperforms gradient-based methods on 43 benchmark functions.
POP demonstrates strong generalization capabilities without task-specific tuning.
The method introduces a novel prior and reward formulation for meta-learning optimizers.
Abstract
Gradient-based optimizers are highly sensitive to design choices in their adaptive learning rate mechanisms. To address this limitation, we introduce POP, a meta-learned Reinforcement Learning (RL) policy that predicts adaptive learning rates for gradient descent, conditioned on the contextual information provided in the optimization trajectory. Our method introduces a novel RL reward formulation, a new function-scaling strategy for in-distribution generalization, and a novel prior that is used to sample millions of synthetic optimization problems. We evaluate POP on an established benchmark of 43 optimization functions of varying complexity, where it significantly outperforms gradient-based methods. Our evaluation demonstrates strong generalization capabilities without task-specific tuning.
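To make the setup in the abstract concrete, the sketch below shows one way a gradient descent loop could query a policy for a learning rate at every step, passing it the trajectory observed so far. It is illustrative only: the names `placeholder_policy` and `policy_driven_gradient_descent`, the trajectory format, and the step-halving rule are assumptions made for this example. POP's actual policy is meta-learned with RL; the hand-written heuristic here merely stands in for it.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# One trajectory entry: (point, function value, gradient) at a visited step.
# This format is an assumption for the sketch, not the paper's state encoding.
Step = Tuple[Vector, float, Vector]


def placeholder_policy(trajectory: Sequence[Step]) -> float:
    """Hypothetical stand-in for a learned policy.

    Maps the trajectory so far to a learning rate. This is a hand-written
    heuristic, NOT the meta-learned POP policy.
    """
    base_lr = 0.1
    if len(trajectory) < 2:
        return base_lr
    prev_value = trajectory[-2][1]
    last_value = trajectory[-1][1]
    # Halve the step size if the most recent step increased the function value.
    return base_lr * 0.5 if last_value > prev_value else base_lr


def policy_driven_gradient_descent(
    f: Callable[[Vector], float],
    grad: Callable[[Vector], Vector],
    x0: Vector,
    policy: Callable[[Sequence[Step]], float],
    num_steps: int,
) -> Tuple[Vector, List[Step]]:
    """Gradient descent whose learning rate is chosen by `policy` at each step."""
    x = list(x0)
    trajectory: List[Step] = []
    for _ in range(num_steps):
        g = grad(x)
        # Record the current point before asking the policy for a step size.
        trajectory.append((list(x), f(x), list(g)))
        lr = policy(trajectory)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x, trajectory


def quadratic(x: Vector) -> float:
    return sum(v * v for v in x)


def quadratic_grad(x: Vector) -> Vector:
    return [2.0 * v for v in x]


if __name__ == "__main__":
    x_final, traj = policy_driven_gradient_descent(
        quadratic, quadratic_grad, [3.0, -2.0], placeholder_policy, num_steps=50
    )
    print("final value:", quadratic(x_final))
```

Keeping the policy as a plain callable over the trajectory means a learned model could replace the heuristic without changing the optimization loop.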
