TL;DR
This paper introduces a scalable, model-free primal-dual policy gradient method for optimal resource allocation in wireless systems, leveraging action-space exploration and zeroth-order gradient approximations.
Contribution
It presents a novel, scalable approach that efficiently learns optimal resource allocation policies without relying on system models or channel statistics.
Findings
Achieves near-optimal performance in simulations.
Outperforms existing methods in scalability and efficiency.
Establishes convergence in theory and demonstrates practical applicability.
Abstract
Wireless systems resource allocation refers to perpetual and challenging nonconvex constrained optimization tasks, which are especially timely in modern communications and networking setups involving multiple users with heterogeneous objectives and imprecise or even unknown models and/or channel statistics. In this paper, we propose a technically grounded and scalable primal-dual deterministic policy gradient method for efficiently learning optimal parameterized resource allocation policies. Our method not only efficiently exploits gradient availability of popular universal policy representations, such as deep neural networks, but is also truly model-free, as it relies on consistent zeroth-order gradient approximations of the associated random network services constructed via low-dimensional perturbations in action space, thus fully bypassing any dependence on critics. Both theory and…
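To make the zeroth-order idea in the abstract more concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a two-point Gaussian-smoothing estimator for the gradient of a black-box service function with respect to the action, and a hypothetical linear policy a = W h, so that the chain rule through the policy is a simple outer product. The names `zeroth_order_action_grad` and `policy_param_grad`, the smoothing radius `mu`, and the linear policy are all illustrative assumptions. The dual (constraint multiplier) updates of a primal-dual scheme are omitted.

```python
import numpy as np

def zeroth_order_action_grad(f, a, mu, rng, num_samples=1):
    """Two-point Gaussian-smoothing estimate of the gradient of f at action a.

    f is treated as a black box: only its values at perturbed actions
    are used, never its derivatives or any model of the system.
    """
    g = np.zeros_like(a, dtype=float)
    for _ in range(num_samples):
        u = rng.standard_normal(a.shape)            # perturbation in action space
        delta = f(a + mu * u) - f(a - mu * u)       # two black-box evaluations
        g += delta / (2.0 * mu) * u
    return g / num_samples

def policy_param_grad(action_grad, h):
    """Chain rule for the hypothetical linear policy a = W @ h.

    The gradient of f(W @ h) with respect to W is outer(grad_a f, h).
    """
    return np.outer(action_grad, h)
```

In this sketch the random perturbation has the dimension of the action rather than of the policy parameters, which is what keeps the exploration low-dimensional even when the policy itself (for example, a deep neural network in place of the linear map assumed here) has many parameters.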
