Learning by Competition of Self-Interested Reinforcement Learning Agents
Stephen Chung

TL;DR
This paper introduces Weight Maximization, a biologically plausible learning method for neural networks that improves credit assignment and enables training of both continuous and discrete units, demonstrating faster learning than REINFORCE.
Contribution
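The paper proposes Weight Maximization, a learning rule that replaces the global reward signal sent to hidden units with a local one: the change in the norm of each unit's outgoing weights. This eases structural credit assignment, keeps the rule biologically plausible, and speeds up learning relative to REINFORCE.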
Findings
Weight Maximization approximates reward gradients in expectation.
Networks trained with Weight Maximization learn faster than REINFORCE.
Weight Maximization enables training of discrete-valued units.
Abstract
An artificial neural network can be trained by uniformly broadcasting a reward signal to units that implement a REINFORCE learning rule. Though this presents a biologically plausible alternative to backpropagation in training a network, the high variance associated with it renders it impractical to train deep networks. The high variance arises from the inefficient structural credit assignment since a single reward signal is used to evaluate the collective action of all units. To facilitate structural credit assignment, we propose replacing the reward signal to hidden units with the change in the norm of the unit's outgoing weight. As such, each hidden unit in the network is trying to maximize the norm of its outgoing weight instead of the global reward, and thus we call this learning method Weight Maximization. We prove that Weight Maximization is approximately following the…
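The mechanism described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a two-layer network of Bernoulli-logistic units with a single output unit, uses the change in the squared L2 norm of the outgoing weight as the hidden units' reward, and applies no baseline or rescaling. The names `weight_max_step` and `reward_fn` are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def weight_max_step(W_hid, w_out, x, reward_fn, lr, rng):
    """One update of a two-layer network of stochastic binary units.

    W_hid: list of rows, one row of incoming weights per hidden unit.
    w_out: incoming weights of the single output unit; w_out[i] is also
           the (scalar) outgoing weight of hidden unit i.
    Returns new copies of the weights plus the per-unit local rewards.
    """
    # Forward pass: every unit samples a binary activation.
    p_h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W_hid]
    h = [1.0 if rng.random() < p else 0.0 for p in p_h]
    p_y = sigmoid(sum(w * hi for w, hi in zip(w_out, h)))
    y = 1.0 if rng.random() < p_y else 0.0

    # Global reward from the environment (hypothetical callback).
    r = reward_fn(x, y)

    # Output unit: plain REINFORCE on the global reward.
    # (y - p_y) * h is the gradient of log P(y) w.r.t. w_out.
    new_w_out = [w + lr * r * (y - p_y) * hi for w, hi in zip(w_out, h)]

    # Hidden unit i: its reward is the change in the norm of its outgoing
    # weight. Assumption: squared L2 norm, which is w_out[i]**2 here.
    local_r = [nw * nw - w * w for nw, w in zip(new_w_out, w_out)]

    # Hidden units: REINFORCE again, but driven by the local reward.
    new_W_hid = [
        [w + lr * r_i * (h_i - p_i) * xj for w, xj in zip(row, x)]
        for row, r_i, h_i, p_i in zip(W_hid, local_r, h, p_h)
    ]
    return new_W_hid, new_w_out, local_r
```

In this sketch a hidden unit that stayed silent (`h[i] == 0`) sees no change in its outgoing weight and therefore receives zero local reward, so only units that took part in the action are credited. Because the local reward is itself proportional to the learning rate, a practical version would likely need some rescaling, which is left out here.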
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: REINFORCE
