A two step algorithm for learning from unspecific reinforcement
Reimer Kuehn, Ion-Olimpiu Stamatescu

TL;DR
This paper introduces a two-step learning algorithm based on the Hebb rule that copes with delayed, unspecific reinforcement. Despite the ambiguity of the feedback, it converges to asymptotically perfect generalization, although for some parameter settings this depends on the initial conditions.
Contribution
It proposes a two-step algorithm for learning from unspecific reinforcement and analyzes how its convergence depends on the learning parameters and on initial conditions.
Findings
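Convergence to asymptotically perfect generalization is observed despite unspecific feedback.
The convergence rate depends in a non-universal way on the learning parameters; it can be as fast as that of Hebbian learning, but may be slower.
For a certain range of parameter settings, initial conditions determine whether the system reaches asymptotically perfect generalization or approaches a stationary state of poor generalization.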
Abstract
We study a simple learning model based on the Hebb rule to cope with "delayed", unspecific reinforcement. In spite of the unspecific nature of the information-feedback, convergence to asymptotically perfect generalization is observed, with a rate depending, however, in a non-universal way on learning parameters. Asymptotic convergence can be as fast as that of Hebbian learning, but may be slower. Moreover, for a certain range of parameter settings, it depends on initial conditions whether the system can reach the regime of asymptotically perfect generalization, or rather approaches a stationary state of poor generalization.
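The abstract does not spell out the update rules, so the following Python sketch is only one possible reading of a two-step Hebbian scheme with delayed, unspecific feedback, not the authors' exact algorithm. It assumes a student perceptron learning from a teacher perceptron: during a session the student reinforces its own guesses without any feedback (step 1), and at the end it receives only the overall error fraction and uses it to partially undo the session's updates (step 2). All names and parameters (`train_session`, `run`, `a1`, `a2`, `n_patterns`) are hypothetical and chosen for illustration.

```python
import math
import random


def sign(x):
    return 1 if x >= 0 else -1


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def overlap(student, teacher):
    """Cosine of the angle between student and teacher weight vectors."""
    return dot(student, teacher) / math.sqrt(
        dot(student, student) * dot(teacher, teacher)
    )


def train_session(student, teacher, rng, n_patterns, a1, a2):
    """Run one session in place on `student`; return the session error.

    Assumed rules (illustrative, not taken from the source):
      step 1: after each pattern, a Hebbian update with the student's own guess
      step 2: after the session, subtract the accumulated updates scaled by
              a2 times the error fraction, which is the only feedback given
    """
    n = len(student)
    scale = 1.0 / math.sqrt(n)
    accumulated = [0.0] * n
    mistakes = 0
    for _ in range(n_patterns):
        pattern = [rng.choice((-1, 1)) for _ in range(n)]
        guess = sign(dot(student, pattern))
        if guess != sign(dot(teacher, pattern)):
            mistakes += 1  # counted, but never revealed per pattern
        for i in range(n):
            delta = guess * pattern[i]
            student[i] += a1 * scale * delta  # step 1: blind Hebbian update
            accumulated[i] += delta
    error = mistakes / n_patterns
    for i in range(n):
        # step 2: unspecific correction, proportional to the overall error
        student[i] -= a2 * scale * error * accumulated[i]
    return error


def run(n=50, n_patterns=5, sessions=2000, a1=1.0, a2=3.0, seed=0):
    """Return the student-teacher overlap before and after training."""
    rng = random.Random(seed)
    teacher = [rng.gauss(0, 1) for _ in range(n)]
    student = [rng.gauss(0, 1) for _ in range(n)]
    start = overlap(student, teacher)
    for _ in range(sessions):
        train_session(student, teacher, rng, n_patterns, a1, a2)
    return start, overlap(student, teacher)


if __name__ == "__main__":
    start, final = run()
    print(f"overlap before: {start:.3f}, after: {final:.3f}")
```

In this sketch, setting `n_patterns=1` and `a2 = 2 * a1` makes the error either 0 or 1, so the net update on every pattern is `a1` times the teacher's label times the pattern, which is ordinary supervised Hebbian learning. Longer sessions make the feedback less specific, and the outcome then depends on `a1`, `a2`, and the initial student vector, which is the kind of parameter and initial-condition dependence the abstract describes.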
