Stochastic approximation in non-markovian environments revisited
Vivek Shripad Borkar

TL;DR
This paper extends stochastic approximation theory to non-ergodic, non-Markovian environments and applies it to analyze transformer-based learning and continual learning mechanisms that rely on entire past data.
Contribution
It introduces an analytic framework for understanding complex learning models like transformers and continual learning in non-ergodic, non-Markovian settings.
Findings
Framework for analyzing transformer attention mechanisms
Insights into continual learning processes
Enhanced understanding of non-ergodic stochastic approximation
Abstract
Building on recent work of the author on stochastic approximation in non-Markovian environments, we consider the situation in which the driving random process is non-ergodic in addition to being non-Markovian. Using this, we propose an analytic framework for understanding transformer-based learning, specifically the 'attention' mechanism, and continual learning, both of which depend in principle on the entire past.
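As a rough illustration of what a non-ergodic, history-dependent driving process does to a stochastic approximation iterate, here is a toy sketch that is not taken from the paper. It assumes a Pólya urn as the driving process (each draw's probability depends on the counts of all earlier draws) and a plain Robbins-Monro recursion x_{n+1} = x_n + a_n (Y_{n+1} - x_n) with step size a_n = 1/n. The function names `polya_urn_draws`, `robbins_monro_mean`, and `run` are hypothetical and chosen only for this example.

```python
import random


def polya_urn_draws(n_steps, rng, red=1, blue=1):
    """Yield 1 (red) or 0 (blue) from a Polya urn.

    Assumption for illustration: the urn stands in for a driving process
    whose next value depends on the entire past (through the counts).
    """
    for _ in range(n_steps):
        p_red = red / (red + blue)
        y = 1 if rng.random() < p_red else 0
        # Reinforce the drawn colour, so every past draw shapes future odds.
        if y:
            red += 1
        else:
            blue += 1
        yield y


def robbins_monro_mean(draws):
    """Run x_{n+1} = x_n + a_n * (y_{n+1} - x_n) with a_n = 1/n.

    With this step size the iterate equals the running average of the draws.
    """
    x = 0.0
    for n, y in enumerate(draws, start=1):
        a_n = 1.0 / n
        x += a_n * (y - x)
    return x


def run(seed, n_steps=20000):
    """Run one sample path with its own seeded random generator."""
    rng = random.Random(seed)
    return robbins_monro_mean(polya_urn_draws(n_steps, rng))


if __name__ == "__main__":
    # Each seed is a separate sample path of the same iteration.
    for seed in range(5):
        print(seed, round(run(seed), 4))
```

Because the urn's long-run red fraction is itself random, different sample paths of the same iteration generally settle near different values, which is the kind of path-dependent limit that an ergodic driving process would rule out.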
Taxonomy
Topics: Neural Networks and Applications · Stochastic Gradient Optimization Techniques · Reinforcement Learning in Robotics
