arXiv:1902.02875·q-bio.NC·August 26, 2020

Dynamical learning of dynamics

Christian Klos, Yaroslav Felipe Kalle Kossio, Sven Goedeke, Aditya Gilra, and Raoul-Martin Memmesheimer

TL;DR

This paper demonstrates that fixed-weight neural networks can rapidly learn and generate new dynamics through imitation after pretraining, enabling quick adaptation to various complex tasks without ongoing feedback.

Contribution

It introduces a method where fixed-weight networks learn new dynamics through imitation, showing rapid adaptation after pretraining, which differs from traditional slow synaptic learning.

Findings

1. Networks can learn diverse dynamics like oscillations and chaos.

2. Pretrained fixed-weight networks adapt quickly to new tasks.

3. Networks maintain learned dynamics without further feedback.

Abstract

The ability of humans and animals to quickly adapt to novel tasks is difficult to reconcile with the standard paradigm of learning by slow synaptic weight modification. Here we show that fixed-weight neural networks can learn to generate required dynamics by imitation. After appropriate weight pretraining, the networks quickly and dynamically adapt to learn new tasks and thereafter continue to achieve them without further teacher feedback. We explain this ability and illustrate it with a variety of target dynamics, ranging from oscillatory trajectories to driven and chaotic dynamical systems.

Equations

$$\begin{aligned}\tau\dot{x}(t)&=-x(t)+Ar(t)+w_{z}z(t)+w_{c}c(t)+w_{\varepsilon}\varepsilon(t)+w_{u}u(t),\\ z(t)&=o_{z}r(t),\quad c(t)=o_{c}r(t)\end{aligned}$$

Code & Models

Repositories

chklos/dynamical-learning (official repository)

Full text

Dynamical learning of dynamics

Christian Klos

Neural Network Dynamics and Computation, Institute of Genetics, University of Bonn, Bonn, Germany.

Yaroslav Felipe Kalle Kossio

Neural Network Dynamics and Computation, Institute of Genetics, University of Bonn, Bonn, Germany.

Sven Goedeke

Neural Network Dynamics and Computation, Institute of Genetics, University of Bonn, Bonn, Germany.

Aditya Gilra

Neural Network Dynamics and Computation, Institute of Genetics, University of Bonn, Bonn, Germany.

Department of Computer Science, and Neuroscience Institute, University of Sheffield, Sheffield, UK.

Raoul-Martin Memmesheimer

Neural Network Dynamics and Computation, Institute of Genetics, University of Bonn, Bonn, Germany.

Keywords: neural networks, dynamical learning, dynamical systems

Introduction. Humans and animals can learn wide varieties of tasks. The predominant paradigm assumes that their neural networks achieve this by slow adaptation of connection weights between neurons (Dayan and Abbott, 2001; Gerstner et al., 2014). Neurobiological experiments, however, also indicate fast learning with static weights (Perich et al., 2018). Our study addresses how neural networks may quickly learn to generate required output dynamics without weight-learning.

The goal of neural network learning is ultimately to appropriately change the activity of the output neurons of the network. In supervised learning, it should match a target and continue doing so during subsequent testing; in reinforcement learning, it should maximize a sparsely given reward. In our study, the networks adapt their weights during a pretraining phase (Thrun and Pratt, 1998; Vanschoren, 2018; Lansdell and Kording, 2018) such that thereafter with static weights they achieve supervised learning of desired outputs, by adapting only their dynamics (dynamical learning). Adapting the static network’s weights during pretraining is thus a kind of meta-learning or learning-to-learn. There is a recent spurt of interest in learning-to-learn (Vanschoren, 2018; Lansdell and Kording, 2018), focusing mainly on learning of reinforcement learning (Duan et al., 2016; Wang et al., 2018; Nagabandi et al., 2018). Studies on learning of supervised dynamical learning showed prediction of a time series at the current time step given the preceding step’s target (Feldkamp et al., 1996, 1997; Younger et al., 1999; Hochreiter et al., 2001; Younger et al., 2001; Feldkamp et al., 2003; Santiago, 2004; Lukosevicius, 2007; Santoro et al., 2016; Bellec et al., 2018) and control of a system along a time-varying target (Feldkamp and Puskorius, 1994a, b, 1997; Oubbati and Palm, 2010). The studies assume that a target is present during testing to avoid unlearning. This limits applicability and renders the dynamics necessarily non-autonomous; it is conceptually problematic for supervised settings and at odds with the common concept of teacher-free recall.

We therefore develop a scheme for fast supervised dynamical learning and subsequent teacher-free generation of long-term dynamics. We consider models for biological recurrent (reciprocally connected) neural networks, where leaky rate neurons interact in continuous time (Dayan and Abbott, 2001; Gerstner et al., 2014). Such models are amenable to learning, computation and phase space analysis (Dayan and Abbott, 2001; Gerstner et al., 2014; Jaeger et al., 2007; Sussillo and Abbott, 2009; Sussillo and Barak, 2013). After appropriate pretraining using the reservoir computing scheme, where only the weights to output neurons are trained (Supplemental Material; Jaeger and Haas, 2004; Maass et al., 2002; Sussillo and Abbott, 2009), all weights are fixed. The networks can nevertheless learn to generate new, desired dynamics. Furthermore, they continue to generate them in a self-contained manner during subsequent testing. We illustrate this with a variety of trajectories and dynamical systems and analyze the underlying mechanisms.

*Network model.* We use recurrent neural networks, where each neuron (or neuronal subpopulation) $i,\,i=1,...,N$, with $N$ between 500 and 3000 depending on the task (Supplemental Material), is characterized by an activation variable $x_{i}(t)$ and communicates with other neurons via its firing rate $r_{i}(t)$, a nonlinear function of $x_{i}(t)$ (Dayan and Abbott, 2001; Gerstner et al., 2014). In isolation $x_{i}(t)$ decays to zero with a time constant $\tau_{i}$. This combines the decay times of membrane potential and synaptic currents. The network has two outputs, which can be interpreted as linear neurons: signal $z_{k}(t),\,k=1,...,N_{z}$, and context $c_{l}(t),\,l=1,...,N_{c}$ (Fig. 1). After learning, $z(t)$ generates the desired dynamics while $c(t)$ indexes it. They are continually fed back to the network, allowing their autonomous generation (Jaeger and Haas, 2004). The networks are temporarily also informed about their signal’s difference from its target $\tilde{z}(t)$ by an error input $\varepsilon(t)=z(t)-\tilde{z}(t)$. Taken together, for constant weights the network dynamics are given by

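$$\begin{aligned}\tau\dot{x}(t)&=-x(t)+Ar(t)+w_{z}z(t)+w_{c}c(t)+w_{\varepsilon}\varepsilon(t)+w_{u}u(t),\\ z(t)&=o_{z}r(t),\quad c(t)=o_{c}r(t),\end{aligned}\tag{1}$$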

with recurrent weights $A$, the diagonal matrix of time constants $\tau$, signal and context output weights $o_{z},\,o_{c}$, feedback weights $w_{z},\,w_{c}$, input weights $w_{\varepsilon},\,w_{u}$ and a drive $u(t)$ absent for most tasks. We choose $r_{i}(t)=\tanh(x_{i}(t)+b_{i})$ (Jaeger and Haas, 2004; Sussillo and Abbott, 2009; Lukosevicius et al., 2012); offsets $b_{i}$ are drawn from a uniform distribution between $-0.2$ and $0.2$ and break the $x\rightarrow-x$ symmetry without input. Unless mentioned otherwise, we set $\tau_{i}=1$, fixing the overall time scale. Recurrent weights $A_{ij}$ are set to zero with probability $1-p$ ($p=0.1$ or $p=0.2$ depending on the task). Nonzero weights are drawn from a Gaussian distribution with mean $0$ and variance $\frac{g^{2}}{pN}$, where $g=1.5$ (Sussillo and Abbott, 2009). $w_{z,ij}$, $w_{c,ij}$, $w_{\varepsilon,ij}$ and $w_{u,ij}$ are drawn from a uniform distribution between $-\tilde{w}$ and $\tilde{w}$ ($\tilde{w}=1$ or $\tilde{w}=2$).
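The network dynamics and parameter distributions described above can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' implementation: the forward-Euler scheme, the step size `dt`, the function names `build_network` and `euler_step`, the zero mean of the recurrent weights, and the omission of the drive $u(t)$ are all assumptions made for this sketch.

```python
import numpy as np

def build_network(N=500, p=0.1, g=1.5, w_tilde=1.0, n_z=1, n_c=1, seed=0):
    """Draw the fixed random parameters of the rate network (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Sparse recurrent weights: nonzero with probability p,
    # Gaussian with (assumed) zero mean and variance g^2 / (p N).
    mask = rng.random((N, N)) < p
    A = mask * rng.normal(0.0, g / np.sqrt(p * N), size=(N, N))
    b = rng.uniform(-0.2, 0.2, size=N)                      # offsets b_i
    w_z = rng.uniform(-w_tilde, w_tilde, size=(N, n_z))     # signal feedback
    w_c = rng.uniform(-w_tilde, w_tilde, size=(N, n_c))     # context feedback
    w_eps = rng.uniform(-w_tilde, w_tilde, size=(N, n_z))   # error input
    return dict(A=A, b=b, w_z=w_z, w_c=w_c, w_eps=w_eps)

def euler_step(x, net, o_z, o_c, z_target=None, c_clamp=None, dt=0.1, tau=1.0):
    """One forward-Euler step of the network dynamics (drive u(t) omitted)."""
    r = np.tanh(x + net["b"])                 # firing rates
    z = o_z @ r                               # signal output
    c = o_c @ r                               # context output
    # Error input is only present while a teacher is available.
    eps = z - z_target if z_target is not None else np.zeros_like(z)
    # During testing the fed-back context is clamped to a constant value.
    c_fb = c if c_clamp is None else c_clamp
    dx = (-x + net["A"] @ r + net["w_z"] @ z
          + net["w_c"] @ c_fb + net["w_eps"] @ eps)
    return x + dt / tau * dx, z, c
```

Passing `z_target` corresponds to the phases with error input, while passing `c_clamp` and no `z_target` mimics the teacher-free test phase with a fixed context.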

Pretraining. The aim of our pretraining (Fig. 1a) is twofold. First, it should enable the resulting static networks to learn signals of a specific class given only the error input $\varepsilon(t)$ . Second, after removing the error input the static networks should be able to continue to generate the desired dynamics. Therefore, the networks have to learn to minimize $\varepsilon(t)$ and, as explained in the Analysis section, to associate unique contexts with the different target dynamics.

To achieve this, we present different trajectories $\tilde{z}(t)$ of the target class to the networks, together with associated, straightforwardly chosen constant indices $\tilde{c}$. The output weights $o_{z,ij}$ and $o_{c,ij}$ learn online according to the FORCE rule (Supplemental Material; Sussillo and Abbott, 2009) to minimize the output errors $\varepsilon(t)$ and $c(t)-\tilde{c}$. In short, they are modified using the supervised recursive least-squares algorithm with high learning rate. This provides a least-squares optimal regularized solution for the output weights given the past network states and targets (Ismail and Principe, 1996). Signals and indices are presented for a time $t_{\text{wlearn}}$ (30000 or 50000) as a continuous, randomly repeating sequence of training periods of duration $t_{\text{stay}}$ (between 200 and 1000). During each training period’s first part, a network receives $\varepsilon(t)$ as input. Because of the various last states of the previous learning periods, it thus learns to approach $\tilde{z}(t)$ from a broad range of initial conditions given this input. In most of our tasks, after a time $t_{\text{fb}}=100$, when $z(t)$ is close to $\tilde{z}(t)$, $\varepsilon(t)$ is switched off and $c(t)$ is fixed to its constant target, matching the testing paradigm. This often helps the network to learn generation of $z(t)\approx\tilde{z}(t)$ without error input.
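The recursive least-squares update underlying FORCE learning can be sketched as follows. This is a generic textbook form of the update, given here as an illustration under assumptions: the function names `force_init` and `force_update` and the regularization parameter `alpha` are hypothetical, and the schedule of training periods described above is not reproduced.

```python
import numpy as np

def force_init(n_out, N, alpha=1.0):
    """Zero output weights and the initial inverse correlation estimate P = I / alpha."""
    return np.zeros((n_out, N)), np.eye(N) / alpha

def force_update(o, P, r, target):
    """One recursive least-squares step for the output weights o (shape n_out x N).

    r is the current firing-rate vector, target the desired output.
    Returns the updated weights, the updated P and the error before the update.
    """
    k = P @ r
    gain = 1.0 / (1.0 + r @ k)
    P = P - gain * np.outer(k, k)        # rank-one update of the inverse correlation
    err = o @ r - target                 # output error with the old weights
    o = o - gain * np.outer(err, k)      # move the output towards the target
    return o, P, err
```

In a pretraining loop, `force_update` would be called once for the signal weights $o_{z}$ with target $\tilde{z}(t)$ and once for the context weights $o_{c}$ with target $\tilde{c}$, using the same rate vector `r`.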

Dynamical learning and testing. The weights now remain static and the error input teaches the network new tasks of the pretrained target class (Fig. 1b), i.e. the networks dynamically learn to generate $z(t)\approx\tilde{z}(t)$ for previously unseen $\tilde{z}(t)$. The learning time $t_{\text{learn}}$ (between 50 and 200) is short, a few characteristic timescales of the target dynamics (Supplemental Material). During this phase, $c(t)$ fluctuates moderately.

Thereafter the test phase begins, where no more teacher is present ($w_{\varepsilon}\rightarrow 0$). In weight-learning paradigms, during such phases the weights are fixed (Maass et al., 2002; Jaeger and Haas, 2004; Sussillo and Abbott, 2009; Mante et al., 2013; Hennequin et al., 2014). We likewise fix $c(t)$ to a temporally constant value, an average of previously assumed ones, $c(t)=\bar{c}$. This may be interpreted as indicating that the context is unchanged and the same signal is still desired. In our applications, we find that the network dynamics continue to generate a close-to-desired signal $z(t)$, establishing the successful dynamical learning of the task.

Applications. We illustrate our approach by learning a variety of trajectories (tasks (i-iv)) and dynamical systems (tasks (v,vi)). First, we consider a family $\tilde{z}(t;k)$ of target trajectories, parameterized by $k$. The networks are pretrained on a few of them, where the context target $\tilde{c}$ is a linear function of $k$. Thereafter the networks dynamically learn to generate a previously unseen trajectory as output and perpetuate it during testing. We start with the simple, instructive target family of oscillations with different periods (task (i)): $\tilde{z}(t;T)=5\sin(\frac{2\pi}{T}t)$. We use three different teacher trajectories for pretraining, with $T=10,15,20$. After pretraining, our networks can precisely dynamically learn oscillations with unseen periods within and slightly beyond the pretrained ones (Fig. 2a,b; see Supplemental Material for further detail and analysis of learning performance of all tasks). Next, in (ii), we generalize (i) to higher order Fourier series. Specifically, we consider the target family of superpositions of two random Fourier series with weighting factor $\lambda$: $\tilde{z}(t;\lambda)=(1-\lambda)\tilde{z}_{1}(t;\lambda)+\lambda\tilde{z}_{2}(t;\lambda)$. Here, $\tilde{z}_{l}(t;\lambda),\,l=1,2,$ are Fourier series of order $O$ and period $T(\lambda)=(1-\lambda)T_{1}+\lambda T_{2}$. $T_{l}$ and the Fourier coefficients are drawn randomly. We use seven different teacher trajectories for pretraining, with weighting factors distributed equidistantly between 0 and 1. After pretraining, we test the dynamical learning for thirteen weighting factors also distributed equidistantly between 0 and 1. To quantify the learning performance, we determine the fraction of these targets that can be successfully learned (RMSE below a given threshold (0.4) and below the RMSE between signal and (other) pretrained targets). We find that networks of increasing size can learn Fourier series with increasing order (Fig. 2c,d). Networks with 3000 neurons learn Fourier series of order 10 with a median fraction of successes of close to $90\%$. Hence, very general periodic functions can be learned. The highest producible frequency is limited by the available neuronal time scales $\tau_{i}$. We thus expect that larger networks containing smaller $\tau_{i}$ can learn even higher order targets.
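The target family of task (i) and the success criterion stated above can be written down compactly. The following is a minimal sketch under assumptions: the function names are hypothetical, and the criterion, which the text introduces for task (ii), is applied here to the sine family of task (i) purely for illustration.

```python
import numpy as np

def target_oscillation(t, T):
    """Task (i) target: sine with amplitude 5 and period T."""
    return 5.0 * np.sin(2.0 * np.pi * t / T)

def rmse(a, b):
    """Root-mean-square error between two equally sampled trajectories."""
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))

def is_success(z, target, pretrained_targets, threshold=0.4):
    """Success: RMSE to the target is below the threshold and below the
    RMSE between the signal and every pretrained target."""
    err = rmse(z, target)
    closer_than_pretrained = all(err < rmse(z, p) for p in pretrained_targets)
    return bool(err < threshold and closer_than_pretrained)
```

For example, with pretrained periods $T=10,15,20$ and an unseen target with $T=12$, a signal that merely reproduces the nearest pretrained oscillation fails the criterion, whereas a signal close to the new target passes it.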

To check if our approach also works for a target family with more than one parameter and multidimensional trajectories, we consider in (iii) a superposition of sines with different amplitude and period (consequently $k,\tilde{c}$ are two-dimensional vectors) and in (iv) a set of fixed points along a curve in three-dimensional space. We find that, after pretraining, our networks are able to dynamically learn unseen members of these target families with multidimensional context or signal, as shown in Fig. 3a,b for example trajectories.

Second, we consider a family $\dot{\tilde{z}}(t)=F(\tilde{z}(t),u(t);k)$ of target dynamical systems. The networks are pretrained on a few representative systems. Thereafter, an unseen one is dynamically learned. Learning is in both phases based on imitation of trajectories. However, in contrast to tasks (i-iv) the networks now need to generate unseen output trajectories during testing. To demonstrate dynamical learning of a driven system, we consider task (v) of approximating the trajectory of an overdamped pendulum with drive $u(t)$ and different masses $m$: $\dot{\tilde{z}}(t)=F(\tilde{z}(t),u(t);m)$. During pretraining and dynamical learning, we use low-pass filtered white noise as drive (Fig. 3c, left of dashed vertical). During testing, we use a triangular wave (Fig. 3c, right of dashed vertical). As our networks nevertheless generate the correct qualitatively different signal (Fig. 3c,d), they must have learned the underlying vector field $F(\tilde{z},u;m)$. (v) also shows that learning goes beyond interpolation of trajectories (compare blue and gray traces in Fig. 3d). Finally, in task (vi) we show dynamical learning of chaotic dynamics, considering autonomous Lorenz systems with different dissipation parameter $\beta$ of the $z$-variable. For chaotic dynamics, even trajectories of similar systems quickly diverge. The aim in this task is thus only to generate during testing signals of the same type as the trajectories of the target system. We test this by comparing the limit sets of the dynamics and the tent-map relation between subsequent maxima of the $z$-coordinate (Fig. 3e,f). The reproduction of the tent-map relation further shows that our approach can generate quantitative dynamical features that were not explicitly trained. We note that the networks also dynamically learn the fixed point convergence of some of the targets in the considered parameter space, even though they were pretrained on chaotic dynamics only (Supplemental Material).
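The tent-map relation used as a test in task (vi) can be computed from any $z$-trajectory by pairing each local maximum with the next one. The sketch below generates a reference Lorenz trajectory and extracts these pairs. It is an illustration under assumptions: the classical parameter values $\sigma=10$, $\rho=28$, $\beta=8/3$, the Runge-Kutta integrator, the step size and all function names are choices made for this sketch, not values taken from the paper.

```python
import numpy as np

def lorenz_rhs(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz system; beta is the dissipation of z."""
    x, y, z = s
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def integrate_lorenz(s0, dt=0.01, steps=20000, **params):
    """Fixed-step fourth-order Runge-Kutta integration; returns a (steps, 3) array."""
    s = np.asarray(s0, dtype=float)
    traj = np.empty((steps, 3))
    for i in range(steps):
        k1 = lorenz_rhs(s, **params)
        k2 = lorenz_rhs(s + 0.5 * dt * k1, **params)
        k3 = lorenz_rhs(s + 0.5 * dt * k2, **params)
        k4 = lorenz_rhs(s + dt * k3, **params)
        s = s + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        traj[i] = s
    return traj

def successive_maxima(z):
    """Return pairs (m_n, m_{n+1}) of subsequent local maxima of a sampled signal."""
    z = np.asarray(z)
    idx = np.where((z[1:-1] > z[:-2]) & (z[1:-1] >= z[2:]))[0] + 1
    m = z[idx]
    return m[:-1], m[1:]
```

Plotting `m_next` against `m_n` for the target system and for the network output $z(t)$ then allows a visual comparison of the two tent-map relations.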

*Analysis.* In the following we analyze the different parts of our network learning and its applicability. One interpretation of the pretraining phase is that the network learns a negative feedback loop, which reduces the error $\varepsilon(t)$. For another interpretation, we split $\varepsilon(t)$ and regroup the $z$-dependent part of Eq. (1) as $(w_{z}+w_{\varepsilon})z(t)-w_{\varepsilon}\tilde{z}(t)$: feeding back $\varepsilon(t)$ is equivalent to adding a teacher drive $\tilde{z}(t)$, except for a specific change in the feedback weights $w_{z}$. For the $z$-output alone the network thus weight-learns an autoencoder $\tilde{z}(t)\rightarrow z(t)$. This is usually an easy task for reservoir networks (Abbott et al., 2016). To simultaneously learn the constant output $c(t)=\tilde{c}$, the network has to choose an appropriate $o_{c}$ orthogonal to the subspaces in which the different $z(t)$-driving $r$-dynamics take place. Orthogonal directions are available in sufficiently large networks, since the subspaces are low-dimensional (Abbott et al., 2010).
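The regrouping used in this argument can be written out explicitly. Inserting the definition $\varepsilon(t)=z(t)-\tilde{z}(t)$ into the signal feedback and error input terms of the network dynamics gives

$$w_{z}z(t)+w_{\varepsilon}\varepsilon(t)=w_{z}z(t)+w_{\varepsilon}\bigl(z(t)-\tilde{z}(t)\bigr)=(w_{z}+w_{\varepsilon})z(t)-w_{\varepsilon}\tilde{z}(t),$$

so that a network with error input behaves like a network with feedback weights $w_{z}+w_{\varepsilon}$ that receives the teacher signal $\tilde{z}(t)$ through the input weights $-w_{\varepsilon}$.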

After the correct $z$ -dynamics are assumed, we have $\varepsilon(t)\approx 0$ . Since remaining fluctuations in $\varepsilon(t)$ could stabilize the dynamics, we usually include ensuing learning phases with $w_{\varepsilon}\rightarrow 0$ and $c(t)=\tilde{c}$ . These teach the network to generate the correct dynamics in stable manner under conditions similar to testing.

To analyze the principles underlying dynamical learning and testing, we consider task (i). The similarity of the network and learning setups suggests that the same principles underlie all our tasks. We additionally confirm this for (vi) (Supplemental Material). Viewing the network dynamics in the space of firing rates $r$, we choose new coordinates with the first axis along $o_{c}$ and the remaining axes along the principal components of the dynamics orthogonal to $o_{c}$. The dynamics are then given by $c(t)=o_{c}r(t),r_{\text{PC}1}(t),r_{\text{PC}2}(t),...$ (Fig. 4). We focus on the first three coordinates, which describe large parts of the dynamics and output generation.
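The change of coordinates described above can be sketched as follows, assuming a one-dimensional context and a matrix `R` of sampled firing rates with one row per time point. The function name, the use of a singular value decomposition for the principal components, and the mean-centering are assumptions of this sketch.

```python
import numpy as np

def context_pc_coordinates(R, o_c, n_pc=2):
    """Express rate trajectories in coordinates (c, PC1, PC2, ...).

    R:   array of shape (T, N), firing rates r(t) sampled at T time points
    o_c: array of shape (N,), context output weights (one-dimensional context)
    """
    c = R @ o_c                                   # first coordinate: c(t) = o_c r(t)
    u = o_c / np.linalg.norm(o_c)                 # unit vector along o_c
    R_perp = R - np.outer(R @ u, u)               # remove the component along o_c
    R_centered = R_perp - R_perp.mean(axis=0)     # center before the PCA
    # Rows of Vt are the principal directions of the dynamics orthogonal to o_c.
    _, _, Vt = np.linalg.svd(R_centered, full_matrices=False)
    pcs = Vt[:n_pc]
    return c, R_centered @ pcs.T, pcs
```

The returned context values and principal-component projections together give the low-dimensional view of the dynamics on which the analysis focuses.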

We find that during dynamical learning, the error feedback drives the dynamics towards an orbit that is shifted in $c$ but similar to pretrained ones. The network therewith generalizes the pretrained reaching and generation of orbits together with corresponding, near-constant $c(t)$ , while $\varepsilon(t)$ is fed in. We note that the combination of current state and error input is important (see Fig. 4a for $w_{\varepsilon}\rightarrow 0$ and a mismatched $\tilde{z}(t)=\tilde{z}(t_{0})\,\,\text{for}\,\,t>t_{0}$ ).

During testing, the network generalizes the pretrained characteristics that feeding back $w_{c}\tilde{c}$ leads to $c(t)\approx\tilde{c}$ . Clamping $w_{c}c(t)\,\,\text{to}\,\,w_{c}\bar{c}$ thus results in an approximate restriction of $r(t)$ to an $N-1$ -dimensional hyperplane with $c(t)=o_{c}r(t)\approx\bar{c}$ (Fig. 4b). The resulting trajectory is for task (i) a stable periodic orbit that generates the desired signal, because the vector field projected to the $c(t)=\bar{c}$ -hyperplane is similar to the vector field projected to the $c(t)=\tilde{c}$ -hyperplanes embedding nearby pretrained periodic orbits (Fig. 4c).

*Discussion and conclusion.* We have introduced a scheme by which neural networks can quickly learn dynamics without changing their weights and without requiring a teacher during testing. It relies on a weight-learned mutual association, quasi an entanglement, between contexts and targets. This enables the latter to fix the former during dynamical learning and vice versa during testing.

Previous approaches to supervised dynamical learning with continuous signal space required a form of the teaching signal also during testing. They further differ from ours in network architecture, learning algorithm, task and/or assumption of discrete time (Supplemental Material; Feldkamp and Puskorius, 1994a, b; Feldkamp et al., 1996; Feldkamp and Puskorius, 1997; Feldkamp et al., 1997; Younger et al., 1999; Hochreiter et al., 2001; Younger et al., 2001; Feldkamp et al., 2003; Santiago, 2004; Lukosevicius, 2007; Oubbati and Palm, 2010; Santoro et al., 2016; Bellec et al., 2018; Oubbati et al., 2005; Jaeger and Eck, 2008; wyffels et al., 2014). In networks with external input, unseen, interpolating input can lead to interpolating dynamics (Supplemental Material; Boström et al., 2013). In contrast, our networks learn new dynamics by imitation.

Our scheme is conceptually independent of the network and weight-learning model. The pretraining implements a form of structure learning, i.e. learning of the structure underlying a task family (Lansdell and Kording, 2018; Braun et al., 2010). Animals and humans employ it frequently, but little is known about its neurobiology. We thus realize it by a simple reservoir computing scheme with FORCE learning (Supplemental Material; Sussillo and Abbott, 2009). We checked that we can use biologically more plausible weight perturbation learning for a simple fixed point learning task (Supplemental Material).

Dynamical learning is biologically plausible: it is naturally local, causal and does not require fast synaptic weight updates. Continuous supervision could be generated by an inverse model (Jordan and Rumelhart, 1992) and might be replaceable by a sparse, partial signal. Our dynamical learning is fast (Supplemental Material; cf. also Wang et al., 2018; Jaeger and Eck, 2008; Feldkamp et al., 1996, 1997; Hochreiter et al., 2001; Younger et al., 2001). Even for more complicated tasks, convergence requires only a few multiples of a characteristic time scale of the dynamics. Further, we find robustness against changes in network and task parameters (Supplemental Material). The above points suggest a high potential of our scheme for applications in biology, physics and engineering such as neuromorphic computing and the prediction of chaotic systems (Supplemental Material).

Acknowledgements.

We thank Paul Züge for fruitful discussions, the German Federal Ministry of Education and Research (BMBF) for support via the Bernstein Network (Bernstein Award 2014, 01GQ1710) and the European Union’s Horizon 2020 research and innovation programme for support via the Marie Sklodowska-Curie Grant No 754411.

Bibliography

1. Dayan and Abbott (2001). P. Dayan and L. F. Abbott, Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems (MIT Press, Cambridge, 2001).
2. Gerstner et al. (2014). W. Gerstner, W. M. Kistler, R. Naud, and L. Paninski, Neuronal Dynamics - From Single Neurons to Networks and Models of Cognition (Cambridge University Press, Cambridge, 2014).
3. Perich et al. (2018). M. G. Perich, J. A. Gallego, and L. E. Miller, “A neural population mechanism for rapid learning,” Neuron 100, 964–976.e7 (2018).
4. Thrun and Pratt (1998). S. Thrun and L. Pratt, eds., Learning to Learn (Springer US, 1998).
5. Vanschoren (2018). J. Vanschoren, “Meta-Learning: A Survey,” arXiv:1810.03548 (2018).
6. Lansdell and Kording (2018). B. J. Lansdell and K. P. Kording, “Towards learning-to-learn,” arXiv:1811.00231 (2018).
7. Duan et al. (2016). Y. Duan, J. Schulman, X. Chen, P. L. Bartlett, I. Sutskever, and P. Abbeel, “RL$^2$: Fast Reinforcement Learning via Slow Reinforcement Learning,” arXiv:1611.02779 (2016).
8. Wang et al. (2018). J. X. Wang, Z. Kurth-Nelson, D. Kumaran, D. Tirumala, H. Soyer, J. Z. Leibo, D. Hassabis, and M. Botvinick, “Prefrontal cortex as a meta-reinforcement learning system,” Nat. Neurosci. 21, 860–868 (2018).