Training dynamically balanced excitatory-inhibitory networks

Alessandro Ingrosso; L.F. Abbott

arXiv:1812.11424·cond-mat.dis-nn·July 1, 2020

TL;DR

This paper presents a method for constructing biologically plausible excitatory-inhibitory neural networks that are balanced and capable of complex temporal processing, using a target-based approach with online constrained optimization.

Contribution

It introduces a novel training approach combining target-based methods with online constrained optimization to build balanced neural networks obeying Dale's law.

Findings

1. Networks can produce complex temporal patterns.
2. Networks successfully solve input-output tasks.
3. Biological features like Dale's law are preserved.

Abstract

The construction of biologically plausible models of neural circuits is crucial for understanding the computational properties of the nervous system. Constructing functional networks composed of separate excitatory and inhibitory neurons obeying Dale's law presents a number of challenges. We show how a target-based approach, when combined with a fast online constrained optimization technique, is capable of building functional models of rate and spiking recurrent neural networks in which excitation and inhibition are balanced. Balanced networks can be trained to produce complicated temporal patterns and to solve input-output tasks while retaining biologically desirable features such as Dale's law and response variability.

Equations

$$\tau\frac{dx_{i}}{dt}=-x_{i}+\sum_{j=1}^{N}J_{ij}\phi\left(x_{j}\right)+\sum_{k=1}^{K_{{\mbox{\scriptsize in}}}}w_{ik}^{{\mbox{\scriptsize in}}}F_{k}^{{\mbox{\scriptsize in}}}+I_{i}$$

$$\tau_{{\mbox{\scriptsize m}}}\frac{dV_{i}}{dt}$$

$$\tau_{{\mbox{\scriptsize s}}}\frac{ds_{i}}{dt}$$

$$J^{{\mbox{\scriptsize eff}}}\left(\begin{array}{c}r_{{\mbox{\scriptsize\sc E}}}\\ r_{{\mbox{\scriptsize\sc I}}}\end{array}\right)\sim-\left(\begin{array}{c}\alpha_{{\mbox{\scriptsize\sc E}}}\\ \alpha_{{\mbox{\scriptsize\sc I}}}\end{array}\right)\quad\mbox{where}\quad J^{{\mbox{\scriptsize eff}}}=\sqrt{N}\left(\begin{array}{cc}\overline{J}_{{\mbox{\scriptsize\sc E}}{\mbox{\scriptsize\sc E}}}&\overline{J}_{{\mbox{\scriptsize\sc E}}{\mbox{\scriptsize\sc I}}}\\ \overline{J}_{{\mbox{\scriptsize\sc I}}{\mbox{\scriptsize\sc E}}}&\overline{J}_{{\mbox{\scriptsize\sc I}}{\mbox{\scriptsize\sc I}}}\end{array}\right)\,,$$

$$h_{i}^{{\mbox{\scriptsize\sc T}}}\left(t\right)=\sum_{j=1}^{N}J_{ij}^{{\mbox{\scriptsize\sc T}}}\phi\left(x_{j}^{{\mbox{\scriptsize\sc T}}}\left(t\right)\right)+\sum_{k=1}^{K_{{\mbox{\scriptsize out}}}}w_{ik}^{{\mbox{\scriptsize\sc T}}}F_{k}^{{\mbox{\scriptsize out}}}\left(t\right),$$

$$E_{i}=\frac{1}{t_{{\mbox{\scriptsize run}}}}\int_{0}^{t_{{\mbox{\scriptsize run}}}}dt\Big{(}h_{i}^{{\mbox{\scriptsize\sc T}}}\!\left(t\right)-\sum_{j=1}^{N}J_{ij}\phi\left(x_{j}(t)\right)\Big{)}^{2}+\alpha R_{i}\,.$$

$$h_{{\mbox{\scriptsize\sc E}}}=\left\langle\frac{1}{N_{{\mbox{\scriptsize\sc E}}}}\sum_{i\in{\mbox{\scriptsize\sc E}}}\sum_{j}J_{ij}r_{j}\right\rangle+I_{{\mbox{\scriptsize\sc E}}}=\underbrace{N_{{\mbox{\scriptsize\sc E}}}J_{{\mbox{\scriptsize\sc E}}{\mbox{\scriptsize\sc E}}}\left\langle r_{{\mbox{\scriptsize\sc E}}}\right\rangle+N_{{\mbox{\scriptsize\sc I}}}J_{{\mbox{\scriptsize\sc E}}{\mbox{\scriptsize\sc I}}}\left\langle r_{{\mbox{\scriptsize\sc I}}}\right\rangle+I_{{\mbox{\scriptsize\sc E}}}}_{\tilde{h}_{{\mbox{\scriptsize\sc E}}}}+\underbrace{\frac{1}{N_{{\mbox{\scriptsize\sc E}}}}\sum_{i\in{\mbox{\scriptsize\sc E}}}\sum_{j}\delta J_{ij}\left\langle r_{j}\right\rangle}_{c_{{\mbox{\scriptsize\sc E}}}},$$

$$\Delta_{x}=\frac{\int dt\sum_{i}\left(x_{i}(t)-\tilde{x}_{i}(t)\right)^{2}}{\int dt\sum_{i}\left(x_{i}(t)\right)^{2}},$$

$$J_{ij}\to\frac{J_{ij}C_{ii}+\alpha tW_{ij}+D_{ij}}{C_{ii}+\alpha t}$$

$$D_{i:}\to D_{i:}-\Delta J_{ij}C_{j:}$$

$$\sum_{{\mbox{\scriptsize\sc X}}\in\{{\mbox{\scriptsize\sc E}},{\mbox{\scriptsize\sc I}}\}}\sum_{j\in{\mbox{\scriptsize\sc X}}}\left(J_{ij}-\frac{\sum_{k\in{\mbox{\scriptsize\sc X}}}J_{ik}}{N_{{\mbox{\scriptsize\sc X}}}}\right)^{2},$$

$$E_{{\mbox{\scriptsize test}}}=\frac{\sum_{k=1}^{K_{{\mbox{\scriptsize out}}}}\sum_{t=0}^{T_{{\mbox{\scriptsize test}}}}\left(z_{k}(t)-F_{k}^{{\mbox{\scriptsize out}}}(t)\right)^{2}}{\sum_{k=1}^{K_{{\mbox{\scriptsize out}}}}\sum_{t=0}^{T_{{\mbox{\scriptsize test}}}}\left(F_{k}^{{\mbox{\scriptsize out}}}(t)\right)^{2}}\,.$$

$$\frac{\sum_{t}z(t)F^{{\mbox{\scriptsize out}}}(t)}{\sum_{t}z^{2}(t)\sum_{t}\left(F^{{\mbox{\scriptsize out}}}(t)\right)^{2}}>0.5\,.$$

Full text

Training dynamically balanced excitatory-inhibitory networks

Alessandro Ingrosso

L.F. Abbott

Zuckerman Mind, Brain, Behavior Institute, Columbia University, New York, NY 10027

Abstract

The construction of biologically plausible models of neural circuits is crucial for understanding the computational properties of the nervous system. Constructing functional networks composed of separate excitatory and inhibitory neurons obeying Dale’s law presents a number of challenges. We show how a target-based approach, when combined with a fast online constrained optimization technique, is capable of building functional models of rate and spiking recurrent neural networks in which excitation and inhibition are balanced. Balanced networks can be trained to produce complicated temporal patterns and to solve input-output tasks while retaining biologically desirable features such as Dale’s law and response variability.

Introduction

Cortical neurons typically require only a small fraction of their thousands of excitatory inputs to reach firing threshold. This suggests an overabundance of excitation that must be balanced by inhibition to keep neurons within their functional operating ranges. An interesting suggestion is that this balance does not require fine-tuning of synaptic strengths, what we will call parametric balance, but rather occurs dynamically VanVreeswijkChaosScience ; VanVreeswijkChaoticBalancedStateNeuralComputation ; RenartAsynchronous ; KadmonSompolinsky ; HarishHanselAsynchronous ; mastrogiuseppeeinetworks ; TsodyksStateSwitching ; BrunelDynamicsSparsely . Dynamically balanced neural network models were originally introduced to account for the high variability of neural activity. Variants of balanced networks have since been used to model response selectivity HanselMechanismOrientation ; PehlevanSelectivity and associative memory RubinBalanced , but a general approach to task learning in these models has not previously been developed. The challenge is that the mean activity of dynamically balanced network models is constrained. To maintain dynamic balance, a learning scheme must flexibly alter network dynamics while respecting this constraint, otherwise the network will transition to a parametrically balanced regime. In addition, balanced networks display a reduced mean response to stimuli and, as a consequence, suppressing chaos in these networks, a critical step in learning, requires some care. We present approaches for training networks while both suppressing chaos and retaining dynamic balance.

In addition to the issues with balancing outlined above, training networks with sign-constrained weights presents some technical challenges. Batch approaches to learning can handle sign constraints quite efficiently, but batch training of recurrent networks often leads to instabilities during testing, even when the training error is small sussillo2009generating ; JaegerTutorial . The use of an online strategy is critical to quench spontaneous chaotic fluctuations during training and to assure stability of the trained dynamics. These requirements demand fast learning algorithms capable of adjusting weights as the network is running. In previous work sussillo2009generating ; LajeBuonomanoTamingChaos ; Vincent-LamarreDrivingReservoir , this was achieved by using a recursive least squares (RLS) algorithm that has the favorable feature of constraining network dynamics while permitting fluctuations during training that are critical for post-training stability. Unfortunately, when sign-constraints are imposed, standard online training procedures, including RLS, are no longer viable. Here, we developed a fast sign-constrained online method that proves effective at training both rate and spiking balanced network models.

Results

Dynamically and Parametrically Balanced Networks

The networks we consider are composed of either spiking neurons interacting via synaptic currents or so-called rate units. A task is generally specified by a set of desired output signals $F_{k}^{{\mbox{\scriptsize out}}}\left(t\right)$, for $k=1,2,\ldots,K_{{\mbox{\scriptsize out}}}$, that are read out through channels $z_{k}$. These signals can either be autonomously generated by the network or arise in response to $K_{{\mbox{\scriptsize in}}}$ external inputs $F_{k}^{{\mbox{\scriptsize in}}}\left(t\right)$ entering the network through input weight vectors $\boldsymbol{w}_{k}^{{\mbox{\scriptsize in}}}$. The input weights are generally chosen randomly and not subject to learning, whereas the readout weights, which are not sign-constrained, are trained using RLS. In rate models, $z_{k}=\boldsymbol{w}_{k}^{{\mbox{\scriptsize out}}}\cdot\phi\left(\boldsymbol{x}\right)$, where $\phi(x)$ is the rate activity for a unit with total input $x$. The equations of the $N$ units of the network, for $i=1,2,\ldots,N$, are

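$$\tau\frac{dx_{i}}{dt}=-x_{i}+\sum_{j=1}^{N}J_{ij}\phi\left(x_{j}\right)+\sum_{k=1}^{K_{{\mbox{\scriptsize in}}}}w_{ik}^{{\mbox{\scriptsize in}}}F_{k}^{{\mbox{\scriptsize in}}}+I_{i}\,,\tag{1}$$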

where $I\in\left\{I_{{\mbox{\scriptsize\sc E}}},I_{{\mbox{\scriptsize\sc I}}}\right\}$ is a vector of constant and uniform external currents into the E and I populations, and $w_{:k}^{{\mbox{\scriptsize in}}}$ are the weight vectors for each of the $K_{{\mbox{\scriptsize in}}}$ input channels. We employ a variety of activation functions, e.g. halftanh ($\phi\left(x\right)=\theta(x)\tanh\left(x\right)$), sigmoid ($\phi\left(x\right)=1/\left(1+\exp\left(-x\right)\right)$) or ReLU ($\phi\left(x\right)=\theta\left(x\right)x$), where $\theta$ is the Heaviside step function ($\theta\left(x\right)=1$ when $x>0$ and $0$ otherwise).
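As an illustration, the rate dynamics of equation (1) can be integrated with a forward-Euler step. The following Python sketch is not the authors' code: the function names, the step size `dt`, the default `tau`, and the array layout of `w_in` and `F_in` are assumptions made for the example.

```python
import numpy as np

def halftanh(x):
    # theta(x) * tanh(x): zero for x <= 0, tanh(x) for x > 0
    return np.where(x > 0, np.tanh(x), 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # theta(x) * x
    return np.maximum(x, 0.0)

def simulate_rate_network(J, I, x0, phi=halftanh, tau=1.0, dt=0.01,
                          steps=1000, w_in=None, F_in=None):
    """Forward-Euler integration of tau dx/dt = -x + J phi(x) + w_in F_in + I.

    Assumed shapes: J (N, N), I (N,), x0 (N,), w_in (N, K_in), F_in (steps, K_in).
    Returns the trajectory of x with shape (steps, N).
    """
    x = np.array(x0, dtype=float)
    traj = np.empty((steps, x.size))
    for t in range(steps):
        drive = J @ phi(x) + I
        if w_in is not None and F_in is not None:
            drive = drive + w_in @ F_in[t]
        x = x + (dt / tau) * (-x + drive)
        traj[t] = x
    return traj
```

With `J = 0` and no input channels, each unit simply relaxes toward its constant current `I_i`, which is a convenient sanity check before adding recurrence.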

For the spiking networks, we use leaky integrate-and-fire (LIF) dynamics (although good performances can be achieved with other neuronal models) of the form

[TABLE]

where $\tau_{{\mbox{\scriptsize m}}}$ is the membrane time constant ( $\tau_{{\mbox{\scriptsize m}}}=20$ ms in all simulations) and $t_{i}^{{\mbox{\scriptsize f}}}$ is a list of the times when neuron $i$ fired. When $V_{i}\left(t\right)$ reaches the spiking threshold $V_{{\mbox{\scriptsize th}}}$ (usually set to $1$ ) a spike is emitted and the voltage $V_{i}$ is reset to $V_{{\mbox{\scriptsize res}}}$ and kept constant for a period of time equal to the refractory period $\tau_{{\mbox{\scriptsize ref}}}$ . We typically take either $\tau_{{\mbox{\scriptsize ref}}}=2$ ms or no refractoriness ( $\tau_{{\mbox{\scriptsize ref}}}=0$ ), and $\tau_{{\mbox{\scriptsize s}}}=50$ ms or $\tau_{{\mbox{\scriptsize s}}}=100$ ms. The readouts for spiking networks are given by $z_{k}=\boldsymbol{w}_{k}^{{\mbox{\scriptsize out}}}\cdot\boldsymbol{s}$ .

For networks with distinct excitatory and inhibitory neurons, the connection matrix $J$ in equations (1) and (2) is divided into 4 blocks, $J_{{\mbox{\scriptsize\sc E}}{\mbox{\scriptsize\sc E}}}$, $J_{{\mbox{\scriptsize\sc E}}{\mbox{\scriptsize\sc I}}}$, $J_{{\mbox{\scriptsize\sc I}}{\mbox{\scriptsize\sc E}}}$ and $J_{{\mbox{\scriptsize\sc I}}{\mbox{\scriptsize\sc I}}}$, where the first and second subscripts denote the type of post- and presynaptic neurons, respectively. We are interested in training networks that are in a so-called balanced regime for which the strengths of individual synapses are of order $1/\sqrt{N_{{\mbox{\scriptsize pre}}}}$, where $N_{{\mbox{\scriptsize pre}}}$ is the number of presynaptic neurons, which in our case is $\mathcal{O}\left(N\right)$. Because of this and because we focus on cases with equal numbers of E and I neurons (although other choices yield similar results), we will not distinguish between orders of $\sqrt{N_{{\mbox{\scriptsize pre}}}}$ and $\sqrt{N}$ and stick to the latter notation. Sign-constrained weights combined with non-negative firing rates imply that the sum of the inputs over all excitatory or all inhibitory synapses is of order $N/\sqrt{N}=\sqrt{N}$. If uncanceled, this would introduce an extremely large term into equations (1) and (2) that would lead to either nearly silenced or saturated dynamics. The solution to this problem is to introduce a compensating large constant input, $I_{i}=[\alpha_{{\mbox{\scriptsize\sc E}}}\sqrt{N},\alpha_{{\mbox{\scriptsize\sc I}}}\sqrt{N}]$, that is also of order $\sqrt{N}$ ($\alpha$ is of order $1$). Cancellation of all the terms in equations (1) and (2) of order $\sqrt{N}$ then leads to the constraint

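$$J^{{\mbox{\scriptsize eff}}}\left(\begin{array}{c}r_{{\mbox{\scriptsize\sc E}}}\\ r_{{\mbox{\scriptsize\sc I}}}\end{array}\right)\sim-\left(\begin{array}{c}\alpha_{{\mbox{\scriptsize\sc E}}}\\ \alpha_{{\mbox{\scriptsize\sc I}}}\end{array}\right)\quad\mbox{where}\quad J^{{\mbox{\scriptsize eff}}}=\sqrt{N}\left(\begin{array}{cc}\overline{J}_{{\mbox{\scriptsize\sc E}}{\mbox{\scriptsize\sc E}}}&\overline{J}_{{\mbox{\scriptsize\sc E}}{\mbox{\scriptsize\sc I}}}\\ \overline{J}_{{\mbox{\scriptsize\sc I}}{\mbox{\scriptsize\sc E}}}&\overline{J}_{{\mbox{\scriptsize\sc I}}{\mbox{\scriptsize\sc I}}}\end{array}\right)\,,\tag{4}$$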

and the symbol $\sim$ implies equality to within a discrepancy of order $1/\sqrt{N}$ . In this equation, $\overline{J}_{{\mbox{\scriptsize\sc X}}{\mbox{\scriptsize\sc Y}}}$ is the average of the elements in the submatrix $J_{{\mbox{\scriptsize\sc X}}{\mbox{\scriptsize\sc Y}}}$ , and $r_{{\mbox{\scriptsize\sc E}}}$ and $r_{{\mbox{\scriptsize\sc I}}}$ are the average firing rates of the excitatory and inhibitory populations. In a dynamically balanced network, the means of the excitatory and inhibitory populations are both constrained by equation (4). As we will show, there are different regimes for trained parametrically balanced networks. When $I=0$ or of order 1, equation (4) forces extremely small rates unless the determinant of $J^{{\mbox{\scriptsize eff}}}$ is of order $1/\sqrt{N}$ , resulting in a parametrically balanced situation. If $I$ is of order $\sqrt{N}$ , we find a different kind of parametric balance that involves another form of fine-tuned cancelation, as discussed below.
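To make the balance condition of equation (4) concrete, the sketch below computes $J^{{\mbox{\scriptsize eff}}}$ from a connection matrix and solves the linear system for the leading-order population rates. It is only an illustration: the function names and the convention that the first `n_exc` rows and columns index excitatory neurons are assumptions.

```python
import numpy as np

def effective_matrix(J, n_exc):
    """J_eff = sqrt(N) * (2x2 matrix of block means of J).

    Assumes indices [0, n_exc) are excitatory and the rest inhibitory.
    """
    N = J.shape[0]
    E, I = slice(0, n_exc), slice(n_exc, N)
    means = np.array([[J[E, E].mean(), J[E, I].mean()],
                      [J[I, E].mean(), J[I, I].mean()]])
    return np.sqrt(N) * means

def balanced_rates(J_eff, alpha):
    """Solve J_eff r = -alpha for (r_E, r_I), the leading-order rates."""
    return np.linalg.solve(J_eff, -np.asarray(alpha, dtype=float))

# Toy example with uniform blocks (values chosen only for illustration).
J_eff_toy = np.array([[1.0, -2.0],
                      [1.0, -1.8]])
rates = balanced_rates(J_eff_toy, alpha=[1.0, 0.8])
det = np.linalg.det(J_eff_toy)
```

A dynamically balanced solution needs positive rates of order 1 and a determinant of order 1. When the determinant shrinks toward zero, the linear system becomes ill-conditioned, which is the parametrically balanced situation described in the text.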

In our experience, many learning schemes result in connection matrices that realize a parametric rather than dynamic balance RubinBalanced . This comes about even if the initial connectivity $J$ has a $J^{{\mbox{\scriptsize eff}}}$ with determinant of order 1. One common way for this to occur is if learning imposes a symmetry on the $J$ matrix so that the excitatory and inhibitory mean weight values are proportional to each other. We now show that an online learning scheme, combined with the appropriate regularization, can construct dynamically balanced models that solve a variety of tasks.

Full-FORCE in E/I Networks

We build upon a previously developed target-based approach for training rate and spiking networks abbott2016building ; depasquale2016using ; DePasqualeFullForce (Fig. 1). In this scheme, a teacher network (T), which in the cases we consider is an E/I rate model, is driven by the desired output signals $F_{k}^{{\mbox{\scriptsize out}}}\left(t\right)$. This is done by adding a term $\sum_{k=1}^{K_{{\mbox{\scriptsize out}}}}w_{ik}^{{\mbox{\scriptsize\sc T}}}F_{k}^{{\mbox{\scriptsize out}}}$ to equation (1) with random weights $w_{ik}^{{\mbox{\scriptsize\sc T}}}$ (we use superscript T to denote quantities associated with the teacher network). We then extract a set of target currents,

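$$h_{i}^{{\mbox{\scriptsize\sc T}}}\left(t\right)=\sum_{j=1}^{N}J_{ij}^{{\mbox{\scriptsize\sc T}}}\phi\left(x_{j}^{{\mbox{\scriptsize\sc T}}}\left(t\right)\right)+\sum_{k=1}^{K_{{\mbox{\scriptsize out}}}}w_{ik}^{{\mbox{\scriptsize\sc T}}}F_{k}^{{\mbox{\scriptsize out}}}\left(t\right),\tag{5}$$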

from the teacher network. The full recurrent synaptic matrix $J$ of the network we are training (called the student network; variables without superscripts T are associated with the student network) is then trained to generate these target currents autonomously without any driving input. Specifically, for each neuron the training goal is to minimize the cost function, for a run of duration $t_{{\mbox{\scriptsize run}}}$ , $E=\sum_{i}E_{i}$ with

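$$E_{i}=\frac{1}{t_{{\mbox{\scriptsize run}}}}\int_{0}^{t_{{\mbox{\scriptsize run}}}}dt\Big{(}h_{i}^{{\mbox{\scriptsize\sc T}}}\!\left(t\right)-\sum_{j=1}^{N}J_{ij}\phi\left(x_{j}(t)\right)\Big{)}^{2}+\alpha R_{i}\,.\tag{6}$$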

$R_{i}$ is a regularization term to be discussed below. In our case, the expression in equation (6) is minimized subject to sign constraints on the elements of the matrix $J$ .

In the original full-FORCE scheme abbott2016building ; depasquale2016using , the cost (6) is minimized using RLS but, as discussed above, this is not a viable procedure when sign constraints are imposed. Instead, we use bounded constrained coordinate descent (BCD) wright2015coordinate , which proves to be a fast and reliable strategy for training both rate and spiking models with sign constrained weights (Methods). The resulting learning algorithm is fast enough to effectively clamp the network dynamics close to the desired trajectory during training, suppressing chaos and assuring stability.
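The idea of bounded coordinate descent can be sketched for a single row of $J$. The code below is a simplified batch version written for illustration, not the online algorithm described in Methods: it minimizes a discretized form of the cost (6) for one unit, one weight at a time, and clips each weight to its allowed sign range. The names `bcd_row`, `R`, `h`, `J_ref`, `lower`, and `upper` are assumptions, and the regularizer is taken to be a squared distance from a reference vector `J_ref` (zero gives a plain L2 penalty).

```python
import numpy as np

def bcd_row(R, h, lower, upper, J_ref=None, alpha=1e-3, sweeps=50):
    """Minimize mean_t (h_t - R_t . J)^2 + alpha * |J - J_ref|^2
    subject to lower <= J <= upper, by cyclic coordinate descent.

    R: (T, N) presynaptic rates phi(x_j(t)); h: (T,) target current.
    """
    T, N = R.shape
    J_ref = np.zeros(N) if J_ref is None else np.asarray(J_ref, dtype=float)
    C = R.T @ R / T            # rate correlation matrix
    b = R.T @ h / T            # rate-target correlation
    J = np.clip(J_ref, lower, upper)
    g = C @ J - b              # running value of C J - b
    for _ in range(sweeps):
        for j in range(N):
            # Exact minimizer along coordinate j, then projection on the bounds.
            new = (C[j, j] * J[j] - g[j] + alpha * J_ref[j]) / (C[j, j] + alpha)
            new = min(max(new, lower[j]), upper[j])
            delta = new - J[j]
            if delta != 0.0:
                g += delta * C[:, j]   # cheap O(N) update of C J - b
                J[j] = new
    return J

# Dale's law for one postsynaptic unit: 3 excitatory, then 3 inhibitory inputs.
lower = np.array([0.0, 0.0, 0.0, -np.inf, -np.inf, -np.inf])
upper = np.array([np.inf, np.inf, np.inf, 0.0, 0.0, 0.0])
```

Because the cost is quadratic in each weight, clipping the one-dimensional minimizer gives the exact constrained minimum along that coordinate, so each update costs O(N) and no matrix inversion is needed.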

Training Dynamically Balanced Networks

For a given task, the distribution of synaptic weights after training depends on a variety of factors including the initial value of the $J$ matrix, which we call $J^{0}$, the choice of regularizer, and whether the network is tonically driven by a large constant external current ($I$ in equations (1) and (2)). We begin by considering a task in which the network must autonomously (meaning with time-independent input) generate the periodic output shown in Fig. 2a. When no constant external current is present ($I=0$), equation (4) requires a parametric balance for any appreciable activity to exist in the network. The resulting parametrically balanced network can perform the task, but we find that an extensive fraction of synaptic weights are set to zero by the training algorithm, so that the resulting networks display a connection probability $\sim 0.5$ and a symmetric weight distribution (Fig. 2b,i). In the presence of constant external currents of order $\sqrt{N}$, the network has the potential to be dynamically balanced, but we find that, with a commonly used L2 weight regularization ($R_{i}=\sum_{j}J_{ij}^{2}$), the network also goes into a parametrically balanced configuration, though of a different form. This occurs regardless of the structure of the teacher network or the value of $\det{J^{{\mbox{\scriptsize eff}}}}$ for the initial weights $J^{0}$. In this case, the weight distribution typically shows an extensive number of zero weights and a distribution of excitatory synapses that is approximately Gaussian but cut off at zero (Fig. 2b,ii). The determinant of $J^{{\mbox{\scriptsize eff}}}$ is small but, unlike the case with zero external current, it is not of order $1/\sqrt{N}$ (Fig. 2c).

To understand the nature of the parametric balance exhibited by networks with $I$ of order $\sqrt{N}$ trained with an L2 regularizer, we averaged the total input to the excitatory neurons and divided the result into two pieces,

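$$h_{{\mbox{\scriptsize\sc E}}}=\left\langle\frac{1}{N_{{\mbox{\scriptsize\sc E}}}}\sum_{i\in{\mbox{\scriptsize\sc E}}}\sum_{j}J_{ij}r_{j}\right\rangle+I_{{\mbox{\scriptsize\sc E}}}=\underbrace{N_{{\mbox{\scriptsize\sc E}}}J_{{\mbox{\scriptsize\sc E}}{\mbox{\scriptsize\sc E}}}\left\langle r_{{\mbox{\scriptsize\sc E}}}\right\rangle+N_{{\mbox{\scriptsize\sc I}}}J_{{\mbox{\scriptsize\sc E}}{\mbox{\scriptsize\sc I}}}\left\langle r_{{\mbox{\scriptsize\sc I}}}\right\rangle+I_{{\mbox{\scriptsize\sc E}}}}_{\tilde{h}_{{\mbox{\scriptsize\sc E}}}}+\underbrace{\frac{1}{N_{{\mbox{\scriptsize\sc E}}}}\sum_{i\in{\mbox{\scriptsize\sc E}}}\sum_{j}\delta J_{ij}\left\langle r_{j}\right\rangle}_{c_{{\mbox{\scriptsize\sc E}}}},\tag{7}$$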

where we have introduced $\delta J$ as the connection matrix with the block mean values removed, and brackets denote time averages. In a standard dynamically balanced case, both $\tilde{h}_{{\mbox{\scriptsize\sc E}}}$ and $c_{{\mbox{\scriptsize\sc E}}}$ are of order 1, as is $h_{{\mbox{\scriptsize\sc E}}}$ . In contrast, for the network trained with the L2 regularizer, both $\tilde{h}_{{\mbox{\scriptsize\sc E}}}$ and $c_{{\mbox{\scriptsize\sc E}}}$ are of order $\sqrt{N}$ , but they cancel to produce $h_{{\mbox{\scriptsize\sc E}}}$ of order 1 (Fig. 2d). This cancellation is due to a fine-tuning of $J$ that arises during the learning process.
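The split of the mean excitatory input into $\tilde{h}_{{\mbox{\scriptsize\sc E}}}$ and $c_{{\mbox{\scriptsize\sc E}}}$ can be computed directly from a weight matrix and the time-averaged rates. The sketch below is illustrative: the function name and index convention (excitatory neurons first) are assumptions, and the block weights in the mean-field term are taken to be the block means of $J$.

```python
import numpy as np

def split_excitatory_input(J, r_mean, I_E, n_exc):
    """Return (h_tilde_E, c_E) for the excitatory population.

    J: (N, N) weights; r_mean: (N,) time-averaged rates; I_E: constant
    external current into E; indices [0, n_exc) are excitatory (assumed).
    """
    N = J.shape[0]
    E, I = slice(0, n_exc), slice(n_exc, N)
    J_EE, J_EI = J[E, E].mean(), J[E, I].mean()
    # Mean-field part: block means times population-averaged rates.
    h_tilde = (n_exc * J_EE * r_mean[E].mean()
               + (N - n_exc) * J_EI * r_mean[I].mean() + I_E)
    # Correlation part: weights with block means removed (delta J).
    dJ = J[E].copy()
    dJ[:, E] -= J_EE
    dJ[:, I] -= J_EI
    c = (dJ @ r_mean).mean()
    return h_tilde, c
```

By construction the two pieces add up to the mean total input, so comparing their magnitudes against their sum shows whether order-1 input arises term by term or through cancellation of two large terms.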

These results illustrate that dynamically balanced networks do not arise naturally from learning, even if the teacher network and the initial weight matrix of the student network are configured to be dynamically balanced and $I$ is of order $\sqrt{N}$ . The learning algorithm with L2 regularization tends to push the weight matrix to a parametrically balanced regime. We found a simple way to prevent this: choose $J^{0}$ to satisfy the dynamically balanced condition (stable solution to equation (4) with order 1 rates) and use regularization to keep $J$ from straying too far from $J^{0}$ . The regularization that does this still uses an L2 norm, but on the difference between $J$ and $J^{0}$ rather than on the magnitude of $J$ . Specifically, we define what we call the J0 regularizer by $R_{i}=\sum_{j}(J_{ij}-J_{ij}^{0})^{2}$ . With this regularizer, the weights after training display a Gaussian-like distribution (Fig. 2b,iii), block-wise average weights scaling as $1/\sqrt{N}$ and a $J^{{\mbox{\scriptsize eff}}}$ determinant of order 1 (Fig. 2e). Furthermore, the total current $h_{{\mbox{\scriptsize\sc E}}}$ and the two components we have introduced, $\tilde{h}_{{\mbox{\scriptsize\sc E}}}$ and $c_{{\mbox{\scriptsize\sc E}}}$ , are all of order 1 (Fig. 2f). Thus, dynamically balanced networks trained in this way, even when they are fairly small, have average activities and currents in agreement with what is expected from a dynamically balanced regime (however, deviations from the usual balance constraints are slightly larger than the deviations seen for randomly connected networks due to weight-rate correlations generated by learning).

We can use BCD and J0 regularization to train dynamically balanced spiking networks as well (Fig. 3). One common consequence of employing long synaptic time-scales is that a bursty spiking behavior emerges. The level of burstiness in trained networks can be varied by means of the $\omega_{h}$ parameter, which scales the intensity of the learned currents, generated by the slow synapses, relative to the contribution provided by the random synapses with a fast time constant (Methods). The irregularity of spiking in trained networks depends on the amplitude of the current fluctuations. To generate irregular spiking (Fig. 3b-d), we included random untrained fast synapses (with synaptic time constant 2 ms; see depasquale2016using ) and an average excess of inhibition. The level of spiking irregularity can be quantified by computing the distribution of the coefficients of variation (CV) of interspike intervals across the neurons of the network (Fig. 3d). The average CV is approximately $1$.
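For reference, the CV of interspike intervals for one neuron is the standard deviation of its intervals divided by their mean. A minimal sketch follows; the function name and the choice to return NaN for very short spike trains are assumptions.

```python
import numpy as np

def isi_cv(spike_times):
    """Coefficient of variation of interspike intervals for one neuron."""
    isi = np.diff(np.sort(np.asarray(spike_times, dtype=float)))
    if isi.size < 2:
        return np.nan          # too few intervals to estimate variability
    return isi.std() / isi.mean()

def network_cvs(spike_trains):
    """CV for each neuron, given a list of spike-time arrays."""
    return np.array([isi_cv(train) for train in spike_trains])
```

A perfectly regular train gives a CV near 0, whereas a Poisson process gives a CV near 1, so an average CV close to 1 indicates irregular, Poisson-like firing.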

Perturbations in trained balanced networks

Balanced networks trained on autonomous oscillation tasks can suppress homogeneous perturbations in a way similar to the decorrelation effect mediated by the strong inhibitory feedback in such networks RenartAsynchronous ; HeliasDecorrelation . As an example, we consider spiking networks trained to reproduce autonomously the periodic signal shown in Fig. 2a. We constructed both dynamically and parametrically balanced examples of these networks and perturbed them at random times with 10 ms duration current pulses. These pulses come in two types, either identical for all neurons, or identical in magnitude but opposite in sign for excitatory and inhibitory neurons, with positive input to the excitatory neurons. We call these E+I and E-I perturbations, respectively. Balanced networks generally exhibit a strong resilience to E+I perturbations (Fig. 4a, top) compared to external pulses in the E-I direction (Fig. 4a, bottom). The latter produce a longer lasting transient and a subsequent larger phase shift in the network output. This response to temporary imbalance in the collective activity of the E and I populations is reminiscent of balance-amplified transients, previously described by a linear theory MurphyMillerBalancedAmplification .

The role of inhibitory feedback is also apparent when a rate network is trained to produce the same rhythmic behavior. In this case, we perturbed the network with ongoing noise rather than with a transient. Homogeneous E+I input disturbances are cancelled by strong inhibitory recurrence in dynamically (Fig. 4b, top) but not in parametrically (Fig. 4b, bottom) balanced networks. E-I perturbations produce the strongest effect, and random heterogeneous perturbations produce similar effects in both networks, which are intermediate between E+I and E-I perturbations in the dynamically balanced case. E-I perturbations are somewhat amplified for the parametrically balanced case (Fig. 4b, bottom). For these studies, we examined the effect not merely on the output, as in Fig. 4a, but rather on the full network activity, defining

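$$\Delta_{x}=\frac{\int dt\sum_{i}\left(x_{i}(t)-\tilde{x}_{i}(t)\right)^{2}}{\int dt\sum_{i}\left(x_{i}(t)\right)^{2}},\tag{8}$$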

where $x(t)$ is the noiseless activity of the rate network and $\tilde{x}(t)$ the perturbed activity. We expect similar results to hold for spiking networks HarishHanselAsynchronous .
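On a uniform time grid the time step cancels between numerator and denominator, so $\Delta_{x}$ reduces to a ratio of sums. A short sketch, with an assumed function name and array layout (time along the first axis, units along the second):

```python
import numpy as np

def relative_deviation(x, x_pert):
    """Delta_x: squared deviation of the perturbed activity, normalized by
    the squared noiseless activity, summed over time and units."""
    x = np.asarray(x, dtype=float)
    x_pert = np.asarray(x_pert, dtype=float)
    return np.sum((x - x_pert) ** 2) / np.sum(x ** 2)
```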

Autonomous activity in trained networks

We found that the generation of oscillatory activity in trained networks (such as that shown in Fig. 5a) could be described by a simple mechanism, at least when a single frequency dominates the output pattern. After training, the spectrum of the synaptic matrix $J$ usually shows a complex conjugate pair of eigenvalues with largest real part. This is not limited to target-based learning methods: we trained networks of different sizes and with a variety of activation functions using backpropagation through time (with either stochastic gradient descent or ADAM KingmaAdam ), and we consistently observed this phenomenon for different target readout signals of various frequencies. For differentiable activation functions, the oscillatory frequency is approximately predicted to be $f=\mbox{Im}(\lambda_{1})/\left(2\pi\tau\,\mbox{Re}(\lambda_{1})\right)$, where $\lambda_{1}$ is one of the two complex eigenvalues with largest real part of the matrix $J\phi^{\prime}|_{x_{0}}$ (Fig. 5b), and $\phi^{\prime}|_{x_{0}}$ is the derivative of the activation function computed at the (not necessarily zero) fixed point from which the oscillations arise by means of a supercritical Hopf transition.
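The frequency estimate can be evaluated numerically from the leading eigenvalue of the linearized connectivity. In the sketch below, the function name is hypothetical and the matrix $J\phi^{\prime}$ is read as $J$ with column $j$ scaled by $\phi^{\prime}(x_{j})$ at the fixed point, which is an assumption about the notation.

```python
import numpy as np

def predicted_frequency(J, phi_prime_x0, tau):
    """f = Im(lambda_1) / (2 pi tau Re(lambda_1)) for the eigenvalue of
    J diag(phi'(x0)) with the largest real part (assumed positive)."""
    M = J * np.asarray(phi_prime_x0, dtype=float)[None, :]  # scale columns
    lam = np.linalg.eigvals(M)
    lam1 = lam[np.argmax(lam.real)]
    # abs() makes the result independent of which conjugate is picked.
    return abs(lam1.imag) / (2.0 * np.pi * tau * lam1.real)

# Toy 2x2 example with eigenvalues 2 +/- 3i (illustrative values only).
J_toy = np.array([[2.0, -3.0],
                  [3.0, 2.0]])
f_toy = predicted_frequency(J_toy, np.ones(2), tau=0.1)
```

The division by the real part reflects the gain rescaling that places the leading pair on the stability boundary; for an eigenvalue with non-positive real part the formula does not apply.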

This analysis can be verified after training is completed by artificially lowering the effective gain of the obtained connectivity matrix $J$ using a fictitious gain parameter $g_{{\mbox{\scriptsize test}}}$ in the testing phase, such that $J_{{\mbox{\scriptsize test}}}=g_{{\mbox{\scriptsize test}}}J$. Nonlinear oscillations arise at the critical value $g_{{\mbox{\scriptsize test}}}^{*}$ where the previously stable fixed point loses its stability as the two dominant conjugate eigenvalues cross the imaginary axis (Fig. 5c). At the bifurcation, the frequency is controlled by the imaginary part of the dominant eigenvalues and the network dynamics is essentially two-dimensional. As $g_{{\mbox{\scriptsize test}}}$ is increased, there is a small change of frequency of the readout signal as nonlinear effects start to grow and other frequencies and harmonics kick in (Fig. 5b). This picture is consistent with previous work in random E/I separated rate models delmolinosynchronization as well as a recent study of low-rank perturbations to random connectivity matrices mastrogiuseppelinking .

Balanced networks can also be trained to produce prescribed chaotic dynamics (like the Lorenz attractor in Fig. 6a) or multiple complex quasi-periodic trajectories. In another task, inspired by the work of Laje and Buonomano LajeBuonomanoTamingChaos in rate networks, and similar to recent extensions to the QIF spiking case in kimlearning , we trained a spiking network to reproduce desired transient dynamics in response to an external stimulus. To do so, we recorded innate current trajectories $h_{i}^{{\mbox{\scriptsize\sc T}}}\left(t\right)$ generated by a randomly initialized LIF balanced network for a short period of time ($2$ s) during its spontaneous activity. We then trained the same network to reproduce its innate current trajectories whenever a strong external input was applied (dark blue line in Fig. 6b). The brief external pulse ($50$ ms) is able to elicit the target trajectory, after which the network naturally resumes its irregular activity. Finally, the example in Fig. 6c shows an E/I spiking network instructed to generate the quasi-periodic dynamics of human walking behavior shortly after a $50$ ms unitary pulse. We trained $56$ linear decoders on the network activity to reproduce the time-course of each joint angle from a human motion-capture dataset, as in sussillo2009generating ; alirezaarxiv . The average firing rate of the network is $20$ Hz. A brief input pulse can trigger the motion generation from asynchronous spontaneous activity or reset the phase of a previously stable quasi-periodic dynamics.

Input-Output tasks

Our learning procedure can also be employed to train dynamically balanced E/I networks capable of performing complex temporal categorization tasks. As our first example, a spiking network implements an exclusive OR function depasquale2016using any time an appropriate sequence of inputs is presented, despite disturbance induced by its spontaneous asynchronous activity (Fig. 7a). In each trial, the network is presented with two pulses of durations that are chosen randomly to be either short ($100$ ms) or long ($300$ ms), coding for the truth values $0$ (False) or $1$ (True). The network computes the XOR function of the two inputs and responds with an appropriate positive or negative readout signal (duration: $500$ ms) after a delay period ($300$ ms). We used online BCD to train a balanced network of $N=1000$ LIF neurons and measured the number of correct responses. The trained network responds promptly when the two impulses are presented at any random time over the course of its spontaneous activity and reaches a test accuracy of $96\%$.

As a second example, we construct an E/I spiking network to solve a more complex interval time-matching task, inspired by the “ready-set-go” task employed in Jazayeri2010 . This task has been solved previously using networks with unconstrained synaptic weights DePasqualeFullForce . In this task, the network receives two brief input pulses separated by a random delay $\Delta T$, and it is trained to generate a response after exactly the same delay, following the second pulse. As in the temporal XOR task described above, it is crucial here that the network retains information about the first pulse during the whole delay period in the absence of any external input. Especially for long delays $\Delta T$, this task proves hard to solve. We therefore employ the heuristic technique of “hints” previously introduced in DePasqualeFullForce : in each training epoch, the teacher network is provided with both a ramping up and decreasing input (dashed yellow line in Fig. 7b, left) during the two relevant delay periods. An E/I network of $N=1000$ spiking neurons produces accurate responses to random delays between $400$ ms and $2$ s (Fig. 7b, right).

DISCUSSION

We have introduced a fast alternative to RLS that is capable of training sign-constrained rate-based and spiking network models and, in addition, has modest memory and computational requirements when dealing with E/I (and also sparse) models. We have shown that this fast target-based learning scheme can be used to train balanced networks of rate and spiking neurons for a wide variety of tasks. We described the conditions under which dynamically balanced networks can be obtained with the training procedure. We found that, in the absence of proper initialization and regularization, the learning dynamics are attracted to regions of weight space with parametrically tuned connectivity, and we showed the impact of specific weight regularizations on the weight structure of trained networks, as well as on their resilience to various external perturbations.

Relation to other work

We have tackled the problem of training spiking neural networks to display prescribed stable dynamics or to solve cognitively relevant input-output tasks. A number of top-down approaches have been proposed to train functional models of spiking networks, e.g. the neural engineering framework EliasmithNeuralEngineering, spike-coding BoerlinDeneveSpikeBasedPopulationCoding and nonlinear optimal control DeneveBrainEfficient; alirezaarxiv. These methods are elegantly formulated and effective. Interestingly, they solve a different task from the one our procedure solves. These methods train the network to reproduce a prescribed dynamics, whereas our method trains a network to produce a particular trajectory generated by those dynamics. The two resulting networks look identical as long as the prescribed trajectory is being followed, but they generalize differently if the network deviates from this trajectory.

Some variations of RLS-based training have been introduced previously to construct functional models of E/I separated spiking networks. In NicolaClopathSupervised , the authors employed a clipping procedure on top of a FORCE training method, which entails rank-1 updates to the original randomly connected recurrent network, while in depasquale2016using the authors used an off-line two step Full-FORCE procedure to train a large network performing an oscillation task. In a slightly different setting, the authors of kimlearning used Full-FORCE to train networks of quadratic integrate and fire neurons to reproduce prescribed synaptic drive, as well as spiking rate patterns in response to a brief strong stimulus. They provide an example of an E/I network with parametrically tuned effective connectivity and no external currents that tracks its own innate trajectories, recorded over the course of spontaneous activity. Sign constraints were imposed by eliminating updates of synapses that would pass out of the allowed ranges in a given epoch, and those synapses were then deleted in subsequent epochs (we call this strategy Clipped-RLS). Although performance of Clipped-RLS is comparable to BCD, this strategy proves memory-demanding for large network sizes, especially when dealing with dense topologies. Clipped-RLS entails using $N$ independent covariance matrices $P_{i}$ , one for each unit in the trained network, thus amounting to storing $N\times\left(pN\right)^{2}$ floating-point numbers (FPs). For comparison, BCD requires $2N^{2}$ .
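To make the memory comparison concrete, the short sketch below evaluates the two storage counts quoted above. The interpretation of $p$ as the connection probability (so that each unit has roughly $pN$ incoming synapses) is an assumption, since $p$ is not defined in this passage, and the function names are hypothetical.

```python
def clipped_rls_floats(n, p):
    """Floats stored by Clipped-RLS: N covariance matrices of size (pN) x (pN)."""
    k = int(round(p * n))       # assumed number of incoming synapses per unit
    return n * k ** 2


def bcd_floats(n):
    """Floats stored by BCD, using the count 2 N^2 quoted in the text."""
    return 2 * n ** 2


n = 1000
for p in (1.0, 0.1):
    ratio = clipped_rls_floats(n, p) / bcd_floats(n)
    print(f"p={p}: Clipped-RLS {clipped_rls_floats(n, p):.1e} floats, "
          f"BCD {bcd_floats(n):.1e} floats, ratio {ratio:.0f}")
```

For a dense network with $N=1000$ this gives $10^{9}$ versus $2\times10^{6}$ numbers, a factor of 500. At 8 bytes per number (double precision, an assumption), that is about 8 GB against 16 MB.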

Conclusions

Credit assignment is a major problem in training spiking networks, where differentiability issues limit the use of gradient-based optimization (but see LeeBP; HuhBP; ZenkeBP), which has proven very powerful in deep feed-forward architectures. Whereas in some approaches the credit assignment problem is tackled by relying on coding assumptions variably linked to optimality criteria, target-based approaches, both in the context of feed-forward LeeTargetPropagation and recurrent models, provide a straightforward solution. As shown above as well as in a recent work kimlearning, it is not essential for the teacher network to be a rate model, as long as it effectively acts as a dynamic reservoir that expands task dimensionality via its recurrency, thereby providing rich targets.

I Methods

I.1 Rate and spiking network models

The weight matrix $J$ is initialized by setting $J_{ij}=J_{{\mbox{\scriptsize\sc X}}{\mbox{\scriptsize\sc Y}}}^{{\mbox{\scriptsize eff}}}/\sqrt{N_{{\mbox{\scriptsize pre}}}}+\Delta_{ij}$, where X and Y are the appropriate E and I labels corresponding to neurons $i$ and $j$. $\Delta_{ij}$ is a random matrix with entries that are zero-mean Gaussian distributed, with each column $j$ having variance $g^{2}/N_{{\mbox{\scriptsize pre}}}$. When a balanced teacher network is employed during training, we use a non-negative activation function and appropriately choose block averages and external constant currents $I\propto\sqrt{N}$ for which the balance equation yields a solution with appreciable positive rates. In those cases where we seek to train spiking networks displaying irregular spontaneous activity with low rates, we further adjust the random part $\Delta_{ij}$ so that $\sum_{j}\Delta_{ij}=0$ for each row $i$. By reducing quenched fluctuations in the time-averaged activity of each neuron, this adjustment ensures that spiking neurons trained on the teacher currents do not have abnormally low or high average activity.
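The initialization described above can be sketched as follows. This is an illustrative reading rather than the authors' implementation: `init_weights` is a hypothetical helper, $N_{\text{pre}}$ is assumed to be the size of the presynaptic population, the neurons are assumed to be ordered with all E units first, and the row-sum condition is imposed by subtracting each row's mean from $\Delta$, which is one simple way to satisfy it.

```python
import numpy as np


def init_weights(n_exc, n_inh, j_eff, g, rng, zero_row_sum=False):
    """Sketch of J_ij = J_eff[X, Y] / sqrt(N_pre) + Delta_ij.

    j_eff is a 2x2 array indexed [post, pre], with 0 = E and 1 = I.
    N_pre is taken to be the presynaptic population size (an assumption).
    """
    j_eff = np.asarray(j_eff, dtype=float)
    sizes = np.array([n_exc, n_inh])
    labels = np.repeat([0, 1], sizes)          # population label of each neuron
    n_pre = sizes[labels].astype(float)        # N_pre for each column j

    # Block-structured mean part of the connectivity.
    mean = j_eff[labels[:, None], labels[None, :]] / np.sqrt(n_pre)[None, :]

    # Zero-mean Gaussian part with column variance g^2 / N_pre.
    delta = rng.normal(size=mean.shape) * (g / np.sqrt(n_pre))[None, :]
    if zero_row_sum:
        # Enforce sum_j Delta_ij = 0 for every row i.
        delta -= delta.mean(axis=1, keepdims=True)
    return mean + delta


rng = np.random.default_rng(0)
J = init_weights(80, 20, [[1.0, -2.0], [1.0, -1.8]], g=1.5, rng=rng,
                 zero_row_sum=True)
```

The block values passed as `j_eff` here are arbitrary placeholders. Note that the Gaussian part can flip the sign of individual entries, so this sketch does not by itself enforce Dale's law.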

Integration of ODEs is performed by the forward Euler method using an integration time-step not larger than $\Delta t=\tau/20$ for rate models and $\Delta t=0.5$ ms for spiking networks. We further scale down the integration time-step in all those cases where large $J_{{\mbox{\scriptsize\sc X}}{\mbox{\scriptsize\sc Y}}}^{{\mbox{\scriptsize eff}}}$ and strong external currents are employed.
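For concreteness, a forward Euler step for the rate model $\tau\,dx_{i}/dt=-x_{i}+\sum_{j}J_{ij}\phi(x_{j})+I_{i}$ might look like the sketch below. The external input term $F^{\text{in}}$ is omitted for brevity, the rectified-linear $\phi$ is only one possible non-negative activation function, and the function names are hypothetical.

```python
import numpy as np


def relu(x):
    """A non-negative activation function (an illustrative choice)."""
    return np.maximum(x, 0.0)


def euler_rate_network(J, I_ext, x0, tau=1.0, dt=None, n_steps=1000, phi=relu):
    """Integrate tau dx/dt = -x + J phi(x) + I with the forward Euler method."""
    dt = tau / 20 if dt is None else dt        # default step from the text
    x = np.array(x0, dtype=float)
    traj = np.empty((n_steps + 1, x.size))
    traj[0] = x
    for k in range(n_steps):
        x = x + (dt / tau) * (-x + J @ phi(x) + I_ext)
        traj[k + 1] = x
    return traj


# Sanity check on an uncoupled network: x relaxes towards the constant input.
traj = euler_rate_network(np.zeros((3, 3)), np.ones(3), np.zeros(3))
```

With $\Delta t=\tau/20$, each step moves the state a fraction $0.05$ of the way towards its instantaneous fixed point, which is why smaller steps are advisable when the recurrent and external currents are large.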

I.2 Learning algorithm

Bounded Coordinate Descent

When training a rate or a spiking network, we seek to match the incoming currents in the driven teacher $h_{i}^{{\mbox{\scriptsize\sc T}}}\left(t\right)=\sum_{j}J_{ij}^{{\mbox{\scriptsize\sc T}}}\phi\left(x_{j}^{{\mbox{\scriptsize\sc T}}}\left(t\right)\right)+\sum_{k}w_{ik}^{{\mbox{\scriptsize out}}}F_{k}^{{\mbox{\scriptsize out}}}\left(t\right)+I_{i}$ with those in the student: $h_{i}\left(t\right)=\sum_{j}J_{ij}\phi\left(x_{j}\left(t\right)\right)+I_{i}$ (for a rate student) or $h_{i}\left(t\right)=\sum_{j}J_{ij}s_{j}\left(t\right)+I_{i}$ (for a spiking student). In training spiking networks, performance is virtually unchanged if one instead matches the activity $x^{{\mbox{\scriptsize\sc T}}}\left(t\right)$ in the teacher rate network with the synaptic currents $h\left(t\right)=Js\left(t\right)+I$ in the spiking network. We sometimes allow for an additional scaling and/or offset of the currents provided by the teacher network, so that the actual target currents are defined as $\omega_{h}h_{i}^{{\mbox{\scriptsize\sc T}}}\left(t\right)+b_{h}$. Each neuron is trained independently and in parallel every $\Delta t_{l}$, after a transient $T_{d}$ to wash out the initial condition.

We optimize the loss-function with an online strategy by means of Bounded Coordinate Descent (BCD). In our case, the method consists in updating, in parallel for each postsynaptic neuron $i$ , each synapse $J_{ij}$ one at a time by computing the optimal solution to the one-dimensional optimization problem where all other synapses $J_{ik}$ for $k\neq j$ are kept fixed:

[TABLE]

where $C$ is the covariance matrix of the activities $C_{ij}\left(t\right)=\sum_{\tau=0}^{t}s_{i}\left(\tau\right)s_{j}\left(\tau\right)$, which gets updated at each time-step by $C_{ij}\to C_{ij}+s_{i}s_{j}$ (these equations are for the spiking case; for rate models $s_{i}$ is replaced by $\phi(x_{i})$). The residual matrix $D_{ij}$ is defined as $D_{ij}\left(t\right)=\sum_{\tau=0}^{t}s_{j}\left(\tau\right)\left(h_{i}^{{\mbox{\scriptsize\sc T}}}\left(\tau\right)-h_{i}\left(\tau\right)\right)$. After each update with change $\Delta J_{ij}$, the $i$th row $D_{i:}$ of the residual matrix $D$ gets updated according to

$$D_{i:}\to D_{i:}-\Delta J_{ij}\,C_{j:}\,,$$

where $C_{j:}$ stands for the $j$ th row of $C$ . Setting $W_{ij}=J_{ij}^{0}$ , where $J^{0}$ is the initial weight matrix, we implement the J0 regularizer. Alternatively, $W_{ij}=0$ corresponds to a simple L $2$ weight regularization.

The updating schedule of weight indexes $j\in\left\{1,2,...N\right\}$ can be either fixed or random at every step. For easier tasks, updating a random subset of incoming synapses at each time-step is enough to obtain good training performance. We do not update the weights when this would violate the imposed sign constraints.
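The following sketch shows one way the online statistics and the per-synapse updates could fit together. It is not the authors' code. In particular, the one-dimensional update used here, $\Delta J_{ij}=\left[D_{ij}-\alpha\left(J_{ij}-W_{ij}\right)\right]/\left(C_{jj}+\alpha\right)$, is derived under the assumption that each neuron minimizes a squared current mismatch plus a quadratic penalty $\alpha\sum_{j}(J_{ij}-W_{ij})^{2}$; the regularization strength `alpha`, the `sign` vector and both function names are hypothetical. Updates that would violate the sign constraint are skipped, as stated above.

```python
import numpy as np


def accumulate(C, D, s, h_teacher, h_student):
    """Add one time-step of statistics, in place.

    C_ij += s_i s_j and D_ij += s_j (h_teacher_i - h_student_i).
    """
    C += np.outer(s, s)
    D += np.outer(h_teacher - h_student, s)


def bcd_sweep(J, W, C, D, sign, alpha, order):
    """One pass of coordinate updates over the presynaptic indices in `order`.

    sign[j] is +1 for an excitatory and -1 for an inhibitory presynaptic
    neuron j. All postsynaptic rows i are updated in parallel. J and D are
    modified in place.
    """
    for j in order:
        denom = C[j, j] + alpha
        if denom <= 0.0:                   # neuron j never active, no penalty
            continue
        # Exact minimizer of the assumed one-dimensional quadratic problem.
        dJ = (D[:, j] - alpha * (J[:, j] - W[:, j])) / denom
        # Skip updates that would violate the sign constraint.
        keep = sign[j] * (J[:, j] + dJ) >= 0.0
        dJ = np.where(keep, dJ, 0.0)
        J[:, j] += dJ
        # Residual update: D_i: <- D_i: - dJ_ij * C_j:
        D -= np.outer(dJ, C[j])
    return J
```

In an online setting, `accumulate` would be called at every time-step and `bcd_sweep` every $\Delta t_{l}$, with `order` either a fixed ordering, a random permutation, or a random subset of indices. Restricting `order` to the existing presynaptic partners of a neuron is one way to handle sparse topologies.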

One of the benefits of BCD, compared to local optimization approaches (e.g. stochastic gradient descent), is its ability to keep the neural trajectory close to the target during training, even more so in the presence of strong external currents that prevent the network from shutting down.

We note that coordinate descent proves a versatile method even beyond the sign-constrained case. For example, in updating incoming synapses to neuron $i$, it is easy to account for specific network topologies of the $J$ matrix by selecting a relevant subset of rows/columns of the (symmetric) matrix $C$ in the update equation (9).

Regularization

In addition to the regularizations discussed in the text, we also experimented with a regularization of the form

[TABLE]

which controls the variance of the outgoing synaptic weights in each sub-population. For simple tasks, this typically produces inhibitory dominated networks with a non-singular $J^{{\mbox{\scriptsize eff}}}$ .

Feedback stabilization

In some cases, it is useful to use a feedback mechanism to speed up training and drastically reduce the frequency of weight updates $1/\Delta t_{l}$. Specifically, during training we drive the student network with a modified current $\tilde{h}_{i}=h_{i}+\kappa\left(t\right)\left(h_{i}^{{\mbox{\scriptsize\sc T}}}-h_{i}\right)$. We use $\kappa\left(t\right)=\left|h-h^{{\mbox{\scriptsize\sc T}}}\right|/(\left|h\right|+\left|h^{{\mbox{\scriptsize\sc T}}}\right|)$, with $|h|$ the Euclidean norm of the vector $h$ (although good training performance can be achieved with different metrics). The choice of an adaptive-gain feedback procedure removes the need to optimize the time course of $\kappa\left(t\right)$ as a hyper-parameter; this gain is usually taken to be a decreasing function of time. It is also instrumental in providing a minimal supervisory signal, thus allowing the student network to progressively exploit its own fluctuations over the course of training to build stability around the target trajectory. More generally, other kinds of constrained optimization methods, e.g. interior point methods, tend to work well when coupled with the feedback mechanism.
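The adaptive gain can be computed directly from the two current vectors, as in the minimal sketch below. The function name and the small constant `eps`, added only to avoid division by zero when both currents vanish, are assumptions of this example.

```python
import numpy as np


def feedback_current(h, h_teacher, eps=1e-12):
    """Return the modified current h + kappa * (h_teacher - h) and kappa.

    kappa = |h - h_teacher| / (|h| + |h_teacher|), with Euclidean norms.
    """
    num = np.linalg.norm(h - h_teacher)
    den = np.linalg.norm(h) + np.linalg.norm(h_teacher)
    kappa = num / (den + eps)
    return h + kappa * (h_teacher - h), kappa


h = np.array([1.0, -2.0, 0.5])
h_tilde, kappa = feedback_current(h, np.array([0.8, -1.5, 0.7]))
```

By the triangle inequality, $0\leq\kappa\leq1$: the gain vanishes when the student already matches the teacher and approaches one when the two currents are opposite, so the supervisory signal fades automatically as training progresses.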

Testing

Test error is computed over a testing period $T_{test}$ as

[TABLE]

For input-output tasks, we randomly initialize the network state at the beginning of a test trial. For periodic targets $F^{{\mbox{\scriptsize out}}}\left(t\right)$ , testing is interleaved with training, so that the spiking (rate) network state $\boldsymbol{s}$ ( $\boldsymbol{x}$ ) is usually close to the target trajectory. In this case, a sufficiently low test error usually implies the presence of a stable limit cycle, and the periodic output is reproduced, up to a phase shift, starting from any initial condition.

For the XOR task, during testing we defined a correct response when the normalized dot product of the readout $z$ and $F^{{\mbox{\scriptsize out}}}$ , with $t$ in the window of non-zero target, satisfied

[TABLE]

Acknowledgements.

We thank Laureline Logiaco, Fabio Stefanini and R. Engelken for fruitful discussions. Research supported by NSF NeuroNex Award DBI-1707398, the Gatsby Charitable Foundation, and the Simons Collaboration for the Global Brain.

Bibliography

[1] C. van Vreeswijk and H. Sompolinsky. Chaos in neuronal networks with balanced excitatory and inhibitory activity. Science, 274(5293):1724–1726, 1996.
[2] C. van Vreeswijk and H. Sompolinsky. Chaotic balanced state in a model of cortical circuits. Neural Comput., 10(6):1321–1371, August 1998.
[3] Alfonso Renart, Jaime de la Rocha, Peter Bartho, Liad Hollender, Néstor Parga, Alex Reyes, and Kenneth D. Harris. The asynchronous state in cortical circuits. Science, 327(5965):587–590, 2010.
[4] Jonathan Kadmon and Haim Sompolinsky. Transition to chaos in random neuronal networks. Phys. Rev. X, 5:041030, Nov 2015.
[5] Omri Harish and David Hansel. Asynchronous rate chaos in spiking neuronal circuits. PLOS Computational Biology, 11(7):1–38, 07 2015.
[6] Francesca Mastrogiuseppe and Srdjan Ostojic. Intrinsically-generated fluctuating activity in excitatory-inhibitory networks. PLOS Computational Biology, 13(4):1–40, 04 2017.
[7] M V Tsodyks and T Sejnowski. Rapid state switching in balanced cortical network models. Network: Computation in Neural Systems, 6(2):111–124, 1995.
[8] Nicolas Brunel. Dynamics of sparsely connected networks of excitatory and inhibitory spiking neurons. Journal of Computational Neuroscience, 8(3):183–208, May 2000.