Optimal decoding of dynamic stimuli encoded by heterogeneous populations of spiking neurons - a closed form approximation

Yuval Harel; Ron Meir; Manfred Opper

arXiv:1901.04094·q-bio.NC·January 15, 2019


TL;DR

This paper introduces an analytically tractable Bayesian approximation for optimal neural decoding of dynamic stimuli, enabling insights into sensory encoding strategies in heterogeneous neural populations.

Contribution

It develops a closed-form approximation for optimal filtering in neural decoding, allowing analysis of large, non-uniform populations with reduced computational complexity.

Findings

01

The approximation closely matches particle filtering results.

02

It provides new insights into optimal encoding in heterogeneous populations.

03

The framework aligns with biological observations of sensory cell distributions.

Abstract

Neural decoding may be formulated as dynamic state estimation (filtering) based on point process observations, a generally intractable problem. Numerical sampling techniques are often practically useful for the decoding of real neural data. However, they are less useful as theoretical tools for modeling and understanding sensory neural systems, since they lead to limited conceptual insight about optimal encoding and decoding strategies. We consider sensory neural populations characterized by a distribution over neuron parameters. We develop an analytically tractable Bayesian approximation to optimal filtering based on the observation of spiking activity, that greatly facilitates the analysis of optimal encoding in situations deviating from common assumptions of uniform coding. Continuous distributions are used to approximate large populations with few parameters, resulting in a filter…

Tables

Table 1. Table 1: Approximation errors relative to particle filtering (PF) in the examples of Figures 6 and 8.

Table 2. (a) 1d example (Figure 6)

	$h = 1000$		$h = 2$
	$ϵ_{μ}$	$ϵ_{σ}$	$ϵ_{μ}$	$ϵ_{σ}$
median	-0.00272	$1.29 \times 10^{- 4}$	$- 2.84 \times 10^{- 4}$	$2.96 \times 10^{- 4}$
5th perc.	-0.0601	-0.0185	-0.0184	-0.0245
95th perc.	0.0482	0.0192	0.0186	0.0178
mean	-0.00415	$1.41 \times 10^{- 4}$	$3.34 \times 10^{- 4}$	$- 9.35 \times 10^{- 4}$
std. dev.	0.0345	0.0126	0.0119	0.0122
med. abs. value	0.0188	0.00722	0.00662	0.00766
mean abs. value	0.0251	0.00919	0.0086	0.00942

Table 3. (b) 2d example (Figure 8)

	position		velocity
	$ϵ_{μ}$	$ϵ_{σ}$	$ϵ_{μ}$	$ϵ_{σ}$
median	$- 1.32 \times 10^{- 4}$	$2.28 \times 10^{- 4}$	$- 3.37 \times 10^{- 4}$	$- 1.53 \times 10^{- 4}$
5th perc.	-0.0337	-0.0253	-0.0234	-0.0148
95th perc.	0.0361	0.0257	0.0258	0.0154
mean	-0.00101	$2.95 \times 10^{- 5}$	$1.51 \times 10^{- 5}$	$2.95 \times 10^{- 5}$
std. dev.	0.0236	0.0157	0.0169	0.00922
med. abs. value	0.0115	0.00920	0.00908	0.00564
mean abs. value	0.0163	0.0118	0.0121	0.00711

Table 4. Table 2: Total population rates $r\left(x\right)$ and mark sampling distributions $\kappa\left(x;d\theta\right)$ for the preferred stimulus distributions $f\left(d\theta\right)$ of section 3.2.3 with Gaussian tuning $\lambda\left(x;\theta\right)=h\exp\left(-\frac{1}{2}\left\|Hx-\theta\right\|_{R}^{2}\right)$. The derivation of these closed forms is straightforward for the Dirac and uniform distributions; the derivation for the Gaussian distribution is by multiplication of Gaussians, completely paralleling the computations in section A.4.

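$f\left(d\theta\right)$	$r\left(x\right)$	$\kappa\left(x;d\theta\right)$
$\delta_{\theta_{0}}\left(d\theta\right)$	$\lambda\left(x;\theta_{0}\right)$	$\delta_{\theta_{0}}\left(d\theta\right)$
$d\theta$	$h\sqrt{\frac{\left(2\pi\right)^{m}}{\det\left(R\right)}}$	$\mathcal{N}\left(\theta;Hx,R^{-1}\right)d\theta$
$\mathcal{N}\left(\theta;c,G\right)d\theta$	$h\sqrt{\frac{\left(2\pi\right)^{m}}{\det\left(R\right)}}\,\mathcal{N}\left(c;Hx,R^{-1}+G\right)$	$\mathcal{N}\left(\theta;GR_{G}Hx+R^{-1}R_{G}c,\left(R+G^{-1}\right)^{-1}\right)d\theta$, where $R_{G}=\left(R^{-1}+G\right)^{-1}$
$\mathbf{1}\left\{a\leq\theta\leq b\right\}d\theta$	$h\sqrt{\frac{2\pi}{R}}\left[\Phi\left(z_{b}\left(x\right)\right)-\Phi\left(z_{a}\left(x\right)\right)\right]$, where $z_{s}\left(x\right)=\sqrt{R}\left(s-Hx\right)$	$\mathcal{N}_{\left[a,b\right]}\left(\theta;Hx,R^{-1}\right)d\theta$ (truncated normal distribution)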
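The last row of the table can be checked numerically. The sketch below assumes a scalar state and scalar $H$ and $R$, and compares the closed form for a population with preferred stimuli spread uniformly over $[a,b]$ against a direct numerical integral of the tuning function over $\theta$. The function names and parameter values are illustrative choices, not taken from the paper.

```python
import math

def gaussian_tuning(x, theta, h=1.0, R=1.0, H=1.0):
    """Scalar Gaussian tuning: lambda(x; theta) = h * exp(-0.5 * R * (H*x - theta)**2)."""
    return h * math.exp(-0.5 * R * (H * x - theta) ** 2)

def std_normal_cdf(z):
    """Standard normal CDF, Phi(z), written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def uniform_population_rate(x, a, b, h=1.0, R=1.0, H=1.0):
    """Closed-form total rate r(x) for preferred stimuli uniform on [a, b]."""
    z_a = math.sqrt(R) * (a - H * x)
    z_b = math.sqrt(R) * (b - H * x)
    return h * math.sqrt(2.0 * math.pi / R) * (std_normal_cdf(z_b) - std_normal_cdf(z_a))

def numeric_population_rate(x, a, b, h=1.0, R=1.0, H=1.0, steps=20000):
    """Midpoint-rule integral of lambda(x; theta) over theta in [a, b]."""
    width = (b - a) / steps
    return width * sum(
        gaussian_tuning(x, a + (k + 0.5) * width, h, R, H) for k in range(steps)
    )

# Illustrative parameters (assumptions for this sketch only).
closed = uniform_population_rate(0.3, 0.0, 3.0, h=5.0, R=4.0)
numeric = numeric_population_rate(0.3, 0.0, 3.0, h=5.0, R=4.0)
```

The two values should agree up to quadrature error. The closed form depends on $x$ only through the two CDF terms, so the total rate is nearly constant for stimuli well inside $[a,b]$ and falls off near the edges.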

Equations

\dot{X}_{t} = A (X_{t}) + D (X_{t}) ξ_{t},

X_{(k + 1) Δ t} = X_{k Δ t} + A (X_{k Δ t}) Δ t + D (X_{k Δ t}) ξ_{k} \sqrt{Δ t},

d X_{t} = A (X_{t}) d t + D (X_{t}) d W_{t}, (t \geq 0),

\mathbf{P}\left(N_{t+h}^{i}-N_{t}^{i}=k\Big|X_{\left[0,t\right]},\mathcal{N}_{t}\right)=

(i = 1, \dots, M, h \to 0^{+})

λ^{i} (x) = h_{i} \exp (- \frac{1}{2} \| H_{i} x - θ_{i} \|_{R_{i}}^{2}),

\int_{a}^{b} h (t) d N_{t}^{i} ≜ \sum_{j} \mathbf{1} \{t_{j}^{i} \in [a, b]\} h (t_{j}^{i}),

\hat{λ}_{t}^{i} = \mathbf{E}_{t}^{\mathcal{N}} [λ^{i} (X_{t})] = \int p_{t}^{\mathcal{N}} (x) λ^{i} (x) d x .

d p_{t}^{\mathcal{N}} (x) = \mathcal{L}^{*} p_{t}^{\mathcal{N}} (x) d t + p_{t}^{\mathcal{N}} (x) \sum_{i} (\frac{λ^{i} (x)}{\hat{λ}_{t}^{i}} - 1) (d N_{t}^{i} - \hat{λ}_{t}^{i} d t),

\frac{\partial}{\partial t} p_{t}^{\mathcal{N}} (x) = \mathcal{L}^{*} p_{t}^{\mathcal{N}} (x) + p_{t}^{\mathcal{N}} (x) \sum_{i} (\frac{λ^{i} (x)}{\hat{λ}_{t}^{i}} - 1) (\dot{N}_{t}^{i} - \hat{λ}_{t}^{i}),

d μ_{t}

d Σ_{t}

+ \sum_{i} \mathbf{E}_{t^{-}}^{\mathcal{N}} [ω_{t^{-}}^{i} \tilde{X}_{t^{-}} \tilde{X}_{t^{-}}^{⊺}] (d N_{t}^{i} - \hat{λ}_{t}^{i} d t)

- \sum_{i} \mathbf{E}_{t^{-}}^{\mathcal{N}} [ω_{t^{-}}^{i} X_{t^{-}}] \mathbf{E}_{t^{-}}^{\mathcal{N}} [ω_{t^{-}}^{i} X_{t^{-}}^{⊺}] d N_{t}^{i}

ω_{t}^{i} ≜ \frac{λ^{i} (X_{t})}{\hat{λ}_{t}^{i}} - 1,

d μ_{t}

d Σ_{t}

d μ_{t}^{π}

d Σ_{t}^{π}

d μ_{t}^{c}

d Σ_{t}^{c}

d μ_{t}^{N}

d Σ_{t}^{N}

d μ_{t}^{π}

d Σ_{t}^{π}

d μ_{t}^{c}

d Σ_{t}^{c}

d μ_{t}^{N}

d Σ_{t}^{N}

\hat{λ}_{t}^{i}

δ_{t}^{i}

S_{t}^{i}

d Σ_{t}^{- 1, c}

d Σ_{t}^{- 1, N}

d μ_{t}^{c}

d σ_{t}^{2, c}

d σ_{t}^{- 2, c}

d μ_{t}^{N}

d σ_{t}^{2, N}

d σ_{t}^{- 2, N}

\hat{λ}_{t}^{i}

λ^{i} (x) = h_{i} \exp (- \frac{1}{2} \| x - \bar{θ}_{i} \|_{H_{i}^{⊺} R_{i} H_{i}}^{2}) .


Taxonomy

Topics: Neural dynamics and brain function · Neural Networks and Applications · Advanced Memory and Neural Computing

Full text


Optimal decoding of dynamic stimuli encoded by heterogeneous populations of spiking neurons – a closed form approximation

Yuval Harel

Department of Electrical Engineering, Technion – Israel Institute of Technology, Haifa, Israel

Ron Meir

Department of Electrical Engineering, Technion – Israel Institute of Technology, Haifa, Israel

Manfred Opper

Department of Electrical Engineering and Computer Science, Technical University Berlin, Berlin 10587, Germany

Abstract

Neural decoding may be formulated as dynamic state estimation (filtering) based on point process observations, a generally intractable problem. Numerical sampling techniques are often practically useful for the decoding of real neural data. However, they are less useful as theoretical tools for modeling and understanding sensory neural systems, since they lead to limited conceptual insight about optimal encoding and decoding strategies. We consider sensory neural populations characterized by a distribution over neuron parameters. We develop an analytically tractable Bayesian approximation to optimal filtering based on the observation of spiking activity, that greatly facilitates the analysis of optimal encoding in situations deviating from common assumptions of uniform coding. Continuous distributions are used to approximate large populations with few parameters, resulting in a filter whose complexity does not grow with the population size, and allowing optimization of population parameters rather than individual tuning functions. Numerical comparison with particle filtering demonstrates the quality of the approximation. The analytic framework leads to insights which are difficult to obtain from numerical algorithms, and is consistent with biological observations about the distribution of sensory cells’ preferred stimuli.

Published in Neural Computation, August 2018, Vol. 30, No. 8

1 Introduction

Populations of sensory neurons encode information about the external world through their spiking activity. To understand this encoding, it is natural to model it as an optimal or near-optimal code in the context of some task performed by higher brain regions, using performance criteria such as decoding error or motor performance. A Bayesian theory of neural decoding is useful to characterize optimal encoding, as the computation of performance criteria typically involves the posterior distribution of the world state conditioned on spiking activity.

We model the external world state as a random process, observed through a set of sensory neuron-like elements characterized by multi-dimensional tuning functions, representing the elements’ average firing rate (see Figure 1). The actual firing of each cell is random and is given by a Point Process (PP) with rate determined by the external state and by the cell’s tuning function (Dayan & Abbott, 2005). Under this model, decoding of sensory spike trains may be formulated as a filtering problem based on PP observations, thus falling within the purview of nonlinear filtering theory (Snyder & Miller, 1991; Brémaud, 1981). Inferring the hidden state under such circumstances has been widely studied within the Computational Neuroscience literature (Dayan & Abbott, 2005; Macke et al., 2015). Beyond neuroscience, PP-based filtering has been used for position sensing and tracking in optical communication (Snyder et al., 1977, sec. 4), control of computer communication networks (Segall, 1978), queuing (Brémaud, 1981) and econometrics (Frey & Runggaldier, 2001).

A significant amount of work has been devoted in recent years to the development of algorithms for fast approximation of the posterior distribution, leading to an extensive literature (see Macke et al., 2015, and references therein, for a recent review). Much of this work is devoted to the development of effective sampling techniques, leading to highly performing finite-dimensional filters that can be applied profitably to real neural data. These approaches are usually formulated in discrete time, as befits implementation on digital computers, and lead to complex mathematical expressions for the posterior distributions, which are difficult to interpret qualitatively. In this work we are less concerned with algorithmic issues, and more with establishing closed-form analytic expressions for approximately optimal continuous time filters, and using these to characterize the nature of near-optimal encoders, namely determining the structure and distribution of tuning functions for optimal state inference. A significant advantage of the closed form expressions over purely numerical techniques is the insight and intuition that is gained from them about qualitative aspects of the system. Moreover, the leverage gained by the analytic computation contributes to reducing the variance inherent to Monte Carlo approaches. Thus, in this work we do not compare our results to algorithmically oriented discrete-time filters, but rather to other continuous-time analytically expressible filters for dynamically varying signals, with the aim of gaining insight about optimal decoding and encoding within an analytic framework.

The problem of filtering a continuous-time diffusion process through PP observations is solved formally under general conditions in (Snyder, 1972) (see also (Segall, 1976) and (Solo, 2000)), where a stochastic PDE for the infinite-dimensional posterior state distribution is derived. However, this PDE is intractable in general, and not easily amenable to qualitative or even numerical analysis. Several previous works have derived exact or approximate finite-dimensional filters, under various simplifying assumptions. In many of these works (e.g., Rhodes & Snyder, 1977; Komaee, 2010; Yaeli & Meir, 2010; Susemihl et al., 2013; Twum-Danso & Brockett, 2001), the tuning functions are chosen so that the total firing rate (i.e., the sum of firing rates of all neurons) is independent of the state, an assumption we refer to as uniform coding (see Figure 2). In (Rhodes & Snyder, 1977), an exact finite-dimensional filter is derived for the case of linear dynamics with Gaussian noise observed through uniform coding with Gaussian tuning functions [footnote 1]. The more general setting of uniform coding with arbitrary tuning functions is considered in (Komaee, 2010), where an approximate filter is obtained.

Footnote 1: Although we describe this work using neuroscience terminology, the motivation and formulation in (Rhodes & Snyder, 1977) are not related to neuroscience.

Other works derive the posterior distribution for non-Markovian state dynamics, modeled as Gaussian processes. In (Huys et al., 2007), the posterior is derived exactly, but its computation is not recursive, requiring memory of the entire spike history. A recursive version for Gaussian processes with a Matérn kernel auto-correlation is derived in (Susemihl et al., 2011). Both these works assume uniform coding with Gaussian tuning functions.

For reasons of mathematical tractability, few previous analytically oriented works studied neural decoding without the uniform coding assumption, in spite of the experimental importance and relevance of non-uniform coding. We discuss such works in comparison to the present work in Section 6.

The problem of optimal encoding by neural populations has been studied mostly in the static case. A natural optimality criterion is the estimation Mean Square Error (MSE). Some works (e.g., Harper & McAlpine, 2004; Ganguli & Simoncelli, 2014, and many others) optimize Fisher information, which serves as a proxy to the MSE of unbiased estimators through the Cramér-Rao bound (Radhakrishna Rao, 1945) or, in the Bayesian setting, the Van Trees inequality (Gill & Levit, 1995). Fisher information of neural spiking activity is easy to compute analytically, at least in the static case (Dayan & Abbott, 2005, section 3.3), and it can be used without solving the decoding problem. This approach has been used to study non-uniform coding of static stimuli by heterogeneous populations in many works, including (Chelaru & Dragoi, 2008; Ecker et al., 2011; Ganguli & Simoncelli, 2014). However, optimizing Fisher information may yield misleading qualitative results regarding the MSE-optimal encoding (Bethge et al., 2002; Yaeli & Meir, 2010; Pilarski & Pokora, 2015). Although, under appropriate conditions, the inverse of Fisher information approaches the minimum attainable MSE in the limit of infinite decoding time, it may be a poor proxy for the MSE for finite decoding times, which are of particular importance in natural settings and in control problems. Exact computation of the estimation MSE is possible in some restricted settings: some works along those lines are discussed in Section 6.2.

A possible alternative is the computation of estimation MSE for a given filter through Monte Carlo simulations. This approach is complicated by high variability between trials, which means many trials are necessary for each value of the parameters. Consequently, optimization becomes very time consuming, possibly impractical when using numerical filters such as particle filtering, or large neural populations with many parameters.

In this work, we derive an approximate online filter for the neural decoding problem in continuous time, and demonstrate its use in investigating optimal neural encoding. We consider neural populations characterized by a distribution over neuron parameters. Continuous distributions are used to approximate large populations with few parameters, resulting in a filter whose complexity does not grow with the population size, and allowing optimization of population parameters rather than individual tuning functions. We suggest further reducing computational complexity for the encoding problem by using the estimated posterior variance as an approximation to estimation MSE, as discussed in appendix C.

Technically, given the intractable infinite-dimensional nature of the posterior distribution, we use a projection method replacing the full posterior at each point in time by a projection onto a simple family of distributions (Gaussian in our case). This approach, originally developed in the Filtering literature (Maybeck, 1979; Brigo et al., 1999), and termed Assumed Density Filtering (ADF), has been successfully used more recently in Machine Learning (Opper, 1998; Minka, 2001). We derive approximate filters for Gaussian tuning functions, and for several distributions over tuning function centers, including the case of a finite population. These filters may be combined to obtain a filter for heterogeneous mixtures of homogeneous sub-populations. We are not aware of any previous work providing an effective closed-form filter for heterogeneous populations of sensory neurons characterized by a small number of parameters.

Main contributions: (i) Derivation of closed-form recursive expressions for the continuous-time posterior mean and variance within the ADF approximation, in the context of large *non-uniform populations* characterized by a small number of parameters. (ii) Demonstrating the quality of the ADF approximation by comparison to state-of-the-art particle filtering methods. (iii) Characterization of optimal adaptation (encoding) for sensory cells in a more general setting than hitherto considered (non-uniform coding, dynamic signals). (iv) Demonstrating the interesting interplay between prior information and neuronal firing, showing how in certain situations, the absence of spikes can be highly informative (this phenomenon is absent under uniform coding).

Preliminary results discussed in this paper were presented at a conference (Harel et al., 2015). These included only the special case of Gaussian distribution of preferred stimuli. The present paper provides a more general and rigorous formulation of the mathematical framework. By separately considering different terms in the approximate filter, we find that updates at spike time depend only on the tuning function of the spiking neuron, and apply generally to any population distribution. For the updates between spikes, we provide closed-form expressions for cases that were not discussed in (Harel et al., 2015), namely uniform populations on an interval and finite heterogeneous mixtures, and non-closed-form expressions for the general case, given as integrals involving the distribution of neuron parameters. We further supplement our previously published results with numerical evaluation of the filter’s accuracy, an additional example application, and a detailed comparison with previous works.

2 Problem Overview

Consider a dynamical system with state $X_{t}\in\mathbb{R}^{n}$, observed through the firing patterns of $M$ sensory neurons, as illustrated in Figure 1. Each neuron fires stochastically and independently, with the $i$th neuron having firing rate $\lambda^{i}\left(X_{t}\right)$. More detailed assumptions about the dynamics of the state and observation processes are described in later sections. In this context, we are interested in the question of optimal encoding and decoding. By *decoding* we mean computing (exactly or approximately) the full posterior distribution of $X_{t}$ given $\mathcal{N}_{t}$, which is the history of neural spikes up to time $t$. The problem of optimal encoding is then the problem of optimal sensory cell configuration, i.e., finding the rate functions $\left\{\lambda^{i}\left(\cdot\right)\right\}_{i=1}^{M}$ that minimize some performance criterion. We assume the set $\left\{\lambda^{i}\right\}_{i=1}^{M}$ belongs to some parameterized family with parameter $\phi$.

To quantify the performance of the encoding-decoding system, we summarize the result of decoding using a single estimator $\hat{X}_{t}=\hat{X}_{t}\left(\mathcal{N}_{t}\right)$ , and define the Mean Square Error (MSE) as $\epsilon_{t}\triangleq\mathrm{trace}[(X_{t}-\hat{X}_{t})(X_{t}-\hat{X}_{t})^{T}]$ . We seek $\hat{X}_{t}$ and $\phi$ that solve $\min_{\phi}\min_{\hat{X}_{t}}\mathrm{E}\left[\epsilon_{t}\right]=\min_{\phi}\mathrm{E}[\min_{\hat{X}_{t}}\mathrm{E}[\epsilon_{t}|\mathcal{N}_{t}]]$ . The inner minimization problem in this equation is solved by the MSE-optimal decoder, which is the posterior mean $\hat{X}_{t}=\mu_{t}\triangleq\mathrm{E}\left[X_{t}|\mathcal{N}_{t}\right]$ . The posterior mean may be computed from the full posterior obtained by decoding. The outer minimization problem is solved by the optimal encoder. If decoding is exact, the problem of optimal encoding becomes that of minimizing the expected posterior variance. Note that, although we assume a fixed parameter $\phi$ which does not depend on time, the optimal value of $\phi$ for which the minimum is obtained generally depends on the time $t$ where the error is to be minimized. In principle, the encoding/decoding problem can be solved for any value of $t$ . In order to assess performance it is convenient to consider the steady-state limit $t\to\infty$ for the encoding problem.
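The claim that the posterior mean solves the inner minimization follows from a standard bias-variance style decomposition, sketched here for reference. It uses only the definitions above, together with the posterior covariance $\Sigma_{t}=\mathrm{E}[(X_{t}-\mu_{t})(X_{t}-\mu_{t})^{T}|\mathcal{N}_{t}]$, and the fact that $\hat{X}_{t}$ and $\mu_{t}$ are functions of $\mathcal{N}_{t}$.

```latex
\begin{aligned}
\mathrm{E}\left[\epsilon_{t}\,\middle|\,\mathcal{N}_{t}\right]
 &= \mathrm{E}\left[\left\|X_{t}-\hat{X}_{t}\right\|^{2}\,\middle|\,\mathcal{N}_{t}\right] \\
 &= \mathrm{E}\left[\left\|X_{t}-\mu_{t}\right\|^{2}\,\middle|\,\mathcal{N}_{t}\right]
    +2\left(\mu_{t}-\hat{X}_{t}\right)^{T}
      \mathrm{E}\left[X_{t}-\mu_{t}\,\middle|\,\mathcal{N}_{t}\right]
    +\left\|\mu_{t}-\hat{X}_{t}\right\|^{2} \\
 &= \mathrm{trace}\left(\Sigma_{t}\right)+\left\|\mu_{t}-\hat{X}_{t}\right\|^{2}.
\end{aligned}
```

The cross term vanishes because $\mathrm{E}[X_{t}-\mu_{t}|\mathcal{N}_{t}]=0$. The remaining expression is minimized by $\hat{X}_{t}=\mu_{t}$, where it equals $\mathrm{trace}(\Sigma_{t})$, so the outer problem reduces to $\min_{\phi}\mathrm{E}[\mathrm{trace}(\Sigma_{t})]$.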

Below, we approximately solve the decoding problem for any $t$ . We then explore the problem of choosing the steady-state optimal encoding parameters $\phi$ using Monte Carlo simulations in an example motivated by experimental results.

Having an efficient (closed-form) approximate filter allows performing the Monte Carlo simulation at a significantly reduced computational cost, relative to numerical methods such as particle filtering. The computational cost is further reduced by averaging the computed posterior variance across trials, rather than the squared error, thereby requiring fewer trials. The mean of the posterior variance equals the MSE (of the posterior mean), but has the advantage of being less noisy than the squared error itself – since by definition it is the mean of the square error under conditioning on $\mathcal{N}_{t}$ .
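The variance-reduction argument can be illustrated with a toy model in which the posterior is available exactly. The sketch below is not the paper's dynamic model: it assumes a static hidden rate with a Gamma prior, observed through a Poisson spike count, so that the posterior is again a Gamma distribution. All names and parameter values are illustrative assumptions.

```python
import random
import statistics

def sample_poisson(mean, rng):
    """Poisson draw: count unit-rate exponential arrivals that land in [0, mean]."""
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed <= mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count

def run_trials(num_trials=20000, shape=2.0, rate=1.0, duration=3.0, seed=0):
    """Per-trial squared errors and posterior variances of the posterior-mean decoder."""
    rng = random.Random(seed)
    squared_errors, posterior_variances = [], []
    for _ in range(num_trials):
        x = rng.gammavariate(shape, 1.0 / rate)   # hidden rate, Gamma(shape, rate) prior
        n = sample_poisson(x * duration, rng)     # spike count over `duration`
        post_shape, post_rate = shape + n, rate + duration  # conjugate Gamma posterior
        post_mean = post_shape / post_rate
        squared_errors.append((x - post_mean) ** 2)
        posterior_variances.append(post_shape / post_rate ** 2)
    return squared_errors, posterior_variances

def summarize(values):
    """Sample mean and sample standard deviation across trials."""
    return statistics.mean(values), statistics.stdev(values)

squared_errors, posterior_variances = run_trials()
mse_estimate, se_spread = summarize(squared_errors)
var_estimate, var_spread = summarize(posterior_variances)
```

Both averages estimate the same quantity (for these parameters the exact MSE is `shape / (rate * (rate + duration))`, i.e. 0.5), but the per-trial posterior variance fluctuates less than the per-trial squared error, so fewer trials are needed for a given accuracy.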

3 Decoding

3.1 A Finite Population of Gaussian Neurons

For ease of exposition, we first formulate the problem for a finite population of neurons. We address a more general setting in subsequent sections.

3.1.1 State and observation model

The observed process $X$ is a diffusion process obeying the Stochastic Differential Equation (SDE) [footnote 2]

$dX_{t}=A\left(X_{t}\right)dt+D\left(X_{t}\right)dW_{t},\quad\left(t\geq 0\right),\qquad(1)$

where $A\left(\cdot\right),D\left(\cdot\right)$ are arbitrary functions such that (1) has a unique [footnote 3] solution, and $W_{t}$ is a standard Wiener process whose increments are independent of the history of all other random processes. The integral with respect to $dW_{t}$ is to be interpreted in the Ito sense. The initial condition $X_{0}$ is assumed to have a continuous distribution with a known density.

Footnote 2: For an introduction to SDEs and the Wiener process see e.g. (Øksendal, 2003). Intuitively, equation (1) may be interpreted as a differential equation with “continuous-time Gaussian white noise” $\xi_{t}$,

$\dot{X}_{t}=A\left(X_{t}\right)+D\left(X_{t}\right)\xi_{t},$

or as the limit as $\Delta t\to 0$ of the discretized dynamics

$X_{\left(k+1\right)\Delta t}=X_{k\Delta t}+A\left(X_{k\Delta t}\right)\Delta t+D\left(X_{k\Delta t}\right)\xi_{k}\sqrt{\Delta t},$

where $\xi_{k}$ are independent standard Gaussian variables.

Footnote 3: Unique in the sense that any two solutions $X_{t}^{\left(1\right)},X_{t}^{\left(2\right)}$ defined over $\left[0,T\right]$ with $X_{0}^{\left(1\right)}=X_{0}^{\left(2\right)}$ satisfy $\mathbf{P}[X_{t}^{\left(1\right)}=X_{t}^{\left(2\right)}\;\forall t\in\left[0,T\right]]=1$. See e.g. (Øksendal, 2003, Theorem 5.2.1) for sufficient conditions.
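The discretized dynamics quoted above correspond to the Euler-Maruyama scheme, which can be sketched in a few lines for a scalar state. The drift and diffusion used in the example (a mean-reverting process with constant noise) are illustrative assumptions, not a model from the paper.

```python
import math
import random

def euler_maruyama(drift, diffusion, x0, dt, num_steps, rng):
    """Simulate X_{(k+1)dt} = X_{k dt} + A(X) dt + D(X) xi_k sqrt(dt) for scalar X."""
    path = [x0]
    sqrt_dt = math.sqrt(dt)
    for _ in range(num_steps):
        x = path[-1]
        xi = rng.gauss(0.0, 1.0)  # independent standard Gaussian variable
        path.append(x + drift(x) * dt + diffusion(x) * xi * sqrt_dt)
    return path

# Illustrative example: A(x) = -x, D(x) = 0.5.
path = euler_maruyama(lambda x: -x, lambda x: 0.5, x0=1.0, dt=0.001,
                      num_steps=5000, rng=random.Random(1))

# With the noise switched off the scheme reduces to forward Euler for dx/dt = -x.
noise_free = euler_maruyama(lambda x: -x, lambda x: 0.0, x0=1.0, dt=0.01,
                            num_steps=100, rng=random.Random(1))
```

Note the `sqrt(dt)` scaling of the noise term: the variance of each increment grows linearly with the step size, which is what makes the limit a Wiener-driven SDE rather than an ordinary differential equation.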

The observation processes are $\{N^{i}\}_{i=1}^{M}$ , where $N_{t}^{i}$ is the spike count of the $i$ th neuron up to time $t$ . Denote by $\mathcal{N}_{t}=(N_{\left[0,t\right]}^{i})_{i=1}^{M}$ the history of neural spikes up to time $t$ , and by $N_{t}=\sum_{i=1}^{M}N_{t}^{i}$ the total number of spikes up to time $t$ from all neurons. We assume that the $i$ th neuron fires with rate $\lambda^{i}\left(X_{t}\right)$ at time $t$ , independently of other neurons given the state history $X_{\left[0,t\right]}$ . More explicitly, this means

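$\mathbf{P}\left(N_{t+h}^{i}-N_{t}^{i}=k\,\Big|\,X_{\left[0,t\right]},\mathcal{N}_{t}\right)=\begin{cases}1-\lambda^{i}\left(X_{t}\right)h+o\left(h\right) & k=0\\ \lambda^{i}\left(X_{t}\right)h+o\left(h\right) & k=1\\ o\left(h\right) & k>1\end{cases}\qquad\left(i=1,\dots,M,\;h\to 0^{+}\right),\qquad(2)$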

where $X_{\left[0,t\right]}$ denotes the history up to time $t$ of $X$, and $o\left(h\right)$ is little-o asymptotic notation, denoting any function satisfying $o\left(h\right)/h\to 0$ as $h\to 0^{+}$. Thus, each $N^{i}$ is a Doubly-Stochastic Poisson Process (DSPP, see e.g., Snyder & Miller, 1991) [footnote 4] with rate process $\lambda^{i}\left(X_{t}\right)$.

Footnote 4: If (1) includes an additional feedback (control) term, $N^{i}$ are not DSPPs. Our results apply with little modification to this case, as described in appendix A.

To achieve mathematical tractability, we assume that the tuning functions $\lambda^{i}$ are Gaussian: the firing rate of the $i$ th neuron in response to state $x$ is given by

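$\lambda^{i}\left(x\right)=h_{i}\exp\left(-\frac{1}{2}\left\|H_{i}x-\theta_{i}\right\|_{R_{i}}^{2}\right),\qquad(3)$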

where $\theta_{i}\in\mathbb{R}^{m}$ is the neuron’s preferred location, $h_{i}\in\mathbb{R}_{+}$ is the neuron’s maximal expected firing rate, $H_{i}\in\mathbb{R}^{m\times n}$ and $R_{i}\in\mathbb{R}^{m\times m}$, $m\leq n$, are fixed matrices, each $R_{i}$ is positive-definite, and the notation $\left\|y\right\|_{M}^{2}$ denotes $y^{T}My$. The inclusion of the matrix $H_{i}$ allows using high-dimensional models where only some dimensions are observed, for example when the full state includes velocities but only locations are directly observable. In typical applications, $H_{i}$ would be the same across all neurons, or at least across all neurons of the same sensory modality.
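As a concrete illustration of the observation model, the sketch below evaluates a Gaussian tuning function for a two-dimensional state (position and velocity) of which only the position is observed, and generates spikes by drawing one Bernoulli variable per small time bin with probability rate times bin width. The binning is an approximation that is reasonable only when that product is small. The trajectory and all parameter values are illustrative assumptions.

```python
import math
import random

def gaussian_rate(x, theta, h, H, R):
    """Firing rate h * exp(-0.5 * (Hx - theta)^T R (Hx - theta))."""
    m, n = len(H), len(x)
    y = [sum(H[r][c] * x[c] for c in range(n)) - theta[r] for r in range(m)]
    quad = sum(y[r] * R[r][c] * y[c] for r in range(m) for c in range(m))
    return h * math.exp(-0.5 * quad)

def simulate_spikes(state_path, dt, theta, h, H, R, rng):
    """Bernoulli approximation of the point process: P(spike in a bin) ~= rate * dt."""
    spikes = []
    for x in state_path:
        rate = gaussian_rate(x, theta, h, H, R)
        spikes.append(1 if rng.random() < rate * dt else 0)
    return spikes

H = [[1.0, 0.0]]   # observe position only (m = 1, n = 2)
R = [[4.0]]        # tuning precision
theta = [0.5]      # preferred position
h = 10.0           # maximal firing rate

# Hypothetical trajectory: position ramps from 0 to 1 at unit velocity.
path = [[0.001 * k, 1.0] for k in range(1000)]
spikes = simulate_spikes(path, 0.001, theta, h, H, R, random.Random(0))
```

Because `H` selects the position coordinate, the rate is unaffected by the velocity component: the neuron fires at its maximal rate `h` whenever the position equals `theta`, whatever the velocity.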

In the sequel, we use the following standard notation,

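$\int_{a}^{b}h\left(t\right)dN_{t}^{i}\triangleq\sum_{j}\mathbf{1}\left\{t_{j}^{i}\in\left[a,b\right]\right\}h\left(t_{j}^{i}\right),\qquad(4)$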

for any function $h$ , where $t_{j}^{i}$ is the time of the $j$ th point of the process $N^{i}$ . This is the usual Lebesgue integral of $h$ with respect to $N^{i}$ viewed as a discrete measure.
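In computational terms, integrating a function against a spike-count process amounts to summing the function over the spike times that fall inside the integration interval. A minimal sketch, with hypothetical spike times:

```python
def integrate_against_spikes(func, spike_times, a, b):
    """Sum func(t) over the spike times t that lie in the closed interval [a, b]."""
    return sum(func(t) for t in spike_times if a <= t <= b)

spike_times = [0.5, 1.0, 2.0, 3.5]  # hypothetical spike times of one neuron
weighted = integrate_against_spikes(lambda t: t ** 2, spike_times, 1.0, 3.0)
count = integrate_against_spikes(lambda t: 1, spike_times, 0.0, 4.0)
```

With the constant function 1 the integral is simply the number of spikes in the interval.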

3.1.2 Model limitations

The model outlined above involves several simplifications to achieve tractability. Namely, tuning functions are assumed to be Gaussian, and firing rates are assumed to be independent of state history and spike history given the current state (2), yielding a DSPP model (see footnote 4 above).

Gaussian tuning functions are a reasonable model for some neural systems, but are inadequate for others, e.g., where the tuning is sigmoidal or where there is a baseline firing rate regardless of stimulus value. For simplicity, we focus on the Gaussian case in this work. It is straightforward to extend the derivation presented here to piecewise linear tuning, which may be used to represent sigmoidal tuning functions, but the resulting expressions are more cumbersome. We have also developed closed-form results for tuning functions given by sums of Gaussians; however, these require further approximations in order to obtain analytic results, and are not discussed in this work.

The assumption of history-independent rates may also limit the model’s applicability. Real sensory neurons exhibit firing-history dependence in the form of refractory periods and rate adaptation (Dayan & Abbott, 2005), state-history dependence such as input integration (Dayan & Abbott, 2005), or correlations between the firing of different neurons conditioned on the state (Pillow et al., 2008). These phenomena are captured by some encoding models, such as simple integrate-and-fire models as well as more complex physiological models like the Hodgkin-Huxley model. However, the simplifying independence assumptions above are common to all works presenting closed-form continuous-time filters for point process observations that we are aware of.

Note that characterization of the point processes in terms of their history-conditioned firing rate, as opposed to finite-dimensional distributions, does not in itself restrict the model’s generality in any substantial way (see Segall & Kailath, 1975, Theorem 1). Rather, the independence assumptions are expressed rigorously by the fact that the right-hand side of (2) depends neither on previous values of $X$, nor on previous spike times of any neuron. Some of our analysis applies without modification when rates are allowed to depend on spiking history (specifically, equations (6) below), so it may be possible to extend these techniques to some history-dependent models. However, when rates may depend on the state history, exact filtering may involve the posterior distribution of the entire state history rather than the current state, so that a different approach is probably required.

3.1.3 Exact filtering equations

Let $p_{t}^{\,\mathcal{N}}\left(\cdot\right)$ be the posterior density of $X_{t}$ given the firing history $\mathcal{N}_{t}$, and $\mathbf{E}_{t}^{\mathcal{N}}\left[\cdot\right]$ the posterior expectation given $\mathcal{N}_{t}$. The prior density $p_{0}^{\,\mathcal{N}}$ is assumed to be known. We denote by $\hat{\lambda}_{t}^{i}$ the rate of $N_{t}^{i}$ with respect to the history of spiking only, i.e., the rates that would appear in the right-hand side of (2) if the conditioning on the left were only on $\mathcal{N}_{t}$. These rates are given by (see Segall & Kailath, 1975, Theorem 2)

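$\hat{\lambda}_{t}^{i}=\mathbf{E}_{t}^{\mathcal{N}}\left[\lambda^{i}\left(X_{t}\right)\right]=\int p_{t}^{\,\mathcal{N}}\left(x\right)\lambda^{i}\left(x\right)dx.$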
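The posterior-averaged rate is an integral of the tuning function against the posterior density. As an illustration only, the sketch below assumes a scalar state, a Gaussian tuning function with scalar precision `R`, and a Gaussian posterior density with mean `mu` and variance `var` (the Gaussian form is an assumption of this example). Under these assumptions the integral follows in closed form from multiplication of Gaussians, and the code compares it with numerical quadrature.

```python
import math

def expected_rate_closed_form(mu, var, theta, h, R):
    """Integral of N(x; mu, var) * h * exp(-0.5 * R * (x - theta)**2) over x."""
    scale = 1.0 + R * var
    return h / math.sqrt(scale) * math.exp(-0.5 * R * (mu - theta) ** 2 / scale)

def expected_rate_numeric(mu, var, theta, h, R, steps=40000, span=10.0):
    """Midpoint-rule quadrature over mu +/- span standard deviations."""
    sd = math.sqrt(var)
    lower = mu - span * sd
    width = 2.0 * span * sd / steps
    total = 0.0
    for k in range(steps):
        x = lower + (k + 0.5) * width
        density = math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)
        total += density * h * math.exp(-0.5 * R * (x - theta) ** 2)
    return total * width

# Illustrative values (assumptions for this sketch only).
closed = expected_rate_closed_form(mu=0.3, var=0.5, theta=1.0, h=20.0, R=2.0)
numeric = expected_rate_numeric(mu=0.3, var=0.5, theta=1.0, h=20.0, R=2.0)
```

The closed form shows two effects of posterior uncertainty: the effective tuning width broadens (the precision `R` is replaced by `R / (1 + R * var)`), and the peak expected rate drops by the factor `1 / sqrt(1 + R * var)`.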

The problem of filtering a diffusion process $X$ from a doubly stochastic Poisson process driven by $X$ is formally solved in (Snyder, \APACyear1972), where the authors derive a stochastic PDE for the posterior density666the setting of (Snyder, \APACyear1972) includes a single observation point process. The extension to several point processes is obtained through summation as in (5), and is a special case of a more general PDE described in (Rhodes \BBA Snyder, \APACyear1977) and discussed in Appendix A.,

[TABLE]

where $\mathcal{L}$ is the state’s infinitesimal generator (Kolmogorov’s backward operator), defined as $\mathcal{L}h\left(x\right)=\lim_{\Delta t\to 0^{+}}\left(\mathrm{E}\left[h\left(X_{t+\Delta t}\right)|X_{t}=x\right]-h\left(x\right)\right)/\Delta t$ , and $\mathcal{L}^{*}$ is $\mathcal{L}$ ’s adjoint operator (Kolmogorov’s forward operator). The notation $dN_{t}^{i}$ is interpreted as in (4), so this term contributes a jump of size $p_{t}^{\,\mathcal{N}}\left(x\right)\left(\lambda^{i}\left(x\right)/\hat{\lambda}_{t}^{i}-1\right)$ at a spike of the $i$ th neuron. Equation (5) may be written in a notation more familiar for non-stochastic PDEs using Dirac delta functions,

[TABLE]

where $\dot{N}_{t}^{i}\triangleq\sum_{j}\delta\left(t-t_{j}^{i}\right)$ is the spike train of the $i$ th neuron, which is the formal derivative of the process $N^{i}$ . An accessible, albeit non-rigorous, derivation of (5) via time discretization is found in (Susemihl, 2014, section 2.3).
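The time-discretized reading of (5) can be made concrete on a grid. The following is a minimal Python sketch, not the authors' code. It assumes a static scalar state (so the $\mathcal{L}^{*}$ term is dropped), a single neuron, and a scalar Gaussian tuning curve $h\exp(-(x-\theta)^{2}/(2\alpha^{2}))$; all function names and parameter values are hypothetical.

```python
import math

def tuning(x, h=10.0, theta=1.0, alpha=0.5):
    """Assumed scalar Gaussian tuning curve: h * exp(-(x - theta)^2 / (2 alpha^2))."""
    return h * math.exp(-0.5 * ((x - theta) / alpha) ** 2)

def normalize(p, dx):
    total = sum(p) * dx
    return [v / total for v in p]

def rate_hat(p, lam, dx):
    """Posterior expected rate: Riemann sum of lambda(x) * p(x) dx."""
    return sum(pi * li for pi, li in zip(p, lam)) * dx

def no_spike_step(p, lam, dx, dt):
    """Between spikes: p <- p - p * (lambda - rate_hat) * dt (static state)."""
    lh = rate_hat(p, lam, dx)
    return [pi * (1.0 - (li - lh) * dt) for pi, li in zip(p, lam)]

def spike_step(p, lam, dx):
    """At a spike: p jumps by p * (lambda / rate_hat - 1), so p <- p * lambda / rate_hat."""
    lh = rate_hat(p, lam, dx)
    return [pi * li / lh for pi, li in zip(p, lam)]

def mean(p, grid, dx):
    return sum(pi * x for pi, x in zip(p, grid)) * dx

dx = 0.01
grid = [-5.0 + i * dx for i in range(1001)]
lam = [tuning(x) for x in grid]
prior = normalize([math.exp(-0.5 * x * x) for x in grid], dx)  # standard normal prior

quiet = prior
for _ in range(100):                 # 0.1 time units without any spike
    quiet = no_spike_step(quiet, lam, dx, dt=1e-3)
spiked = spike_step(prior, lam, dx)  # a single spike at time 0
```

In this sketch the absence of spikes moves the posterior mean away from $\theta$, while a spike moves it towards $\theta$; both steps preserve normalization.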

The stochastic PDE (5) is non-linear and non-local (due to the dependence of $\hat{\lambda}_{t}^{i}$ on $p_{t}^{\,\mathcal{N}}$ ), and therefore usually intractable. In (Rhodes & Snyder, 1977; Susemihl et al., 2014) the authors consider linear dynamics with a Gaussian prior and Gaussian sensors with centers distributed uniformly over the state space. In this case, the posterior is Gaussian, and (5) leads to closed-form ODEs for its mean and variance. In our more general setting, we can obtain exact equations for the posterior mean and variance, as follows.

Let $\mu_{t}\triangleq\mathbf{E}_{t}^{\mathcal{N}}X_{t},\tilde{X}_{t}\triangleq X_{t}-\mu_{t},\Sigma_{t}\triangleq\mathbf{E}_{t}^{\mathcal{N}}[\tilde{X}_{t}\tilde{X}_{t}^{T}]$ . Using (5), along with known results about the form of the infinitesimal generator $\mathcal{L}_{t}$ for diffusion processes (e.g. (Øksendal, 2003), Theorem 7.3.3), the first two posterior moments can be shown to obey the following exact equations (see Appendix A):

[TABLE]

where

[TABLE]

and the expressions involving $t^{-}$ denote left limits, which are necessary since the solutions to (6) are discontinuous at spike times.

In contrast with the more familiar case of linear dynamics with Gaussian white noise, and the corresponding Kalman-Bucy filter (Maybeck, 1979), here the posterior variance is random, and is generally not monotonically decreasing even when estimating a constant state. However, noting that $\mathbf{E}[dN_{t}^{i}-\hat{\lambda}_{t}^{i}dt]=0$ , we may observe from (6b) that for a constant state ( $A=D=0$ ), the expected posterior variance $\mathbf{E}\left[\Sigma_{t}\right]$ is decreasing, since the first two terms in (6b) vanish.

We will find it useful to rewrite (6) in a different form, as follows,

[TABLE]

where $d\mu_{t}^{\pi},d\Sigma_{t}^{\pi}$ are the prior terms, corresponding to $\mathcal{L}^{*}p_{t}^{\,\mathcal{N}}\left(x\right)$ in (5), and the remaining terms are divided into continuous update terms $d\mu_{t}^{\mathrm{c}},d\Sigma_{t}^{\mathrm{c}}$ (multiplying $dt$ ) and discontinuous update terms $d\mu_{t}^{N},d\Sigma_{t}^{N}$ (multiplying $dN_{t}^{i}$ ). Using (6), we find the exact equations

[TABLE]

The prior terms $d\mu_{t}^{\pi},d\Sigma_{t}^{\pi}$ represent the known dynamics of $X$ , and are the same terms appearing in the Kalman-Bucy filter. These would be the only terms left if no measurements were available, and would vanish for a static state. The continuous update terms $d\mu_{t}^{\mathrm{c}},d\Sigma_{t}^{\mathrm{c}}$ represent updates to the posterior between spikes that are not derived from $X$ ’s dynamics, and therefore may be interpreted as corresponding to information obtained from the absence of spikes. The discontinuous update terms $d\mu_{t}^{N},d\Sigma_{t}^{N}$ contribute a change to the posterior at spike times, depending on the spike’s origin $i$ , and thus represent information obtained from the presence of a spike as well as the parameters of the spiking neuron.

Note that the Gaussian tuning assumption (3) has not been used in this section, and equations (7) are valid for any form of the tuning functions $\lambda^{i}$ .

3.1.4 ADF approximation

While equations (7) are exact, they are not practical, since they require computation of posterior expectations $\mathbf{E}_{t}^{\mathcal{N}}\left[\cdot\right]$ . To bring them to a closed form, we use ADF with an assumed Gaussian density (see (Opper, 1998) for details). Informally, this may be envisioned as integrating (7) while replacing the distribution $p_{t}^{\,\mathcal{N}}$ by its approximating Gaussian “at each time step”. The approximating Gaussian is obtained by matching the first two moments of $p_{t}^{\,\mathcal{N}}$ (Opper, 1998). Note that the solution of the resulting equations does not in general match the first two moments of the exact solution, though it may approximate it. Practically, the ADF approximation amounts to substituting the normal distribution $\mathcal{N}(\mu_{t},\Sigma_{t})$ for $p_{t}^{\,\mathcal{N}}$ to compute the expectations in (7). This heuristic may be justified by its relation to a projection method, where the right-hand side of the density PDE is projected onto the tangent space of the approximating family of densities: the two approaches are equivalent when the approximating family is exponential (Brigo et al., 1999).

If the dynamics are linear, the prior updates (7a)-(7b) are easily computed in closed form after this substitution. Specifically, for $dX_{t}=AX_{t}dt+DdW_{t}$ , the prior updates read

[TABLE]

as in the Kalman-Bucy filter (Maybeck, 1979). Otherwise, they may be approximated by expanding the non-linear functions $A(x)$ and $D\left(x\right)D\left(x\right)^{\intercal}$ as power series and applying the assumed Gaussian density, resulting in tractable Gaussian integrals. The use of ADF in the prior terms is outside the scope of this work; see e.g. (Maybeck, 1979, Chapter 12).
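For intuition, the prior updates for linear dynamics can be integrated numerically in the scalar case. The sketch below assumes the standard Kalman-Bucy prediction equations $d\mu_{t}=a\mu_{t}\,dt$ and $d\sigma_{t}^{2}=(2a\sigma_{t}^{2}+d^{2})\,dt$ for $dX_{t}=aX_{t}dt+d\,dW_{t}$; the function name and parameter values are hypothetical.

```python
def prior_step(mu, var, a, d, dt):
    """One Euler step of the scalar prediction equations
    d mu = a * mu dt,  d var = (2 a var + d^2) dt."""
    return mu + a * mu * dt, var + (2.0 * a * var + d * d) * dt

a, d, dt = -1.0, 1.0, 1e-3       # hypothetical stable dynamics
mu, var = 2.0, 0.1               # arbitrary initial moments
for _ in range(20000):           # integrate up to t = 20
    mu, var = prior_step(mu, var, a, d, dt)

# With no observations the moments relax to the stationary law N(0, d^2 / (2|a|)).
stationary_var = d * d / (2.0 * abs(a))
```

With no spikes and no continuous update terms, these are the only terms acting on the posterior, so the moments simply relax to those of the stationary distribution.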

We therefore turn to the approximation of the non-prior updates (7c)-(7f) in the case of Gaussian tuning (3).

Abusing notation, from here on we use $\mu_{t},\Sigma_{t}$ , and $p_{t}^{\,\mathcal{N}}\left(x\right)$ to refer to the ADF approximation rather than to the exact values. Applying the Gaussian ADF approximation $p_{t}^{\,\mathcal{N}}\left(x\right)\approx\mathcal{N}\left(x;\mu_{t},\Sigma_{t}\right)$ in the case of Gaussian tuning functions (3) yields the non-prior terms

[TABLE]

These equations are a special case of (16), which are derived in appendix A. The updates for the posterior precision $\Sigma_{t}^{-1}$ have a simpler form, also derived in appendix A:

[TABLE]

In the scalar case $m=n=1$ , with $H=1$ , $\sigma_{t}^{2}=\Sigma_{t},\alpha_{i}^{2}=R_{i}^{-1}$ , the update equations (9), (10) read

[TABLE]

Figure 3 illustrates the filter (11) in a one-dimensional example.
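A scalar ADF filter of this kind can be sketched in a few lines of Python. The update formulas in the code are obtained here by Gaussian moment matching for the tuning curve $h_{i}\exp(-(x-\theta_{i})^{2}/(2\alpha_{i}^{2}))$ and are intended to mirror (11), which is not reproduced in this text; the function names and the two-neuron parameters are hypothetical and are not those of Figure 3.

```python
import math

def rate_hat(mu, var, h, theta, alpha2):
    """Expected rate of one Gaussian-tuned neuron under the posterior N(mu, var)."""
    s2 = var + alpha2
    return h * math.sqrt(alpha2 / s2) * math.exp(-0.5 * (mu - theta) ** 2 / s2)

def continuous_step(mu, var, neurons, dt):
    """Euler step of the between-spike updates, summed over all neurons."""
    dmu = dvar = 0.0
    for h, theta, alpha2 in neurons:
        s2 = var + alpha2
        lam = rate_hat(mu, var, h, theta, alpha2)
        dmu += lam * var / s2 * (mu - theta)                  # pushes mu away from theta
        dvar += lam * var ** 2 / s2 * (1.0 - (mu - theta) ** 2 / s2)
    return mu + dmu * dt, var + dvar * dt

def spike_update(mu, var, theta, alpha2):
    """Jump at a spike of a neuron with center theta and squared width alpha2."""
    s2 = var + alpha2
    return mu + var / s2 * (theta - mu), var - var ** 2 / s2

neurons = [(10.0, -1.0, 0.5), (5.0, 1.0, 0.5)]   # (h, theta, alpha^2), hypothetical
mu, var = 0.0, 1.0
mu_c, var_c = continuous_step(mu, var, neurons, dt=1e-3)
mu_s, var_s = spike_update(mu, var, theta=1.0, alpha2=0.5)
```

In this example the no-spike step moves the mean towards the neuron with the lower maximal rate, and the spike update raises the precision $1/\sigma_{t}^{2}$ by exactly $1/\alpha^{2}=2$.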

3.1.5 Interpretation

To gain some insight into the filtering equations, we consider the discontinuous updates (9c)-(9d) and continuous updates (9a)-(9b) in some special cases, in reference to the example presented in Figure 3.

Discontinuous updates

Consider the case $H_{i}=I$ . As seen from the discontinuous update equations (9c)-(9d), when the $i$ th neuron spikes, the posterior mean moves towards its preferred location $\theta_{i}$ , and the posterior variance decreases (in the sense that $\Sigma_{t^{+}}-\Sigma_{t^{-}}$ is negative definite), as seen in Figure 3. Neither update depends on $h_{i}$ .

For general $H_{i}\in\mathbb{R}^{m\times n}$ of full row rank, let $H_{i}^{\mathrm{r}}$ be any right inverse of $H_{i}$ and $\bar{\theta}_{i}=H_{i}^{\mathrm{r}}\theta_{i}$ . Note that $H_{i}$ projects the state $X_{t}$ to “perceptual coordinates” employed by the $i$ th neuron; thus $\bar{\theta}_{i}$ may be interpreted as the tuning function center in state coordinates, whereas $\theta_{i}$ is in the neuron’s perceptual coordinates. We may rewrite (3) in state coordinates as

[TABLE]

Now, the updates for a spike of neuron $i$ at time $t$ can be written more intuitively (see appendix A) as

[TABLE]

Thus the new posterior mean is a weighted average of the pre-spike posterior mean and the preferred stimulus in state coordinates. The posterior precision $\Sigma_{t}^{-1}$ increases by $H_{i}^{\intercal}R_{i}H_{i}$ which is the tuning function precision matrix in state coordinates. This may be observed in Figure 3, where the posterior precision increases at each spike time by the fixed amount $R_{i}=\alpha_{i}^{-2}=2$ .
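The precision increase can be checked numerically. The sketch below assumes a two-dimensional state observed through $H=[1,0]$ (first coordinate only) with a scalar tuning precision $R$, and uses the gain form $\Sigma H^{\intercal}S$ with $S=(R^{-1}+H\Sigma H^{\intercal})^{-1}$ for the spike update; the helper names and numerical values are hypothetical.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def spike_update_2d(mu, sigma, theta, r):
    """Spike update for a 2-d state observed through H = [1, 0] by a neuron
    with center theta and scalar tuning precision r."""
    s = 1.0 / (1.0 / r + sigma[0][0])             # S = (R^{-1} + H Sigma H^T)^{-1}
    gain = [sigma[0][0] * s, sigma[1][0] * s]     # Sigma H^T S
    innov = theta - mu[0]                         # theta - H mu
    mu_new = [mu[0] + gain[0] * innov, mu[1] + gain[1] * innov]
    sigma_new = [[sigma[i][j] - gain[i] * sigma[0][j] for j in range(2)]
                 for i in range(2)]               # Sigma - Sigma H^T S H Sigma
    return mu_new, sigma_new

mu = [0.0, 0.0]
sigma = [[1.0, 0.3], [0.3, 2.0]]
mu_new, sigma_new = spike_update_2d(mu, sigma, theta=1.0, r=2.0)
prec_old, prec_new = inv2(sigma), inv2(sigma_new)
```

Here `prec_new` differs from `prec_old` only in its top-left entry, which grows by $R=2$, i.e. by $H^{\intercal}RH$; the unobserved coordinate is still corrected through the off-diagonal covariance.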

Continuous updates

The continuous mean update equation (9a), contributing between spiking events, also admits an intuitive interpretation, in the case where all neurons share the same shape matrices $H_{i}=H,R_{i}=R$ . In this case, the equation reads

[TABLE]

where $\nu_{t}^{i}\triangleq\hat{\lambda}_{t}^{i}/\sum_{j}\hat{\lambda}_{t}^{j}$ . The normalized rates $\nu_{t}^{i}$ may be interpreted heuristically as the distribution of the next firing neuron’s index $i$ , provided the next spike occurs immediately (Brémaud, 1981, Section 1, Theorem T15). Thus, the absence of spikes drives the posterior mean away from the expected preferred stimulus of the next spiking neuron. The strength of this effect scales with $\sum_{i}\hat{\lambda}_{t}^{i}$ , which is the total expected rate of spikes given the firing history. This behavior is qualitatively similar to the result obtained in (Bobrowski et al., 2009) for a finite population of neurons observing a continuous-time finite-state Markov process, where the posterior probability between spikes concentrates on states with lower total firing rate.

This behavior may be observed in Figure 3: when the posterior mean $\mu_{t}$ is near a neuron’s preferred stimulus, it moves away from it between spikes as the next spike is expected from that neuron. Similarly, despite the symmetry of the two neurons’ preferred stimuli relative to the starting estimate $\mu_{0}$ , the posterior mean shifts at the start of the trial towards the preferred stimulus of the second neuron, due to its lower firing rate.

The continuous variance update (9b) consists of the difference of two positive semidefinite terms, and accordingly the posterior variance may increase or decrease between spikes along various directions. In Figure 3, the posterior variance decreases before the first spike, and increases between spikes afterwards.

3.2 Continuous population approximation

3.2.1 Motivation

The filtering equations (9a)-(9f) implement sensory decoding for non-uniform populations. However, their applicability to studying neural encoding in large heterogeneous populations is limited for two closely related reasons. First, the computational cost of the filter is linear in the number of neurons. Second, the size of the parameter space describing the population is also linear in the number of neurons, making optimization of large populations computationally costly. These traits are shared with other filters designed for heterogeneous populations, namely (Eden & Brown, 2008; Bobrowski et al., 2009; Twum-Danso & Brockett, 2001).

To reduce the parameter space and simplify the filter, we approximate large neural populations by an infinite continuous population, characterized by a distribution over neuron parameters. These distributions are described by few parameters, resulting in a filter whose complexity does not grow with the population size, and allowing optimization of population parameters rather than individual tuning functions.

For example, consider a homogeneous population of neurons with preferred stimuli equally spaced on an interval, as depicted in Figure 4(a). If the population is large, its firing pattern statistics may be modeled as an infinite population of neurons, with preferred stimuli uniformly distributed on the same interval, as in Figure 4(c). In this continuous population model, each spike is characterized by the preferred stimulus of the firing neuron – which is a continuous variable – rather than by the neuron’s index. Such a population is parameterized by only two variables representing the endpoints of the interval, in addition to the tuning function height and width parameters. There is no need for a parameter representing the density of neurons on the interval, as scaling the density of neurons is equivalent to identical scaling of each neuron’s maximum firing rate.

3.2.2 Marked point processes as continuous population models

We now change the mathematical formulation and notation of our model to accommodate parameterized continuous populations. The new formulation is more general, and includes finite populations as a special case. Rather than using a sequence of tuning function parameters — such as $(\boldsymbol{y}^{\left(i\right)})_{i=1}^{M}=(h_{i},\theta_{i},H_{i},R_{i})_{i=1}^{M}$ in the case of Gaussian neurons (3) — we characterize the population by a measure $f\left(d\boldsymbol{y}\right)$ , where $f\left(Y\right)$ counts the neurons with parameters in a set $Y$ , up to some multiplicative constant. A continuous measure $f$ may be used to approximate a large population. Accordingly, we write $\lambda\left(x;\boldsymbol{y}\right)$ for the tuning function of a neuron with parameters $\boldsymbol{y}$ , in lieu of the previous notation $\lambda^{i}\left(x\right)$ . For example, the Gaussian case (3) takes the form

[TABLE]

with $\boldsymbol{y}=\left(h,\theta,H,R\right)$ the parameters of the Gaussian tuning function.

Similarly, instead of the observation processes $\{N^{i}\}_{i=1}^{M}$ counting the spikes of each neuron, we describe the spikes of all neurons using a single marked point process $N$ (Snyder & Miller, 1991), which is a random sequence of pairs $\left(t_{k},\boldsymbol{y}_{k}\right)$ , where $t_{k}\in[0,\infty)$ is the time of the $k$ th point and $\boldsymbol{y}_{k}\in\mathbf{Y}$ its *mark*. In our case, $t_{k}$ is the $k$ th spike time, and $\boldsymbol{y}_{k}\in\mathbf{Y}$ are the parameters of the spiking neuron. Alternatively, $N$ may be described as a random discrete measure, where $N\left(\left[s,t\right]\times Y\right)$ is the number of spikes in the time interval $\left[s,t\right]$ from neurons with parameters in the set $Y$ . In line with the discrete measure view, we write, for an arbitrary function $h$ ,

[TABLE]

which is the ordinary Lebesgue integral of $h$ with respect to the discrete measure $N$ . We use the notation $N_{t}\left(Y\right)\triangleq N\left([0,t]\times Y\right)$ , and when $Y=\mathbf{Y}$ we omit it and write $N_{t}$ for the total number of spikes up to time $t$ . As before, $\mathcal{N}_{t}$ denotes the history up to time $t$ – here including both spike times and marks.

Figure 4 illustrates how the activity of a neural population may be represented as a marked point process, and how the firing statistics are approximated by a continuous population. In this case, a homogeneous population of neurons with equally-spaced preferred stimuli is approximated by a continuous population with uniformly distributed preferred stimuli.

To characterize the statistics of $N$ , we first consider the finite population case. In this case, $f=\sum_{i}\delta_{\boldsymbol{y}_{i}}$ where $\delta_{\boldsymbol{y}_{i}}$ is the point mass at $\boldsymbol{y}_{i}$ , and the rate of points with marks in a set $Y\subseteq\mathbf{Y}$ at time $t$ (conditioned on $\mathcal{N}_{t},X_{\left[0,t\right]}$ ) is

[TABLE]

In the case of a general population distribution $f$ , we similarly take the integral $\int_{Y}\lambda\left(X_{t};\boldsymbol{y}\right)f\left(d\boldsymbol{y}\right)$ as the rate (or intensity) of points with marks in $Y$ at time $t$ conditioned on $(\mathcal{N}_{t},X_{\left[0,t\right]})$ . The random measure $\lambda\left(X_{t};\boldsymbol{y}\right)f\left(d\boldsymbol{y}\right)$ appearing in this integral is termed the intensity kernel of the marked point process $N$ with respect to the history $(\mathcal{N}_{t},X_{\left[0,t\right]})$ (e.g., (Brémaud, 1981), Chapter VIII). The dynamics of $N$ may be described heuristically by means of the intensity kernel as

[TABLE]

The intensity kernel with respect to $\mathcal{N}_{t}$ alone is given by

[TABLE]

where

[TABLE]

We denote the rate of the unmarked process $N_{t}$ with respect to $\mathcal{N}_{t}$ (i.e., the total posterior expected firing rate) by

[TABLE]
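To make the marked point process view concrete, the following Python sketch samples spikes from a continuous population. It assumes the simplest case: a scalar state held fixed at $x$, $H=1$, tuning curve $h\exp(-(x-\theta)^{2}/(2\alpha^{2}))$, and preferred stimuli uniform on the real line ( $f(d\theta)=d\theta$ ). Under these assumptions the intensity kernel integrates to the constant total rate $h\sqrt{2\pi}\alpha$, and the mark of each spike is distributed as $\mathcal{N}(x,\alpha^{2})$. The function name and parameter values are hypothetical.

```python
import math
import random

def simulate_uniform_population(x, h, alpha, t_end, rng):
    """Sample spikes (t_k, theta_k) from a uniform continuous population on the
    real line, for a state held fixed at x (an assumption made for brevity)."""
    total_rate = h * math.sqrt(2.0 * math.pi) * alpha   # kernel integrated over theta
    spikes, t = [], 0.0
    while True:
        t += rng.expovariate(total_rate)                # exponential inter-spike interval
        if t > t_end:
            return spikes
        spikes.append((t, rng.gauss(x, alpha)))         # mark density is N(x, alpha^2)

rng = random.Random(0)
spikes = simulate_uniform_population(x=0.7, h=20.0, alpha=0.5, t_end=100.0, rng=rng)
mean_mark = sum(theta for _, theta in spikes) / len(spikes)
```

Each spike is described by its time and the preferred stimulus of the firing neuron, with no neuron index involved; the marks scatter around the true state with the tuning width.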

3.2.3 Filtering

Assume Gaussian tuning functions (13). Writing $\boldsymbol{y}\triangleq\left(h,\theta,H,R\right)$ , the filtering equations (9a)-(9e) take the form

[TABLE]

and the posterior precision updates (10a)-(10b) become

[TABLE]

The derivation of these equations, as well as filtering equations for special cases considered in this section, are found in appendix A.

Using (16g), the continuous update equations (16a)-(16b) may be evaluated in closed form for some specific forms of the population distribution $f$ . Note that the discontinuous update equations (16c)-(16d) do not depend on $f$ , and are already in closed form. We now consider several population distributions where the continuous updates may be brought to closed form.

Single neuron

The result for a single neuron with parameters $h,\theta,H,R$ is trivial to obtain from (16a)-(16b), yielding

[TABLE]

where $\hat{\lambda}_{t}^{f}=\hat{\lambda}_{t}\left(\boldsymbol{y}\right)$ as given by (16g), and $S_{t}^{H,R}$ is defined in (16f).

Uniform population

Here all neurons share the same height $h$ and shape matrices $H,R$ , whereas the location parameter $\theta$ covers $\mathbb{R}^{m}$ uniformly, i.e. $f\left(dh^{\prime},d\theta,dH^{\prime},dR^{\prime}\right)=\delta_{h}\left(dh^{\prime}\right)\delta_{H}\left(dH^{\prime}\right)\delta_{R}\left(dR^{\prime}\right)d\theta$ , where $\delta_{x}$ is a Dirac measure at $x$ , i.e. $\delta_{x}\left(A\right)=1\{x\in A\}$ , and $d\theta$ indicates the Lebesgue measure in the parameter $\theta$ . A straightforward calculation from (16a)-(16b) and (16g) yields

[TABLE]

in agreement with the (exact) result obtained in the uniform coding setting of (Rhodes & Snyder, 1977), where the filtering equations only include the prior term and the discontinuous update term.

Gaussian population

As in the uniform population case, we assume all neurons share the same height $h$ and shape matrices $H,R$ , and differ only in the location parameter $\theta$ . Abusing notation slightly, we write $f\left(dh^{\prime},d\theta,dH^{\prime},dR^{\prime}\right)=\delta_{h}\left(dh^{\prime}\right)\delta_{H}\left(dH^{\prime}\right)\delta_{R}\left(dR^{\prime}\right)f\left(d\theta\right)$ where the preferred stimuli are normally distributed,

[TABLE]

for fixed $c\in\mathbb{R}^{m}$ , and positive definite $\Sigma_{\mathrm{pop}}$ .

We take $f$ to be normalized, since any scaling of $f$ may be included in the coefficient $h$ in (13), resulting in the same point process. Thus, when used to approximate a large population, the coefficient $h$ would be proportional to the number of neurons.

The continuous updates for this case read

[TABLE]

These updates generalize the single-neuron updates (17), with the population center $c$ taking the place of the location parameter $\theta$ , and $Z_{t}^{H,R}$ replacing $S_{t}^{H,R}$ . The single-neuron case is obtained when $\Sigma_{\mathrm{pop}}=0$ .

It is illustrative to consider these equations in the scalar case $m=n=1$ , with $H=1$ . Letting $\sigma_{t}^{2}=\Sigma_{t},\alpha^{2}=R^{-1},\sigma_{\mathrm{pop}}^{2}=\Sigma_{\mathrm{pop}}$ yields

[TABLE]

Figure 5a demonstrates the continuous update terms (21) as a function of the current mean estimate $\mu_{t}$ , for various values of the population variance $\sigma_{\mathrm{pop}}^{2}$ , including the case of a single neuron, $\sigma_{\mathrm{pop}}^{2}=0$ . The continuous update term $d\mu_{t}^{\mathrm{c}}$ pushes the posterior mean $\mu_{t}$ away from the population center $c$ in the absence of spikes. This effect weakens as $\left|\mu_{t}-c\right|$ grows due to the factor $\hat{\lambda}_{t}^{f}$ , consistent with the idea that far from $c$ , the lack of events is less surprising, hence less informative. The continuous variance update term $d\sigma_{t}^{2,\mathrm{c}}$ increases the variance when $\mu_{t}$ is near $c$ , and decreases it otherwise. This stands in contrast with the Kalman-Bucy filter, where the posterior variance cannot increase when estimating a static state.
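The qualitative behavior described above can be reproduced with a short Python sketch. The formulas in the code are derived here under the stated scalar assumptions ( $H=1$, Gaussian tuning of squared width $\alpha^{2}$, preferred stimuli distributed as $\mathcal{N}(c,\sigma_{\mathrm{pop}}^{2})$ ) by integrating the single-neuron Gaussian expressions over the population; they are meant to mirror (21), which is not reproduced in this text. Names and parameter values are hypothetical.

```python
import math

def gaussian_population_rates(mu, var, c, pop_var, h, alpha2):
    """Expected total rate and between-spike drifts (per unit time) of the
    posterior mean and variance, scalar Gaussian population with H = 1."""
    z = var + alpha2 + pop_var                      # posterior + tuning + population spread
    lam = h * math.sqrt(alpha2 / z) * math.exp(-0.5 * (mu - c) ** 2 / z)
    dmu = lam * var / z * (mu - c)                  # away from the population center
    dvar = lam * var ** 2 / z * (1.0 - (mu - c) ** 2 / z)
    return lam, dmu, dvar

# Posterior mean at the population center, and far from it.
at_center = gaussian_population_rates(0.0, 1.0, c=0.0, pop_var=1.0, h=10.0, alpha2=0.25)
far_away = gaussian_population_rates(4.0, 1.0, c=0.0, pop_var=1.0, h=10.0, alpha2=0.25)
# pop_var = 0 recovers a single neuron located at c.
single = gaussian_population_rates(0.5, 1.0, c=0.0, pop_var=0.0, h=10.0, alpha2=0.25)
```

At the center the mean drift vanishes and the variance grows; far from the center the variance shrinks and the mean is pushed further away, with both effects damped by the small expected rate.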

Uniform population on an interval

In this case we assume a scalar state, $n=m=1$ , and

[TABLE]

where similarly to the Gaussian population case, $h$ and $R$ are fixed. Unlike the Gaussian case, here we find it more convenient not to normalize the distribution. Since the state is assumed to be scalar, let $\sigma_{t}^{2}=\Sigma_{t},\alpha^{2}=R^{-1}$ . The continuous updates for this case are

[TABLE]

Figure 5b demonstrates the continuous update terms (23) as a function of the current mean estimate $\mu_{t}$ . When the mean estimate is around an endpoint of the interval, the mean update $d\mu_{t}^{\mathrm{c}}$ pushes the posterior mean outside the interval in the absence of spikes. The posterior variance $\sigma_{t}^{2}$ decreases outside the interval, where the absence of spikes is expected, and increases inside the interval, where it is unexpected. (This holds only approximately, when the tuning width is not too large relative to the size of the interval. For wider tuning functions the behavior becomes similar to the single sensor case.) When the posterior mean is not near the interval endpoints, the updates are near zero, consistent with the uniform population case (18).
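The interval case can also be checked numerically. In the sketch below, the closed-form drifts are derived here (as an assumption about the content of (23), which is not reproduced in this text) by integrating the scalar single-neuron expressions over $\theta\in[a,b]$ with unnormalized density $f(d\theta)=d\theta$; a midpoint-rule sum over single neurons serves as a cross-check. All names and values are hypothetical.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)

def phi(u):
    """Standard normal pdf."""
    return math.exp(-0.5 * u * u) / SQRT_2PI

def interval_population_rates(mu, var, a, b, h, alpha2):
    """Closed-form between-spike drifts of mean and variance for preferred
    stimuli uniform on [a, b] (unnormalized), scalar state, H = 1."""
    s = math.sqrt(var + alpha2)
    ua, ub = (a - mu) / s, (b - mu) / s
    scale = h * SQRT_2PI * math.sqrt(alpha2)
    dmu = scale * var / s * (phi(ub) - phi(ua))
    dvar = scale * var ** 2 / s ** 2 * (ub * phi(ub) - ua * phi(ua))
    return dmu, dvar

def riemann_rates(mu, var, a, b, h, alpha2, n=20000):
    """Same drifts, summing single-neuron terms over theta (midpoint rule)."""
    s2 = var + alpha2
    dtheta = (b - a) / n
    dmu = dvar = 0.0
    for k in range(n):
        theta = a + (k + 0.5) * dtheta
        lam = h * math.sqrt(alpha2 / s2) * math.exp(-0.5 * (mu - theta) ** 2 / s2)
        dmu += lam * var / s2 * (mu - theta) * dtheta
        dvar += lam * var ** 2 / s2 * (1.0 - (mu - theta) ** 2 / s2) * dtheta
    return dmu, dvar

params = dict(var=0.04, a=-1.0, b=1.0, h=10.0, alpha2=0.01)
at_endpoint = interval_population_rates(1.0, **params)   # mean at the right endpoint
inside = interval_population_rates(0.8, **params)
outside = interval_population_rates(1.5, **params)
check = riemann_rates(1.0, **params)
```

With these values the mean at the right endpoint drifts to the right (out of the interval), the variance grows just inside the interval and shrinks outside it, and the closed form agrees with the direct sum over neurons.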

Finite mixtures

Note that the continuous updates (16a)-(16b) are linear in $f$ . Accordingly, if $f\left(dy\right)=\sum_{i}\alpha_{i}f_{i}\left(dy\right),$ where each $f_{i}$ is of one of the above forms, the updates are obtained by the appropriate weighted sums of the filters derived above for the various special forms of $f_{i}$ . This form is quite general: it includes populations where $\theta$ is distributed according to a Gaussian mixture, as well as heterogeneous populations with finitely many different values of the shape matrices $H,R$ . The resulting filter includes a term for each component of the mixture.
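Linearity in $f$ translates directly into code: the drift of a mixture is the weighted sum of the component drifts. The sketch below assumes scalar Gaussian components that share the same $h$ and $\alpha^{2}$, with component drifts obtained by integrating the scalar single-neuron expressions over each Gaussian component; all names and values are hypothetical.

```python
import math

def component_rates(mu, var, c, pop_var, h, alpha2):
    """Between-spike drifts (mean, variance) for one Gaussian component N(c, pop_var)."""
    z = var + alpha2 + pop_var
    lam = h * math.sqrt(alpha2 / z) * math.exp(-0.5 * (mu - c) ** 2 / z)
    return lam * var / z * (mu - c), lam * var ** 2 / z * (1.0 - (mu - c) ** 2 / z)

def mixture_rates(mu, var, components, h, alpha2):
    """components: list of (weight, center, pop_var); drifts add linearly."""
    dmu = dvar = 0.0
    for weight, c, pop_var in components:
        dm, dv = component_rates(mu, var, c, pop_var, h, alpha2)
        dmu += weight * dm
        dvar += weight * dv
    return dmu, dvar

# Two equal-weight sub-populations placed symmetrically about the posterior mean.
symmetric = [(0.5, -1.0, 0.2), (0.5, 1.0, 0.2)]
dmu_sym, dvar_sym = mixture_rates(0.0, 0.3, symmetric, h=10.0, alpha2=0.1)
```

For the symmetric mixture the two mean drifts cancel, while the variance drifts add. The cost of the filter grows with the number of mixture components, not with the number of neurons.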

4 Numerical evaluation

Since the filter (16) is based on an assumed density approximation, its results may be inexact. We tested the accuracy of the filter in the Gaussian population case (20), by numerical comparison with Particle Filtering (PF) (Doucet & Johansen, 2009).

Figure 6 shows two examples of filtering a one-dimensional process observed through a Gaussian population (19) of Gaussian neurons (3), using both the ADF approximation (20) and a Particle Filter (PF) for comparison. See the figure caption for precise details. Figure 7 shows the distribution of approximation errors and the deviation of the posterior from Gaussian. The approximation errors plotted are the relative error in the mean estimate $\epsilon_{\mu}\triangleq\left(\mu_{\mathrm{ADF}}-\mu_{\mathrm{PF}}\right)/\sigma_{\mathrm{PF}}$ , and the error in the posterior standard deviation estimate $\epsilon_{\sigma}\triangleq\left(\sigma_{\mathrm{ADF}}-\sigma_{\mathrm{PF}}\right)/\sigma_{\mathrm{PF}}$ , where $\mu_{\mathrm{ADF}},\mu_{\mathrm{PF}},\sigma_{\mathrm{ADF}},\sigma_{\mathrm{PF}}$ are, respectively, the posterior mean obtained from ADF and PF, and the posterior standard deviation obtained from ADF and PF. The deviation of the posterior distribution from Gaussian is quantified using the Kolmogorov-Smirnov (KS) statistic $\sup_{x}\left|F\left(x\right)-G\left(x\right)\right|$ where $F$ is the particle distribution cdf and $G$ is the cdf of a Gaussian matching $F$ ’s first two moments. For comparison, the orange lines in Figure 7 show the distribution of this KS statistic under the hypothesis that the particles are drawn independently from a Gaussian, which is known as the Lilliefors distribution (see (Lilliefors, 1967)). As seen in the figure, the Gaussian posterior assumption underlying the ADF approximation is quite accurate despite the fact that the population is non-uniform. Accordingly, approximation errors are typically of a few percent (see Table 1b).
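The error measures defined above are straightforward to compute from a particle set. The following Python sketch assumes equally weighted particles (weighted particles would require a weighted empirical cdf); the function names and the test samples are hypothetical.

```python
import math
import random

def relative_errors(mu_adf, sd_adf, mu_pf, sd_pf):
    """(eps_mu, eps_sigma): ADF errors in units of the PF posterior standard deviation."""
    return (mu_adf - mu_pf) / sd_pf, (sd_adf - sd_pf) / sd_pf

def ks_to_matched_gaussian(samples):
    """sup_x |F(x) - G(x)|, with F the empirical cdf of the samples and G the
    Gaussian cdf that has the same mean and variance."""
    n = len(samples)
    m = sum(samples) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in samples) / n)
    ks = 0.0
    for i, x in enumerate(sorted(samples)):
        g = 0.5 * (1.0 + math.erf((x - m) / (sd * math.sqrt(2.0))))
        # The empirical cdf jumps from i/n to (i+1)/n at x; check both sides.
        ks = max(ks, abs((i + 1) / n - g), abs(i / n - g))
    return ks

rng = random.Random(1)
gaussian_like = [rng.gauss(0.0, 1.0) for _ in range(2000)]
bimodal = [-1.0] * 50 + [1.0] * 50
ks_gauss = ks_to_matched_gaussian(gaussian_like)
ks_bimodal = ks_to_matched_gaussian(bimodal)
```

A strongly bimodal particle set yields a much larger statistic than a Gaussian sample of comparable size, which is the kind of deviation the KS comparison is designed to detect.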

Figure 8 shows an example of filtering a two-dimensional process with dynamics

[TABLE]

which may be interpreted as the position and velocity of a particle subject to friction proportional to its velocity as well as “Gaussian white noise” external force. In this example, only the position is directly observed by the neural population. Additional details are given in the figure caption. The distribution of approximation errors and the KS statistic in this two-dimensional setting is shown in Figure 9. The approximation errors plotted are $\epsilon_{\mu}$ and $\epsilon_{\sigma}$ as defined above; both these errors and the KS statistic are computed separately for each dimension.

Statistics of the estimation error distribution for these examples are provided in Table 1b.

5 Encoding

We demonstrate the use of the Assumed Density Filter in determining optimal encoding strategies, i.e., selecting the optimal population parameters $\phi$ (see Section 2). To illustrate the use of ADF for the encoding problem, we consider two simple examples. We also use the first example as a test for the filter’s robustness. We will study optimal encoding issues in more detail in a subsequent paper.

5.1 Optimal encoding depends on prior variance

Previous work using a finite neuron population and a Fisher information-based criterion (Harper & McAlpine, 2004) has suggested that the optimal distribution of preferred stimuli depends on the prior variance. When it is small relative to the tuning width, optimal encoding is achieved by placing all preferred stimuli at a fixed distance from the prior mean. On the other hand, when the prior variance is large relative to the tuning width, optimal encoding is uniform (see figure 2 in (Harper & McAlpine, 2004)). These results are consistent with biological observations reported in (Brand et al., 2002) concerning the encoding of aural stimuli.

Similar results are obtained with our model, as shown in Figure 10. Whereas (Harper & McAlpine, 2004) implicitly assumed a static state in the computation of Fisher information, we use a time-varying scalar state. The state obeys the dynamics

[TABLE]

and is observed through a Gaussian population (19) and filtered using the ADF approximation. In this case, optimal encoding is interpreted as the simultaneous optimization of the population center $c$ and the population variance $\sigma_{\mathrm{pop}}^{2}$ . The process is initialized so that it has a constant prior distribution, its variance given by $s^{2}/\left(2\left|a\right|\right)$ . In Figure 10 (left), the steady-state prior distribution is narrow relative to the tuning width, leading to an optimal population with a narrow population distribution far from the origin. In Figure 10 (right), the prior is wide relative to the tuning width, leading to an optimal population with variance that roughly matches the prior variance.

Note that we are optimizing only the two parameters $c,\sigma_{\mathrm{pop}}^{2}$ rather than each preferred stimulus as in (Harper & McAlpine, 2004). This mitigates the considerable additional computational cost due to simulating the decoding process, rather than computing Fisher information. This direct simulation, though more computationally expensive, offers two advantages over the Fisher information-based method which is used in (Harper & McAlpine, 2004) and which is prevalent in computational neuroscience. First, the simple computation of Fisher information from tuning functions, commonly used in the neuroscience literature, is based on the assumption of a static state, whereas our method can be applied in a fully dynamic context, including the presence of observation-dependent feedback. Second, simulations of the decoding process allow for the minimization of arbitrary criteria, including the direct minimization of posterior variance or Mean Square Error (MSE). As discussed in the introduction, Fisher information may be a poor proxy for the MSE for finite decoding times.
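A direct simulation of this kind might look as follows. This is a rough Python sketch under several assumptions that go beyond the text: the dynamics are taken to be $dX_{t}=aX_{t}dt+s\,dW_{t}$ (consistent with the stated stationary variance $s^{2}/(2\left|a\right|)$ ), time is discretized with at most one spike per step, the filter updates are the scalar Gaussian-population expressions obtained by moment matching, and all names and parameter values are hypothetical.

```python
import math
import random

def simulate_mse(c, pop_var, a=-1.0, s=1.0, h=50.0, alpha2=0.25,
                 t_end=2.0, dt=1e-3, trials=20, seed=0):
    """Monte Carlo estimate of the time-averaged squared error of the ADF mean
    for a Gaussian population N(c, pop_var) of Gaussian-tuned neurons."""
    rng = random.Random(seed)
    prior_var = s * s / (2.0 * abs(a))
    w = alpha2 + pop_var
    steps = int(round(t_end / dt))
    total_sq = 0.0
    for _ in range(trials):
        x = rng.gauss(0.0, math.sqrt(prior_var))      # state drawn from the stationary prior
        mu, var = 0.0, prior_var
        for _ in range(steps):
            # Spike generation: total population rate at the current state.
            rate = h * math.sqrt(alpha2 / w) * math.exp(-0.5 * (x - c) ** 2 / w)
            spike = rng.random() < rate * dt
            if spike:                                  # mark: preferred stimulus of the firing neuron
                theta = rng.gauss((x * pop_var + c * alpha2) / w,
                                  math.sqrt(alpha2 * pop_var / w))
            # Filter: prior and continuous (no-spike) updates.
            z, d = var + w, mu - c
            lam = h * math.sqrt(alpha2 / z) * math.exp(-0.5 * d * d / z)
            mu, var = (mu + (a * mu + lam * var / z * d) * dt,
                       var + (2.0 * a * var + s * s
                              + lam * var ** 2 / z * (1.0 - d * d / z)) * dt)
            if spike:                                  # discontinuous update at a spike
                g = var / (var + alpha2)
                mu, var = mu + g * (theta - mu), var * (1.0 - g)
            x += a * x * dt + s * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            total_sq += (x - mu) ** 2
    return total_sq / (trials * steps)

mse = simulate_mse(c=0.0, pop_var=1.0)
```

Evaluating `simulate_mse` over a grid of `(c, pop_var)` values and picking the minimizer gives a crude version of the two-parameter optimization; a practical study would use many more trials and common random numbers across parameter values.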

We also test the robustness of the filter to inaccuracies in model parameters or observation/encoding parameters in this problem. We use inaccurate values for the model parameter $a$ in (25) and the observation/encoding parameter $h$ in (13) in the filter. Specifically, we multiply or divide each of the two parameters by a factor of $5/4$ in the filter, while the dynamics and the observing neural population remain unaltered. The results remain qualitatively similar, as seen in Figure 11.

5.2 Adaptation of homogeneous population to stimulus statistics

In (Benucci et al., 2013), the tuning of cat primary visual cortex (V1) neurons to oriented gratings was measured after adapting either to a uniform distribution of orientations, or to a “biased” distribution with one orientation being more common. The population’s preferred stimuli are roughly uniformly distributed, and adapt to the prior statistics through change in amplitude. The adapted population was observed to have decreased firing rate near the common orientation, so that the mean firing rate is constant across the population.

We present a simplified model where optimal encoding exhibits a constant mean firing rate across a neural population. A random state $X$ is drawn from a “biased” prior distribution which is a mixture of two uniform distributions on intervals sharing an endpoint (see Figure 12a),

[TABLE]

The neural population consists of Gaussian neurons (13) with preferred locations uniformly distributed on the interval $\left[a_{1},b_{2}\right]$ . However, they are allowed to adapt to different tuning amplitudes $h_{1},h_{2}$ in each of the sub-intervals $\left[a_{1},b_{1}\right],\left[a_{2},b_{2}\right]$ respectively. Thus, the population distribution is given by a mixture of two uniform components,

[TABLE]

where $H=1$ . We optimize the parameters $h_{1},h_{2}$ to minimize accumulated decoding MSE over a finite decoding interval under a constraint on the total firing rate of the population,

[TABLE]

where $\hat{X}_{t}=\mathbf{E}\left[X|\mathcal{N}_{t}\right]$ is the MMSE estimate of $X$ . The total firing rate $r$ may be obtained from (13), (26), (27), yielding

[TABLE]

where $r_{i}$ is the total firing rate in the $i$ th sub-population, $\alpha^{2}=R^{-1}$ , $\Phi_{1}\left(x\right)\triangleq\int_{-\infty}^{x}\Phi=x\Phi\left(x\right)+\phi\left(x\right),$ and $\phi,\Phi$ are respectively the pdf and cdf of the standard Gaussian distribution. In this continuous population approximation, the firing rate of a *single* neuron with preferred stimulus $\theta$ is proportional to $\mathbf{E}\left[\lambda\left(X;\theta\right)\right]$ . We use narrow tuning ( $\alpha=0.1$ ), so that the rate $\mathbf{E}\left[\lambda\left(X;\theta\right)\right]$ is nearly constant within each sub-population, justifying the approximation

[TABLE]

To solve this optimization problem, we approximate $\hat{X}_{T}$ by the ADF filter output $\mu_{T}$ . We further assume that the MSE is a decreasing function of the total firing rate $r$ , so that the solution is obtained for $r=\bar{r}$ . Although we have no proof of this claim, it appears reasonable, since the posterior variance decreases at each spike (see Section 3.1.5). The stronger constraint $r=\bar{r}$ leaves a single degree of freedom, the amplitude ratio $h_{1}/h_{2}$ . In Figure 12b we evaluate $\mathbf{E}\left[\int\left(X-\mu_{t}\right)^{2}dt\right]$ using Monte Carlo simulation for various values of the ratio $h_{1}/h_{2}$ , as well as of the location of the interval endpoint $b_{2}$ . Although the optimization is constrained only by the total firing rate, the minimal MSE is obtained near the solution of

[TABLE]

where firing rates are equalized across the population.

6 Comparison to Previous Work

6.1 Neural Decoding

Table 6.1 provides a concise comparison of our setting and results to previous works on optimal neural decoding. As noted in the introduction, we focus our comparison on analytically expressible continuous-time filters.

Summary of the setting and results of several previous works based on continuous-time filtering of point process observations, in comparison to the current work. The *complexity* column lists the asymptotic computational complexity of each time step in a discrete-time implementation of the filter, as a function of population size $n$ and number of spikes $N_{t}$ . A complexity of $\infty$ denotes an infinite-dimensional filter.

| Ref. | Dynamics | Neural code: rates | Neural code: population | Decoding: exact | Decoding: complexity |
|---|---|---|---|---|---|
| (Snyder, 1972)$^{a}$ & (Segall et al., 1975) | diffusion | any | finite$^{b}$ | ✓ | $\infty$ |
| (Snyder, 1972)$^{a}$ | diffusion | any | finite$^{b}$ | | $n$ |
| (Rhodes & Snyder, 1977) | lin. G. diff. | G. | uniform | ✓ | $1$ |
| (Komaee, 2010) | lin. G. diff. | any | uniform | | $1$ |
| (Huys et al., 2007) | 1d G. process | G. | uniform | ✓ | $N_{t}^{3}$ $^{c}$ |
| (Eden & Brown, 2008) | lin. G. diff. | any | finite | | $n$ |
| (Susemihl et al., 2011, 2013) | 1d G. Mat. | G. | uniform | | $1$ |
| (Bobrowski et al., 2009) & (Twum-Danso & Brockett, 2001) | CTMC | any | finite | ✓ | $n$ |
| This work | diffusion | G. | non-uniform | | $1$ $^{d}$ |

$^{a}$ Snyder (1972) includes both an exact PDE for the posterior statistics, and an approximate solution.

$^{b}$ The setting of Snyder (1972) and Segall et al. (1975) is a single observation point process, but the result is readily extended to a finite population by summation.

$^{c}$ This filter is non-recursive, and its complexity grows with the history of spikes. The exponent 3 is related to inversion of an $N_{t}\times N_{t}$ matrix, which in principle can be performed with lower complexity.

$^{d}$ Constant complexity for distributions over preferred stimuli described in section 3.2.3 (including the uniform coding case). For heterogenous mixture populations, the complexity is linear in the number of mixture components.

Abbreviations: G. Gaussian, lin. G. diff. linear Gaussian diffusion, 1d G. Mat. 1-dimensional Gaussian process with Matern kernel auto-correlation, CTMC Continuous-time Markov chain.

Few previous analytically oriented works studied neural decoding for dynamic stimuli, or without the uniform coding assumption. The filtering problem for a dynamic state with a general (non-uniform) finite population is tractable when the state space is finite; in this case the posterior is finite-dimensional and obeys SDEs derived in (Brémaud, 1981). These results have been presented in a neuroscience context in (Twum-Danso & Brockett, 2001) and (Bobrowski et al., 2009).

We are aware of a single previous work (Eden & Brown, 2008) deriving a closed-form filter for *non-uniform* coding of a diffusion process in continuous time. The setting of (Eden & Brown, 2008) is a finite population with arbitrary tuning functions, and the derivation uses an approximation similar to the one used in the current work. Our work differs from (Eden & Brown, 2008) most notably by the use of a parameterized “infinite” population. In this sense, the setting of (Eden & Brown, 2008) is less general, corresponding to the case of finite population described in section 3.1. On the other hand, (Eden & Brown, 2008) deals with a more general form for the tuning functions. Although (Eden & Brown, 2008) also approximates the posterior using a Gaussian distribution, it is not equivalent to our filter. The two filters are compared in detail in appendix D.

6.2 Neural Encoding

As the current work is concerned with efficient neural decoding as a tool for studying neural encoding in non-uniform populations, we briefly mention some works that study neural encoding analytically. As noted above, encoding will be studied in more detail in a subsequent paper. Many previous works used Fisher information to study non-uniform coding of static stimuli. As mentioned in the introduction, Fisher information does not necessarily provide a good estimate for possible decoding performance, and optimizing it may yield qualitatively misleading results (see (Bethge et al., 2002; Yaeli & Meir, 2010; Pilarski & Pokora, 2015)). However, we note one such work, (Ganguli & Simoncelli, 2014), which is particularly relevant in comparison to the current work, as it similarly studies large parameterized populations. The neural populations studied in (Ganguli & Simoncelli, 2014) are characterized by two functions of the stimulus: a “density” function, which describes the local density of neurons; and a “gain” function, which modulates each neuron’s gain according to its preferred location. The density function also distorts tuning functions so that tuning width is approximately inversely proportional to neuron density, resulting in approximately uniform coding when the gain function is constant. On the other hand, the introduction of the gain function produces a violation of uniform coding. This population is optimized for Fisher information and several related measures. For each of these measures, the optimal population density is shown to be monotonic with the prior, i.e., more neurons should be assigned to more probable states. This is in contrast to the results we present in section (5) (e.g. Figure 10, left), where the optimal population distribution is shifted relative to the prior distribution when the prior is narrow. This discrepancy may be attributed to the limitations of Fisher information in predicting actual decoding error, or to the coupling of neuron density and tuning width used in (Ganguli & Simoncelli, 2014) to facilitate the derivation of closed-form solutions.

Several previous works attempted direct minimization of decoding MSE rather than Fisher information. In (Yaeli & Meir, 2010), an explicit expression for the MSE is derived in the static case with uniform coding, and is used to characterize optimal tuning function width and its relation to coding time. More recently, (Wang et al., 2016) studied $L_{p}$ -based loss measures and presented exact results for optimal tuning functions in the case of univariate static signals, for single neurons and homogeneous populations. In (Susemihl et al., 2011, 2013), a mean-field approximation is suggested to allow efficient evaluation of the MSE in a dynamic setting.

Acknowledgements

The work of YH and RM is partially supported by grant No. 451/17 from the Israel Science Foundation, and by the Ollendorff center of the Viterbi Faculty of Electrical Engineering at the Technion.

Appendix A Derivation of filtering equations

A.1 Setting and notation

In the main text, we have presented our model in an open-loop setting, where the process $X$ is passively observed. Here we consider a more general setting, incorporating a control process $U_{t}$ , so the dynamics are

[TABLE]

where, in general, $U_{t}$ is a function of $\mathcal{N}_{t}$ .

We denote by $p_{t}^{\,\mathcal{N}}\left(\cdot\right)$ the posterior density – that is, the density of $X_{t}$ given $\mathcal{N}_{t}$ , and by $\mathbf{E}_{t}^{\mathcal{N}}\left[\cdot\right]$ the posterior expectation – that is, expectation conditioned on $\mathcal{N}_{t}$ .

A.2 The Innovation Measure

We derive the filtering equations in terms of the innovation measure of the marked point process, which is a random measure closely related to the notion of *innovation process* associated with unmarked point processes or diffusion processes. The role of the innovation measure in filtering marked point processes is analogous to the role of the innovation process in Kalman filtering.

In section (3.1.1) we characterized the intensity (or rate) of each point process in a finite population using the asymptotic behavior of jump probabilities in small intervals. An alternative definition is the following (a more detailed discussion of this definition and of innovation processes is available in (Segall & Kailath, 1975)). Consider an unmarked point process $N_{t}$ with $\mathcal{N}_{t}$ denoting its history up to time $t$ . Given some history $\mathcal{F}_{t}$ containing $\mathcal{N}_{t}$ (e.g. $\mathcal{F}_{t}$ might include the observation process $N$ and its driving state process $X$ ), the process $\lambda_{t}^{\mathcal{F}}$ is called the intensity of $N$ relative to $\mathcal{F}$ when $\lambda_{t}^{\mathcal{F}}$ is $\mathcal{F}_{t}$ -measurable (i.e., it is strictly a function of the history $\mathcal{F}_{t}$ ) and

[TABLE]

is an $\mathcal{F}_{t}$ -martingale, meaning

[TABLE]

or equivalently,

[TABLE]

Heuristically we may write this relation as

[TABLE]

When $\mathcal{F}_{t}=\mathcal{N}_{t}$ , the process $I_{t}\triangleq I_{t}^{\mathcal{N}}$ is called the innovation process. We may write

[TABLE]

so the innovation increments $dI_{t}$ represent the “unexpected” part of the increments $dN_{t}$ after observation of $\mathcal{N}_{t}$ . The innovation process may be similarly defined for other stochastic processes (such as discrete-time or diffusion processes), and it plays an important role in the Kalman and the Kalman-Bucy filters. It plays an analogous role in the filtering of point processes, as seen below.

As discussed in section (3.2.2), in the continuous population model we characterize the observation process $N$ by its intensity kernel relative to the history $\left(\mathcal{N}_{t},X_{\left[0,t\right]}\right)$ ,

[TABLE]

This equation is heuristic notation for the statement that the rate of $N_{t}\left(Y\right)$ relative to $\left(\mathcal{N}_{t},X_{\left[0,t\right]}\right)$ is $\int_{Y}\lambda\left(X_{t};\boldsymbol{y}\right)f\left(d\boldsymbol{y}\right)$ , for any measurable $Y\subset\boldsymbol{Y}$ . The rate relative to the spiking history $\mathcal{N}_{t}$ is obtained by marginalizing over $X_{t}$ (see (Segall & Kailath, 1975), Theorem 2), yielding the rate $\int_{\boldsymbol{Y}}\hat{\lambda}\left(\boldsymbol{y}\right)f\left(d\boldsymbol{y}\right)$ where

[TABLE]

Therefore the innovation process of $N_{t}\left(Y\right)$ is $I_{t}\left(Y\right)=N_{t}\left(Y\right)-\int_{0}^{t}\int_{Y}\hat{\lambda}_{s}\left(\boldsymbol{y}\right)f\left(d\boldsymbol{y}\right)ds$ , and accordingly we define the innovation measure $I$ to be the random measure

[TABLE]

so that the innovation process may be expressed as

[TABLE]

The martingale property of $I_{t}\left(Y\right)$ implies that the innovation measure satisfies

[TABLE]

for all measurable $Y\subset\boldsymbol{Y}$ .

A.3 Exact filtering equations

Define

[TABLE]

The stochastic PDE (5) is extended in (Rhodes & Snyder, 1977) for the case of marked point process observation in the presence of feedback. (The setting considered in (Rhodes & Snyder, 1977) assumes linear dynamics and an infinite uniform population. However, these assumptions are only relevant to establishing other propositions in that paper; the proof of equation (5) still holds as is in our more general setting.) In this case, the posterior density $p_{t}^{\,\mathcal{N}}$ obeys the equation

[TABLE]

Here $\mathcal{L}_{t}$ is the posterior infinitesimal generator, defined with an additional conditioning on $\mathcal{N}_{t}$ ,

[TABLE]

and $\mathcal{L}_{t}^{*}$ is its adjoint. Note that in this closed-loop setting, the infinitesimal generator is itself a random operator, due to its dependence on past observations through the control law, and that $N_{t}$ is no longer a doubly-stochastic Poisson process.

As in section (3.1.3), we use the notations

[TABLE]

We derive the following equations for the first two posterior moments, which generalize (6) to marked point processes and to the presence of feedback in the state dynamics,

[TABLE]

A rigorous derivation of (34a) under more general conditions is found in (Segall et al., 1975), from which (34b) may be derived by considering the dynamics of the process $X_{t}X_{t}^{\intercal}$ . Here we provide a more heuristic derivation based on (33).

Compare the mean update equation (34a) to the Kalman-Bucy filter, which gives the posterior moments for a diffusion process $X$ with noisy observations $Y$ of the form

[TABLE]

where $W,V$ are independent standard Wiener processes. The Kalman-Bucy filter reads

[TABLE]

Here, the term $dY_{t}-H_{t}\mu_{t}dt$ appearing in the first equation is an increment of the innovation process $Y_{t}-\int_{0}^{t}H_{s}\mu_{s}ds$ .

For a sufficiently well-behaved function $h$ , we find, using (33) and the definition of operator adjoint,

[TABLE]

Assuming the state evolves as in (31), the (closed loop) infinitesimal generator is

[TABLE]

which, when specialized to the functions $h_{i}\left(x\right)=x^{i}$ and $h_{ij}\left(x\right)=x^{i}x^{j}$ , where $x^{i}$ is the $i$ th component of $x$ , reads

[TABLE]

Substituting (36) into (35) yields

[TABLE]

or in vector notation

[TABLE]

To compute $d\Sigma_{t}$ we use the representation $d\Sigma_{t}=d\mathbf{E}_{t}^{\mathcal{N}}\left[X_{t}X_{t}^{\intercal}\right]-d\left(\mu_{t}\mu_{t}^{\intercal}\right)$ . The first term is computed by substituting (37) into (35), yielding

[TABLE]

or in matrix notation, after some rearranging,

[TABLE]

To calculate $d\left(\mu_{t}\mu_{t}^{\intercal}\right)$ from (38) we separately handle the continuous terms, and the jump term involving $N\left(dt,d\boldsymbol{y}\right)$ . The continuous terms are the continuous part of $d\mu_{t}\mu_{t}^{\intercal}+\mu_{t}d\mu_{t}^{\intercal}$ . To compute the jump terms, we note that when $\mu_{t}$ jumps by $\Delta_{t}$ , the corresponding jump in $\mu_{t}\mu_{t}^{\intercal}$ is $\Delta_{t}\mu_{t}^{\intercal}+\mu_{t}\Delta_{t}^{\intercal}+\Delta_{t}\Delta_{t}^{\intercal}$ , therefore

[TABLE]

Subtracting (41) from (39), and noting that $\mathbf{E}_{t}^{\mathcal{N}}\omega_{t}^{\boldsymbol{y}}=0$ , so that

[TABLE]

yields (34b).

Writing (34a)-(34b) according to the decomposition described in section 3.1.3,

[TABLE]

A.4 ADF approximation for Gaussian tuning

We now proceed to apply the Gaussian ADF approximation $p_{t}^{\,\mathcal{N}}\left(x\right)\approx\mathcal{N}\left(x;\mu_{t},\Sigma_{t}\right)$ to (42)-(45) in the case of Gaussian neurons (3), deriving approximate filtering equations written in terms of the population density $f\left(d\boldsymbol{y}\right)$ . From here on we use $\mu_{t},\Sigma_{t}$ , and $p_{t}^{\,\mathcal{N}}$ to refer to the ADF approximation rather than to the exact values.

We use the following algebraic results. The first is a slightly generalized form of a well-known result about the sum of quadratic forms, which is useful for multiplying Gaussians with possibly degenerate precision matrices.

*Claim 1.*

Let $x,a,b\in\mathbb{R}^{n}$ and $A,B\in\mathbb{R}^{n\times n}$ be symmetric matrices such that $A+B$ is non-singular. Then

[TABLE]

Proof.

By straightforward expansion of each side. ∎
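The statement of Claim 1 is not reproduced above, so the following is only a numerical sketch of the textbook form of the sum-of-quadratic-forms identity, $(x-a)^{\intercal}A(x-a)+(x-b)^{\intercal}B(x-b)=(x-c)^{\intercal}(A+B)(x-c)+(a-b)^{\intercal}A(A+B)^{-1}B(a-b)$ with $c=(A+B)^{-1}(Aa+Bb)$ , which may differ in presentation from the claim as stated in the paper. The function name `combine_quadratic_forms` is hypothetical. The check deliberately uses a rank-one (degenerate) $A$ , since only $A+B$ needs to be invertible.

```python
import numpy as np

def combine_quadratic_forms(a, A, b, B):
    """Return (c, M, const) such that, for every x,
    (x-a)^T A (x-a) + (x-b)^T B (x-b) = (x-c)^T M (x-c) + const.
    A and B are symmetric; only M = A + B has to be invertible."""
    M = A + B
    c = np.linalg.solve(M, A @ a + B @ b)
    const = (a - b) @ A @ np.linalg.solve(M, B @ (a - b))
    return c, M, const

rng = np.random.default_rng(0)
n = 3
u = rng.normal(size=n)
A = np.outer(u, u)                 # rank one, hence singular
G = rng.normal(size=(n, n))
B = G @ G.T + np.eye(n)            # positive definite, so A + B is invertible
a, b, x = rng.normal(size=(3, n))  # three random vectors in R^n

c, M, const = combine_quadratic_forms(a, A, b, B)
lhs = (x - a) @ A @ (x - a) + (x - b) @ B @ (x - b)
rhs = (x - c) @ M @ (x - c) + const
```

Here `lhs` and `rhs` agree up to floating-point error for any choice of `x`, which is the property used when multiplying a Gaussian posterior by a Gaussian tuning function.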

Next we note two matrix inversion lemmas, the first of which is known as the Woodbury identity. These are useful for transferring variance matrices between state and perceptual coordinates in our model. Derivations may be found in (Henderson & Searle, 1981).

*Claim 2.*

For $U,V^{\intercal}\in\mathbb{R}^{m\times n}$ and non-singular $A\in\mathbb{R}^{m\times m},C\in\mathbb{R}^{n\times n}$ , the following equalities hold

[TABLE]

whenever all the relevant inverses exist.
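As a quick sanity check, the two lemmas can be verified numerically. The equalities of Claim 2 are not reproduced above, so this sketch assumes their standard forms: the Woodbury identity $(A+UCV)^{-1}=A^{-1}-A^{-1}U(C^{-1}+VA^{-1}U)^{-1}VA^{-1}$ and the companion "push-through" identity $(A+UCV)^{-1}UC=A^{-1}U(C^{-1}+VA^{-1}U)^{-1}$ . Taking $V=U^{\intercal}$ with positive definite $A,C$ is a choice made here only to guarantee that every inverse exists.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 2

def random_spd(k):
    """Random symmetric positive definite k x k matrix."""
    G = rng.normal(size=(k, k))
    return G @ G.T + k * np.eye(k)

A = random_spd(m)                  # m x m
C = random_spd(n)                  # n x n
U = rng.normal(size=(m, n))        # m x n
V = U.T                            # n x m; this choice keeps all inverses well defined

A_inv = np.linalg.inv(A)
inner = np.linalg.inv(np.linalg.inv(C) + V @ A_inv @ U)

# Woodbury identity
woodbury_lhs = np.linalg.inv(A + U @ C @ V)
woodbury_rhs = A_inv - A_inv @ U @ inner @ V @ A_inv

# Push-through identity (assumed form of the second lemma)
push_lhs = woodbury_lhs @ U @ C
push_rhs = A_inv @ U @ inner
```

Both pairs agree to machine precision, and only the smaller $n\times n$ matrix has to be inverted on the right-hand sides once $A^{-1}$ is known.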

To evaluate the posterior expectations in (42)-(45) we first simplify the expression

[TABLE]

Using the Gaussian ADF approximation $p_{t}^{\,\mathcal{N}}\left(x\right)=\mathcal{N}\left(x;\mu_{t},\Sigma_{t}\right)$ , equation (13), and Claim 1, we find

[TABLE]

where $H_{r}^{-1}$ is any right inverse of $H$ , and

[TABLE]

To simplify the notation, we suppress the dependence of these and other quantities on $H$ and $R$ throughout this section. Claim 2 establishes the relation $Q_{t}=H^{\intercal}S_{t}H$ , where

[TABLE]

yielding

[TABLE]

and by normalizing this Gaussian (see (47)) we find that

[TABLE]

yielding

[TABLE]

Using Claim 2, the difference $\mu_{t}^{\theta}-\mu_{t}$ may be rewritten as

[TABLE]

where $\delta_{t}=H\mu_{t}-\theta$ , and an application of the Woodbury identity (46) yields

[TABLE]

Now,

[TABLE]

Plugging this result into (42)-(45) yields (16a)-(16d). Integrating (50) over $x$ yields

[TABLE]

Sylvester’s determinant identity yields the equality $\left|I+\Sigma_{t}H^{\intercal}RH\right|=\left|R\right|/\left|S_{t}^{R}\right|$ , from which (16g) follows.
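The determinant equality can be checked numerically. This sketch assumes $S_{t}^{R}=\left(R^{-1}+H\Sigma_{t}H^{\intercal}\right)^{-1}$ with $H\in\mathbb{R}^{m\times n}$ mapping the state space to perceptual coordinates; the definition itself is not reproduced in this section, so that form, like the variable names below, is an assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 2                        # state dimension n, perceptual dimension m
H = rng.normal(size=(m, n))

G = rng.normal(size=(n, n))
Sigma = G @ G.T + np.eye(n)        # posterior covariance, n x n
K = rng.normal(size=(m, m))
R = K @ K.T + np.eye(m)            # tuning precision, m x m

# Assumed definition of S_t^R (an m x m matrix)
S = np.linalg.inv(np.linalg.inv(R) + H @ Sigma @ H.T)

lhs = np.linalg.det(np.eye(n) + Sigma @ H.T @ R @ H)   # n x n determinant
rhs = np.linalg.det(R) / np.linalg.det(S)              # ratio of m x m determinants
```

The identity trades an $n\times n$ determinant for $m\times m$ ones, which is convenient when the perceptual dimension is smaller than the state dimension.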

The continuous precision update (16h) follows directly from (16b) and the relation

[TABLE]

which holds whenever $\Sigma_{t}$ is differentiable – i.e., between spikes. To derive (16i), consider a spike at time $t$ with mark $\boldsymbol{y}=\left(h,\theta,H,R\right)$ . According to (16d),

[TABLE]

Finally, equations (44) and (51) yield $d\mu_{t}^{N}=\int\left(\mu_{t^{-}}^{\theta}-\mu_{t^{-}}\right)N\left(dt,d\boldsymbol{y}\right)$ , so at spike times

[TABLE]

The finite-population case is (12).

A.5 Approximation of continuous terms for specific population distributions

A.5.1 Gaussian population

When the preferred stimuli are normally distributed (19), the continuous update terms (16a)-(16b) may be computed analogously to the derivation in section (A.4) above. First, starting from (16g), an analogous computation to the derivation of (50) and (16g) above yields

[TABLE]

where

[TABLE]

Integrating this equation over $\theta$ and applying Sylvester’s determinant lemma as in the derivation of (16g) yields (20d). The matrix $Z_{t}^{H,R}$ (eq. (20c)) and vector $\mu_{t}^{f}$ play an analogous role to that of $S_{t}$ and $\mu_{t}^{\theta}$ respectively in section (A.4). Substituting into (16a)-(16b) and simplifying analogously yields (20).

A.5.2 Uniform population on an interval

In this case $R,h$ are constant, $m=n=1$ and $H=1$ , and the preferred stimulus distribution is $f\left(d\theta\right)=1_{\left[a,b\right]}\left(\theta\right)d\theta$ , so (16a)-(16b) take the form

[TABLE]

where $\sigma_{t}^{2}=\Sigma_{t},\sigma_{\mathrm{r}}^{2}=R^{-1}$ , and we suppressed the dependence of $\hat{\lambda}$ on $h,R$ from the notation, since $h,R$ are fixed. The integrals can be computed from the following identities

[TABLE]

where

[TABLE]

Writing

[TABLE]

we find that

[TABLE]

Substitution into (52)-(53) yields (23).

Appendix B Implementation Details

B.1 State dynamics

All simulations in this paper use linear dynamics of the form

[TABLE]

which are implemented via a straightforward Euler scheme. Specifically, for step-size $\Delta t$ , we approximate $X_{k\Delta t}$ by $x_{k}$ where

[TABLE]

and $\xi_{k}$ is a sequence of independent standard normal variables (independent of $X_{0}$ ).
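A minimal sketch of such an Euler scheme follows. Equation (54) is not reproduced here, so the code assumes the linear dynamics have the form $dX_{t}=AX_{t}dt+D\,dW_{t}$ ; the function name `simulate_linear_sde` and its arguments are hypothetical.

```python
import numpy as np

def simulate_linear_sde(A, D, x0, dt, n_steps, rng):
    """Euler scheme for dX = A X dt + D dW (assumed form of the dynamics).

    Returns an array of shape (n_steps + 1, dim) holding x_0, ..., x_{n_steps},
    where x_k approximates X at time k * dt.
    """
    A = np.atleast_2d(np.asarray(A, dtype=float))
    D = np.atleast_2d(np.asarray(D, dtype=float))
    x = np.empty((n_steps + 1, A.shape[0]))
    x[0] = x0
    for k in range(n_steps):
        xi = rng.standard_normal(D.shape[1])      # independent standard normals
        x[k + 1] = x[k] + A @ x[k] * dt + D @ xi * np.sqrt(dt)
    return x

rng = np.random.default_rng(0)
path = simulate_linear_sde(A=[[-0.1]], D=[[1.0]], x0=[0.0],
                           dt=1e-3, n_steps=1000, rng=rng)
```

The noise increment is scaled by $\sqrt{\Delta t}$ because a Wiener increment over a step of length $\Delta t$ has variance $\Delta t$ .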

B.2 Continuous neural population

The simulation of marked point processes, used to model continuous neural populations (see Section 3.2.2), involves the generation of random times and corresponding random marks. In the case of a finite population, there is a finite number of marks, and the point process corresponding to each mark may be simulated separately. For an infinite population, a different approach is required.

Given the intensity kernel $\lambda\left(X_{t};\boldsymbol{y}\right)f\left(d\boldsymbol{y}\right),$ we simulate a marked point process in two stages: first generating the random times (spike times), and then the random marks (neuron parameters). To generate the random times $\left(t_{1},\ldots t_{N_{T}}\right),$ note that the total history-conditioned firing rate at time $t$ is given by

[TABLE]

and the unmarked process $N_{t}$ is a doubly-stochastic Poisson process with random rate $r\left(X_{t}\right)$ . Conditioned on $X$ and the point process history, each random mark $\boldsymbol{y}_{i}$ is distributed

[TABLE]

(see (Brémaud, 1981, Theorem T6)).

Accordingly, the simulation of the marked process $N$ proceeds as follows:

1. Using the generated trajectory $x_{k}\approx X_{k\Delta t}$ , simulate a Poisson process with rate $r\left(X_{t}\right)$ , yielding the random times $(t_{1},\ldots t_{N_{T}})$ . This may be accomplished either via direct generation of a point for each time step with probability $r\left(x_{k}\right)\Delta t$ , or more efficiently via time rescaling (see, e.g. (Brown et al., 2002)).

2. Generate random marks $(\boldsymbol{y}_{1},\ldots\boldsymbol{y}_{N_{T}})$ by sampling independently from the distribution $\kappa\left(X_{t_{i}},d\boldsymbol{y}\right)$ .
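The two-stage procedure can be sketched for one simple case: a scalar state, fixed $h$ and tuning width $\alpha$ , and preferred stimuli uniform on $\left[a,b\right]$ with tuning $\lambda\left(x;\theta\right)=h\exp\left(-\left(x-\theta\right)^{2}/2\alpha^{2}\right)$ . Under these assumptions the total rate is $r\left(x\right)=h\alpha\sqrt{2\pi}\left[\Phi\left(\frac{b-x}{\alpha}\right)-\Phi\left(\frac{a-x}{\alpha}\right)\right]$ and the mark distribution is $\mathcal{N}\left(x,\alpha^{2}\right)$ truncated to $\left[a,b\right]$ . These expressions are derived here directly from the Gaussian tuning function rather than copied from Table 2, and all function names are hypothetical. The sketch uses the direct per-step method, not time rescaling.

```python
import numpy as np
from statistics import NormalDist

_STD = NormalDist()   # standard normal, used for its cdf and inverse cdf

def total_rate(x, h, alpha, a, b):
    """r(x): integral over [a, b] of h * exp(-(x - theta)^2 / (2 alpha^2)) d theta."""
    mass = _STD.cdf((b - x) / alpha) - _STD.cdf((a - x) / alpha)
    return h * alpha * np.sqrt(2 * np.pi) * mass

def sample_mark(x, alpha, a, b, rng):
    """Draw theta from N(x, alpha^2) truncated to [a, b] by inverting the cdf."""
    lo, hi = _STD.cdf((a - x) / alpha), _STD.cdf((b - x) / alpha)
    u = rng.uniform(lo, hi)
    return x + alpha * _STD.inv_cdf(u)

def simulate_marked_process(x_path, dt, h, alpha, a, b, rng):
    """Stage 1: one Bernoulli draw per step with probability r(x_k) * dt.
    Stage 2: for each spike, a mark sampled from kappa(x_k, d theta)."""
    times, marks = [], []
    for k, x in enumerate(x_path):
        if rng.random() < total_rate(x, h, alpha, a, b) * dt:
            times.append(k * dt)
            marks.append(sample_mark(x, alpha, a, b, rng))
    return np.array(times), np.array(marks)

rng = np.random.default_rng(0)
x_path = np.zeros(5000)            # constant trajectory, for illustration only
times, marks = simulate_marked_process(x_path, dt=1e-3, h=10.0, alpha=0.5,
                                       a=-1.0, b=1.0, rng=rng)
```

The per-step Bernoulli draw requires $r\left(x_{k}\right)\Delta t\ll1$ ; inverse-cdf sampling of the marks avoids the rejection loops that would otherwise be needed when $x$ lies outside $\left[a,b\right]$ .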

As in section 3.2.3, when $h,H,R$ are fixed across the population, we abuse notation and write $f\left(dh^{\prime},d\theta,dH^{\prime},dR^{\prime}\right)=\delta_{h}\left(dh^{\prime}\right)\delta_{H}\left(dH^{\prime}\right)\delta_{R}\left(dR^{\prime}\right)f\left(d\theta\right)$ , and similarly for $\kappa\left(x;d\theta\right)$ . The function $r\left(x\right)$ and the distribution $\kappa(x,d\theta)$ for each of the distributions of preferred stimuli in section 3.2.3 are given in closed form in Table 2. For a finite heterogeneous mixture population, $r$ and $\kappa$ may be obtained through the appropriate weighted summation; however, it is easier to simulate each component separately.

B.3 Filter

Similarly to the state dynamics, we approximate the filter equations

[TABLE]

using an Euler approximation

[TABLE]

For the linear dynamics (54) used in simulations throughout this paper, the prior terms $d\mu_{t}^{\pi},d\Sigma_{t}^{\pi}$ are given by (8). The continuous update terms $d\mu_{t}^{\mathrm{c}},d\Sigma_{t}^{\mathrm{c}}$ depend on the population as described in section 3.2.3. The discontinuous updates $d\mu_{t}^{N},d\Sigma_{t}^{N}$ are given by (16c)-(16d), and are non-zero only at time-steps containing a spike.
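To make the structure of the Euler-discretized filter concrete, here is a sketch for the simplest setting: a scalar state with $H=1$ , a finite population of Gaussian neurons with common height $h$ and width $\alpha$ , and assumed dynamics $dX=aX\,dt+d\,dW$ . The update terms are a one-dimensional specialization obtained by substituting a Gaussian posterior into the posterior expectations; they are not copied from equations (8), (9) or (16), and the function name `adf_filter_1d` and its arguments are hypothetical.

```python
import numpy as np

def adf_filter_1d(spikes, thetas, h, alpha, a, d, mu0, var0, dt):
    """Euler-discretized Gaussian ADF filter, scalar state, finite population.

    spikes : (n_steps, n_neurons) array of 0/1 spike indicators per time step
    thetas : preferred stimuli; tuning is h * exp(-(x - theta)^2 / (2 alpha^2))
    a, d   : assumed linear dynamics dX = a X dt + d dW
    All increments in a step are evaluated at the pre-step values of (mu, var).
    """
    thetas = np.asarray(thetas, dtype=float)
    mu, var = float(mu0), float(var0)
    mus, variances = [mu], [var]
    for dN in spikes:
        S = 1.0 / (alpha**2 + var)           # scalar analogue of S_t
        delta = mu - thetas                  # delta = H mu - theta with H = 1
        lam_hat = h * np.sqrt(S * alpha**2) * np.exp(-0.5 * S * delta**2)
        # prior terms (state dynamics)
        d_mu = a * mu * dt
        d_var = (2 * a * var + d**2) * dt
        # continuous terms (information carried by the absence of spikes)
        d_mu += np.sum(lam_hat * var * S * delta) * dt
        d_var += np.sum(lam_hat * var**2 * (S - (S * delta) ** 2)) * dt
        # discontinuous terms, non-zero only for neurons that spiked in this step
        d_mu -= np.sum(dN * var * S * delta)
        d_var -= np.sum(dN * var**2 * S)
        mu, var = mu + d_mu, var + d_var
        mus.append(mu)
        variances.append(var)
    return np.array(mus), np.array(variances)

spikes = np.zeros((1000, 2))
spikes[300, 0] = 1                           # neuron 0 fires once, at step 300
mus, variances = adf_filter_1d(spikes, thetas=[-0.5, 0.5], h=5.0, alpha=0.5,
                               a=-0.1, d=1.0, mu0=0.0, var0=1.0, dt=1e-3)
```

In this run the variance drops sharply at the spike and the mean jumps toward the preferred stimulus of the neuron that fired. Summing jump terms is only adequate when at most one neuron spikes per step, so $\Delta t$ should be small relative to the inverse of the total firing rate.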

Appendix C Variance as proxy for MSE

In section (5), we studied optimal encoding using the posterior variance as a proxy for the MSE. Letting $\mu_{t},\Sigma_{t}$ denote the approximate posterior moments given by the filter, the MSE and posterior variance are related as follows,

[TABLE]

where $\mathbf{E}_{t}^{\mathcal{N}}\left[\cdot\right],\mathrm{Var}_{t}^{\mathcal{N}}\left[\cdot\right]$ are resp. the mean and covariance matrix conditioned on $\mathcal{N}_{t}$ , and $\mathrm{tr}$ is the trace operator. Thus for an exact filter, having $\mu_{t}=\mathbf{E}_{t}^{\mathcal{N}}X_{t},\Sigma_{t}=\mathrm{Var}_{t}^{\mathcal{N}}X_{t}$ , we would have $\mathrm{MSE}_{t}=\mathbf{E}[\mathrm{tr}\Sigma_{t}]$ . Conversely, if we find that $\mathrm{MSE}_{t}\approx\mathbf{E}[\mathrm{tr}\Sigma_{t}]$ , it suggests that the errors are small (though this is not guaranteed, since the errors in $\mu_{t}$ and $\Sigma_{t}$ may affect the MSE in opposite directions, if the variance is underestimated).

Figure 13 shows the variance and MSE in estimating the same process as in Figure 10, after averaging across 10000 trials. The results indicate that the filter has good accuracy with these parameters, so that the variance is a reasonable approximation for the MSE.

Figure (14) is a variant of Figure (10) showing the MSE rather than the variance. The results are noisier but qualitatively similar. The largest differences are observed in Figure 14b for small population variance, where the ADF estimation is poor due to very few spikes occurring.

Appendix D Comparison to previous works – additional details

As noted in section 6.1, we are aware of a single previous work (Eden & Brown, 2008) deriving a closed-form filter for *non-uniform* coding of a diffusion process in continuous time. We now present a detailed comparison of our work to (Eden & Brown, 2008).

The filter derived in (Eden & Brown, 2008) is a continuous-time version of a discrete-time filter presented in (Eden et al., 2004). The setting of (Eden et al., 2004) involves linear discrete-time state dynamics, and a finite population of neurons with arbitrary tuning functions, firing independently (given the state). The derivation of the filter relies on an approximation applied to the measurement update for the posterior density,

[TABLE]

where $X_{k}$ is the external state at time $t_{k}$ , $\mathcal{N}_{k}$ is the spiking history up to time $t_{k}$ , and $\Delta\mathcal{N}_{k}$ the spike counts in the interval $(t_{k-1},t_{k}]$ . A Gaussian density is substituted for each of the terms $p\left(X_{k+1},\mathcal{N}_{k+1}\right),p\left(X_{k+1},\mathcal{N}_{k}\right)$ , and each side is expanded to a second-order Taylor series about the point $\mathbf{E}\left[X_{k}|\mathcal{N}_{k-1}\right]$ . This yields equations relating the first two moments of $p\left(X_{k}|\mathcal{N}_{k}\right)$ to those of $p\left(X_{k}|\mathcal{N}_{k-1}\right)$ . The time update equations for the first two moments (relating $p\left(X_{k+1}|\mathcal{N}_{k}\right)$ to $p\left(X_{k}|\mathcal{N}_{k}\right)$ ) require no approximation since the dynamics are linear. The resulting discrete-time filtering equations (equations (2.7)-(2.10) in (Eden et al., 2004)) depend on the gradient and Hessian of the logarithm of the tuning functions at the point $\mathbf{E}\left[X_{k}|\mathcal{N}_{k-1}\right]$ . The continuous-time version (equations (6.3)-(6.4) in (Eden & Brown, 2008)) is derived by taking the limit as the time discretization step $\Delta t$ approaches 0.

In contrast, the starting point in our derivation is the exact update equations for the first two moments expressed in terms of posterior expectations. A Gaussian posterior is substituted into these expectations; the resulting integrals are tractable because the tuning functions are Gaussian.

To compare the resulting filters, we consider a finite population of Gaussian neurons: this is the intersection of the setting of the current work with that of (Eden & Brown, 2008). In this case, the filtering equations in (Eden & Brown, 2008) yield the same discontinuous update terms, but the continuous update terms take the form

[TABLE]

(see section D.1). These equations differ from (9a)-(9b) in the use of the tuning function shape matrix $R_{i}$ in place of $S_{t}^{i}$ (defined in (9f)), and $\lambda^{i}\left(\mu_{t}\right)$ in place of $\hat{\lambda}_{t}^{i}$ . The difference between $\lambda^{i}\left(\mu_{t}\right)$ and $\hat{\lambda}_{t}^{i}$ similarly involves substituting $R_{i}$ for $S_{t}^{i}$ ,

[TABLE]

Since $S_{t}^{i}=\left(R_{i}^{-1}+H_{i}\Sigma_{t}H_{i}^{\intercal}\right)^{-1}$ , our filtering equations take into account the posterior variance $\Sigma_{t}$ in several places where it is absent in (55). Note that when $\Sigma_{t}=0$ we have $R_{i}=S_{t}^{i}$ and $\lambda^{i}\left(\mu_{t}\right)=\hat{\lambda}_{t}^{i}$ , so the equations become increasingly similar when $\Sigma_{t}\to 0$ . We refer to the filter (55) as the Eden-Brown (EB) filter.
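The difference between the two rate terms can be illustrated in one dimension. The sketch below assumes a scalar state with $H=1$ and tuning $\lambda\left(x\right)=h\exp\left(-\left(x-\theta\right)^{2}/2\alpha^{2}\right)$ , so that $R=\alpha^{-2}$ and $S_{t}=\left(\alpha^{2}+\sigma_{t}^{2}\right)^{-1}$ ; the function names are hypothetical.

```python
import numpy as np

def rate_at_mean(mu, theta, h, alpha):
    """lambda(mu): the tuning function evaluated at the posterior mean."""
    return h * np.exp(-0.5 * (mu - theta) ** 2 / alpha**2)

def expected_rate(mu, var, theta, h, alpha):
    """lambda_hat: expectation of the tuning function under N(mu, var)."""
    S = 1.0 / (alpha**2 + var)
    return h * np.sqrt(S * alpha**2) * np.exp(-0.5 * S * (mu - theta) ** 2)

# The two quantities approach each other as the posterior variance shrinks.
for var in (1.0, 0.1, 0.001):
    print(var, rate_at_mean(0.3, 0.0, 1.0, 0.5),
          expected_rate(0.3, var, 0.0, 1.0, 0.5))
```

For a broad posterior the expected rate accounts for the probability mass lying away from the mean, which the point evaluation at $\mu_{t}$ ignores; this is where the two filters are expected to differ most.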

We compared the performance of our filter and (55) in simulations of a simple one-dimensional setup. Figure 15 shows an example of filtering a static one-dimensional state observed through two Gaussian neurons (3), using both the ADF approximation (9a)-(9b) and the EB filter (55) for comparison. Since the Mean Square Error is highly sensitive to outliers where the approximation fails and the filtering error becomes large, we compare the filters by observing the distribution of Absolute Errors (AE) in estimating the state. Figure 15 compares the AE in estimating the state for the two filters. As expected from the analysis above, the ADF filter has an advantage particularly in earlier times, when the posterior variance is large. This is seen most clearly in the 95th percentile of AEs in Figure 15(a), and in the tail histograms (15(c)), but a small advantage may also be observed for the median in 15(d). However, in some trials the error in the EB filter remains large throughout.

In Figure 15 (left) we chose the preferred stimuli $-0.51$ and $0.5$ . These are not symmetric around 0 due to a limitation of the EB filter: when it is applied to a homogeneous population and the current posterior mean is precisely at the average of the population’s preferred stimuli, the posterior mean remains constant until the next spike, while the posterior variance evolves as

[TABLE]

which diverges in finite time if the constant coefficient is positive. The coefficient is positive when $\theta_{i}$ are close to $\mu_{t}$ . This causes divergence of the filter when the preferred stimuli are symmetric around the initial estimate $\mu_{0}$ and sufficiently near it, and the first spike is sufficiently delayed. To avoid this behavior we chose preferred stimuli that are not symmetric around $\mu_{0}=0$ . This asymmetry causes an eventual shift in the posterior mean in the absence of spikes, which suppresses the growth of the posterior variance. This effect may cause high estimation errors before the first spike, as evident in Figure 15d (left).

Distribution of absolute estimation errors as a function of time, collected from 10,000 trials. Parameters are the same as in Figure (15), with the state sampled from $\mathcal{N}\left(0,1\right)$ in each trial. Preferred stimuli are noted above each subfigure. (a) Medians of the distribution of AEs for ADF (orange) and the EB filter (blue), along with 5th and 95th percentiles, as a function of time. The medians are indistinguishable at this scale. (b) Histograms of AEs for some specific times, with logarithmically spaced bins. The vertical dotted line indicates the AE at filter initialization, which equals the prior expectation $\mathbf{E}\left|X_{0}-\mu_{0}\right|=\sigma_{0}\Phi^{-1}\left(\frac{3}{4}\right)$ where $\Phi^{-1}$ is the quantile function for the standard normal distribution. (c) Histogram of AEs larger than the initial AE. (d) Normalized differences of AE percentiles $\left(p_{\mathrm{ADF}}^{r}-p_{\mathrm{EB}}^{r}\right)/\frac{1}{2}\left(p_{\mathrm{ADF}}^{r}+p_{\mathrm{EB}}^{r}\right)$ , where $p_{\mathrm{ADF}}^{r},p_{\mathrm{EB}}^{r}$ are the $r$ th quantile of the AE distribution for the ADF and EB filters respectively, for $r=0.5,0.05,0.95$ . Negative values indicate an advantage of ADF over EB. Shaded areas indicate 95% confidence intervals derived via bootstrapping (e.g. (Efron & Tibshirani, 1994)).

D.1 Derivation of (55)

In our notation, the filtering equations of (Eden & Brown, 2008) read

[TABLE]

where $\nabla,\nabla^{2}$ denote the gradient and Hessian respectively, and $S_{t^{-}}^{\mathrm{EB}}$ is defined as

[TABLE]

**Note** that this is a corrected version of the definition used in (Eden & Brown, 2008), which is unusable when the Hessian is singular (this occurs in our model whenever $m<n$ ). The definition of $S_{t}^{\mathrm{EB}}$ given here extends it to the singular case.

For a Gaussian firing rate (3), the relevant gradient and Hessian are given by

[TABLE]

so

[TABLE]

where $S_{t}^{i}$ is given by (9f). Substituting into (56) yields the continuous updates (55a)-(55b), and the discontinuous updates

[TABLE]

The discontinuous variance update is identical to the ADF update (9c). The discontinuous mean update can be rewritten in terms of $\Sigma_{t^{-}}$ by noting that

[TABLE]

so the mean update may be rewritten as

[TABLE]

in agreement with (9c).

Bibliography


1. Benucci, A., Saleem, A. B., & Carandini, M. (2013). Adaptation maintains population homeostasis in primary visual cortex. Nature Neuroscience, 16(6), 724–9. doi:10.1038/nn.3382
2. Bethge, M., Rotermund, D., & Pawelzik, K. (2002, Oct). Optimal short-term population coding: when Fisher information fails. Neural Comput, 14(10), 2317–2351. doi:10.1162/08997660260293247
3. Bobrowski, O., Meir, R., & Eldar, Y. (2009, May). Bayesian filtering in spiking neural networks: noise, adaptation, and multisensory integration. Neural Comput, 21(5), 1277–1320. doi:10.1162/neco.2008.01-0
4. Brand, A., Behrend, O., Marquardt, T., McAlpine, D., & Grothe, B. (2002). Precise inhibition is essential for microsecond interaural time difference coding. Nature, 417(6888), 543–547. doi:10.1038/417543a
5. Brémaud, P. (1981). Point Processes and Queues: Martingale Dynamics. Springer, New York.
6. Brigo, D., Hanzon, B., Le Gland, F., et al. (1999). Approximate nonlinear filtering by projection on exponential manifolds of densities. Bernoulli, 5(3), 495–534.
7. Brown, E. N., Barbieri, R., Ventura, V., Kass, R. E., & Frank, L. M. (2002). The time-rescaling theorem and its application to neural spike train data analysis. Neural Computation, 14(2), 325–346.
8. Chelaru, M., & Dragoi, V. (2008). Efficient coding in heterogeneous neuronal populations. Proceedings of the National Academy of Sciences, 105(42), 16344–16349.
