Multi-modal Gaussian Process Variational Autoencoders for Neural and Behavioral Data
Rabia Gondur, Usama Bin Sikandar, Evan Schaffer, Mikio Christian Aoi, Stephen L Keeley

TL;DR
This paper introduces a novel unsupervised multi-modal latent variable model combining Gaussian Process Factor Analysis and Variational Autoencoders to jointly analyze neural and behavioral data, capturing shared and independent dynamics.
Contribution
It proposes a new multi-modal Gaussian Process VAE that models shared and modality-specific latent dynamics, improving interpretability and latent identification in neural-behavioral data.
Findings
Accurately identifies shared and independent latent structures across modalities.
Provides high-quality reconstructions of neural and behavioral data.
Validated on simulated and real multi-modal datasets.
Abstract
Characterizing the relationship between neural population activity and behavioral data is a central goal of neuroscience. While latent variable models (LVMs) are successful in describing high-dimensional time-series data, they are typically only designed for a single type of data, making it difficult to identify structure shared across different experimental data modalities. Here, we address this shortcoming by proposing an unsupervised LVM which extracts temporally evolving shared and independent latents for distinct, simultaneously recorded experimental modalities. We do this by combining Gaussian Process Factor Analysis (GPFA), an interpretable LVM for neural spiking data with temporally smooth latent space, with Gaussian Process Variational Autoencoders (GP-VAEs), which similarly use a GP prior to characterize correlations in a latent space, but admit rich expressivity due to a deep…
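As a rough illustration of the structure the abstract describes, the sketch below simulates data from a generative process with one shared and two modality-specific latents, each drawn from a GP prior. It is a minimal sketch and not the authors' implementation. The Poisson likelihood for spikes and the normal likelihood for the second modality follow the review summary further down. Everything else is an assumption made for illustration: the RBF kernel, the lengthscale, the dimensions, the linear GPFA-style loadings with an exponential link, the small random `tanh` decoder, and all function names.

```python
import numpy as np

def rbf_kernel(t, lengthscale, jitter=1e-4):
    # Squared-exponential kernel on time points t (assumed kernel choice);
    # jitter on the diagonal keeps the Cholesky factorization stable.
    d = t[:, None] - t[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2) + jitter * np.eye(len(t))

def sample_gp(rng, t, n_latents, lengthscale):
    # Draw n_latents independent, temporally smooth GP paths; shape (T, n_latents).
    chol = np.linalg.cholesky(rbf_kernel(t, lengthscale))
    return chol @ rng.standard_normal((len(t), n_latents))

def simulate(seed=0, T=100, n_neurons=20, n_features=16, hidden=32):
    # All sizes here are hypothetical, chosen only to keep the example small.
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, T)

    z_shared = sample_gp(rng, t, 1, lengthscale=0.2)  # drives both modalities
    z_neural = sample_gp(rng, t, 1, lengthscale=0.2)  # neural-only latent
    z_behav = sample_gp(rng, t, 1, lengthscale=0.2)   # behavior-only latent

    # Neural modality: linear loadings + exp link + Poisson counts
    # (GPFA-style; the linear form is an assumption of this sketch).
    loadings = 0.5 * rng.standard_normal((2, n_neurons))
    rates = np.exp(np.hstack([z_shared, z_neural]) @ loadings + np.log(2.0))
    spikes = rng.poisson(rates)

    # Behavioral modality: random nonlinear decoder + Gaussian noise,
    # standing in for a trained deep decoder.
    w1 = rng.standard_normal((2, hidden))
    w2 = rng.standard_normal((hidden, n_features)) / np.sqrt(hidden)
    mean = np.tanh(np.hstack([z_shared, z_behav]) @ w1) @ w2
    behavior = mean + 0.1 * rng.standard_normal(mean.shape)

    return spikes, behavior, (z_shared, z_neural, z_behav)
```

Calling `simulate()` returns a `(T, n_neurons)` array of spike counts, a `(T, n_features)` array of continuous observations, and the three latent paths. Because only `z_shared` enters both observation models, any structure common to the two modalities has to be explained by it, which is the identification problem an inference procedure for such a model would need to solve.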
Peer Reviews
Decision: ICLR 2024 poster
1. This submission exploits the multi-modal latent variable models based on GP-VAE, using Poisson and normal likelihoods for spiking and image data respectively, which can help modeling both shared and independent factors across modalities in neuroscience.
2. Experiments on simulated and real-world datasets with visualized results showing the potential in jointly analyzing neuroscience data.
3. The presentation is clear with articulated motivations of the presented mm-GPVAE. Source code is a
1. Methodological contributions may be limited. Many papers have done similar things, such as extending the latent space into the frequency domain [1,3] and using the Poisson model for spiking data [1,2]. The method proposed in this paper is a combination of existing methods, not fundamentally different.
2. The experimental results can be more comprehensive: (1) More baselines, especially similar methods such as [1,4], need to be compared; (2) The authors stated one main advantage of the Fourier doma
The use of Fourier frequencies as latent variables can have some advantages in learning neural features that correlate with motor or other behaviors. The authors also separate shared from independent variability in the neural data which can be useful in understanding what part of behavior could actually be predicted from specific neuronal measurements.
- The synthetic data seems weak and unnecessary. Why not use a simulator like NEST, or augment one, and create "behavioral" data with known connections to this? MNIST images have a different structure because they are a two dimensional image.
- The authors lack comparison to anything but an ablation of their own model, and that too on the above artificial dataset.
- At least the authors should compare to RNN/transformers/ODE or other sequential models for this data
- The authors seem to criticize DN
The paper is incredibly well presented. Figures are prepared exceptionally well. The prose is clear, and presents a self-contained introduction to all of the necessary techniques and considerations. The method itself is gracefully simple. Although there is not a huge methodological contribution, the correct component parts are assembled and deployed in a way that is very experimentally useful. I could see this methodology having real uptake within the community, and engendering multiple f
I do have several queries/concerns however:
- **a. Fixed time horizon**: The use of an MLP to convert the per-timestep embeddings into per-sequence Fourier coefficients means that you can only consider fixed-length sequences. This seems to me to be a real limitation, since often neural/behavioral data – especially naturalistic behavior – is not of a fixed length. This could be remedied by using an RNN or neural process in place of the MLP, so this is not catastrophic as far as I can tell. H
Taxonomy
Topics: Neural dynamics and brain function · Neurobiology and Insect Physiology Research · Gaussian Processes and Bayesian Inference
Methods: Gaussian Process
