# Semi-Supervised Non-Parametric Bayesian Modelling of Spatial Proteomics

**Authors:** Oliver M. Crook, Kathryn S. Lilley, Laurent Gatto, Paul D. W. Kirk

arXiv: 1903.02909 · 2019-03-12

## TL;DR

This paper introduces a semi-supervised, non-parametric Bayesian model for spatial proteomics data based on mixtures of Gaussian process regression models. Each mixture component captures the correlation structure of one sub-cellular niche, and structure in the covariance matrices is exploited to keep inference efficient.

## Contribution

The paper develops a mixture of Gaussian process regression models for spatial proteomics analysis, an efficient Hamiltonian-within-Gibbs sampler for that model, and a tensor decomposition of the covariance matrices that lowers the cost of inverting them.
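As a rough orientation, a generic semi-supervised mixture of Gaussian process regressions can be written as follows. The notation ($x_i$, $t_j$, $z_i$, $\mu_k$, $C_k$, $\sigma_k^2$, $\pi$) is assumed here for illustration and is not taken from the paper, whose exact specification may differ.

$$
\begin{aligned}
x_i(t_j) \mid z_i = k &= \mu_k(t_j) + \varepsilon_{ij}, & \varepsilon_{ij} &\sim \mathcal{N}(0, \sigma_k^2), \\
\mu_k &\sim \mathcal{GP}(0, C_k), & z_i &\sim \mathrm{Categorical}(\pi).
\end{aligned}
$$

Here $x_i(t_j)$ is the abundance of protein $i$ in fraction $t_j$, $z_i$ is its niche allocation, and $\mu_k$ is the profile shared by niche $k$. In the semi-supervised setting, $z_i$ is fixed for proteins with known locations, so those proteins inform the hyperparameters of $C_k$ and $\sigma_k^2$, while $z_i$ is inferred for the remaining proteins.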

## Key findings

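- Mixtures of Gaussian process regression models capture the distinct correlation structure within each sub-cellular niche.
- Proteins with known locations inform the Gaussian process hyperparameters through semi-supervised learning.
- A tensor decomposition of the covariance matrices allows extended Trench and Durbin algorithms to be applied, reducing the computational complexity of inversion.
- A stand-alone R package built on high-performance C++ libraries is available at https://github.com/ococrook/toeplitz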

## Abstract

Understanding sub-cellular protein localisation is essential for analysing context-specific protein function. Recent advances in quantitative mass-spectrometry (MS) have led to high-resolution mapping of thousands of proteins to sub-cellular locations within the cell. Novel modelling considerations to capture the complex nature of these data are thus necessary. We approach analysis of spatial proteomics data in a non-parametric Bayesian framework, using mixtures of Gaussian process regression models. The Gaussian process regression model accounts for correlation structure within a sub-cellular niche, with each mixture component capturing the distinct correlation structure observed within each niche. Proteins with a priori labelled locations motivate using semi-supervised learning to inform the Gaussian process hyperparameters. We moreover provide an efficient Hamiltonian-within-Gibbs sampler for our model. As in other recent work, we reduce the computational burden associated with inversion of covariance matrices by exploiting the structure in the covariance matrix. A tensor decomposition of our covariance matrices allows extended Trench and Durbin algorithms to be applied in order to reduce the computational complexity of inversion and hence accelerate computation. A stand-alone R package implementing these methods using high-performance C++ libraries is available at: https://github.com/ococrook/toeplitz
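The Trench and Durbin algorithms mentioned above are classical methods for Toeplitz matrices, which arise when a stationary kernel is evaluated on equally spaced inputs. The sketch below illustrates that general idea in Python with SciPy's `solve_toeplitz`, which uses Levinson recursion. It is not the authors' extended algorithm or their R package, and the kernel choice, the equally spaced grid, and all parameter values are assumptions made for illustration.

```python
import numpy as np
from scipy.linalg import solve_toeplitz, toeplitz


def se_first_column(n_points, amplitude=1.0, length_scale=2.0, noise_var=0.1):
    """First column of a squared-exponential covariance on an equally spaced grid.

    Hypothetical helper: on such a grid the covariance depends only on the lag
    between points, so the full matrix is Toeplitz and is determined by this
    single column. The noise variance is added on the diagonal (lag 0).
    """
    lags = np.arange(n_points, dtype=float)
    col = amplitude**2 * np.exp(-0.5 * (lags / length_scale) ** 2)
    col[0] += noise_var
    return col


n = 20  # assumed number of equally spaced measurement points
col = se_first_column(n)

rng = np.random.default_rng(0)
y = rng.normal(size=n)  # synthetic right-hand side, not real proteomics data

# Structured solve: Levinson recursion, O(n^2) time, no dense matrix needed.
alpha_fast = solve_toeplitz(col, y)

# Reference: build the dense matrix and use a general O(n^3) solver.
alpha_dense = np.linalg.solve(toeplitz(col), y)

max_abs_diff = float(np.max(np.abs(alpha_fast - alpha_dense)))
print(f"max abs difference between solvers: {max_abs_diff:.2e}")
```

The structured solver only needs the first column of the covariance matrix, so both storage and solve time scale better than with a dense factorisation. This kind of saving is what makes repeated covariance solves inside a sampler affordable.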

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.02909/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/1903.02909/full.md

## References

91 references — full list in the complete paper: https://tomesphere.com/paper/1903.02909/full.md

---
Source: https://tomesphere.com/paper/1903.02909