# Bayesian nonparametric multiway regression for clustered binomial data

**Authors:** Eric F. Lock, Dipankar Bandyopadhyay

arXiv: 1901.11172 · 2019-02-01

## TL;DR

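This paper presents a Bayesian nonparametric regression model for clustered binomial data with multiway (tensor) structure. Latent binomial probabilities are drawn from a mixture whose probit stick-breaking weights depend on covariates, and low-rank assumptions reduce the effective dimensionality of the parameter space. The model is demonstrated on periodontal disease data.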

## Contribution

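The paper introduces a nonparametric regression approach with covariate-dependent clustering for multiway binomial data. Low-rank assumptions reduce the effective dimensionality of the patients $\times$ tooth types $\times$ covariates $\times$ components parameter space and account for its multiway structure.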
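
To make the dimensionality-reduction idea concrete, here is a minimal sketch of how a low-rank assumption shrinks a multiway coefficient array. It assumes a rank-R CP (sum of outer products) factorization over tooth types, covariates, and components; the exact factorization used in the paper may differ. Only `J = 4` tooth types comes from the abstract. The sizes `P`, `H`, `R` and the helper names `cp_tensor` and `random_factor` are hypothetical.

```python
import random

def cp_tensor(U, V, W):
    """Build B[j][p][h] = sum_r U[j][r] * V[p][r] * W[h][r] (rank-R CP form)."""
    rank = len(U[0])
    return [[[sum(U[j][r] * V[p][r] * W[h][r] for r in range(rank))
              for h in range(len(W))]
             for p in range(len(V))]
            for j in range(len(U))]

def random_factor(rows, rank, rng):
    """Draw a rows x rank factor matrix with standard normal entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(rank)] for _ in range(rows)]

# J = 4 tooth types is from the abstract; P, H, and R are illustrative assumptions.
J, P, H, R = 4, 5, 10, 2
rng = random.Random(0)
U = random_factor(J, R, rng)  # tooth-type factors
V = random_factor(P, R, rng)  # covariate factors
W = random_factor(H, R, rng)  # mixture-component factors

B = cp_tensor(U, V, W)  # full J x P x H coefficient array implied by the factors

full_params = J * P * H            # free parameters without any low-rank structure
low_rank_params = R * (J + P + H)  # free parameters in the three factor matrices
print(full_params, low_rank_params)  # prints: 200 38
```

Under these assumed sizes, the factorized form needs 38 numbers instead of 200, and the gap widens as any dimension grows, which is the sense in which a low-rank assumption reduces effective dimensionality.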

## Key findings

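- The fitted model outperforms competing approaches on the periodontal disease data and is interpretable and well-calibrated.
- A flexible probit stick-breaking formulation for the component weights allows for covariate dependence and clustering structure in the outcomes.
- A general and efficient Gibbs sampling algorithm is used for posterior computation.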

## Abstract

We introduce a Bayesian nonparametric regression model for data with multiway (tensor) structure, motivated by an application to periodontal disease (PD) data. Our outcome is the number of diseased sites measured over four different tooth types for each subject, with subject-specific covariates available as predictors. The outcomes are not well-characterized by simple parametric models, so we use a nonparametric approach with a binomial likelihood wherein the latent probabilities are drawn from a mixture with an arbitrary number of components, analogous to a Dirichlet Process (DP). We use a flexible probit stick-breaking formulation for the component weights that allows for covariate dependence and clustering structure in the outcomes. The parameter space for this model is large and multiway: patients $\times$ tooth types $\times$ covariates $\times$ components. We reduce its effective dimensionality, and account for the multiway structure, via low-rank assumptions. We illustrate how this can improve performance, and simplify interpretation, while still providing sufficient flexibility. We describe a general and efficient Gibbs sampling algorithm for posterior computation. The resulting fit to the PD data outperforms competitors, and is interpretable and well-calibrated. An interactive visual of the predictive model is available at http://ericfrazerlock.com/toothdata/ToothDisplay.html , and the code is available at https://github.com/lockEF/NonparametricMultiway .
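
The binomial mixture with probit stick-breaking weights described in the abstract can be sketched as follows. This is an illustration of the general technique under stated assumptions, not the authors' implementation (their code is in the linked repository). It assumes a finite truncation to H components, in which the last component takes the remaining stick, and a linear dependence of the stick-breaking scores on covariates. The function names, the coefficient values, the component probabilities, and the count of 12 sites are all hypothetical.

```python
import math

def probit(x):
    """Standard normal CDF, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def linear_scores(x, coefs):
    """Covariate-dependent scores alpha_h(x) = coefs[h] . x (assumed linear form)."""
    return [sum(c * xi for c, xi in zip(row, x)) for row in coefs]

def stick_breaking_weights(alphas):
    """Map H-1 real scores to H weights:
    w_h = Phi(alpha_h) * prod_{l<h} (1 - Phi(alpha_l)); the last weight is the remainder."""
    weights, remaining = [], 1.0
    for a in alphas:
        v = probit(a)
        weights.append(remaining * v)  # break off a fraction v of the remaining stick
        remaining *= 1.0 - v
    weights.append(remaining)          # truncation: final component takes what is left
    return weights

def binomial_mixture_pmf(y, n, weights, probs):
    """P(Y = y) when Y ~ Binomial(n, p_h) with probability weights[h]."""
    return sum(w * math.comb(n, y) * p**y * (1.0 - p)**(n - y)
               for w, p in zip(weights, probs))

# Hypothetical inputs: an intercept plus one covariate, and three mixture components.
x = [1.0, 0.5]
coefs = [[0.0, 1.0], [-0.5, 0.2]]  # H - 1 = 2 rows of illustrative coefficients
probs = [0.05, 0.30, 0.80]         # illustrative component-specific disease probabilities
n_sites = 12                       # illustrative number of sites for one tooth type

weights = stick_breaking_weights(linear_scores(x, coefs))
pmf = [binomial_mixture_pmf(y, n_sites, weights, probs) for y in range(n_sites + 1)]
```

Because the scores depend on `x`, two subjects with different covariates get different mixture weights over the same set of component probabilities, which is how covariate dependence enters the clustering.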

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1901.11172/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/1901.11172/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/1901.11172/full.md

---
Source: https://tomesphere.com/paper/1901.11172