# missSBM: An R Package for Handling Missing Values in the Stochastic Block Model

**Authors:** Pierre Barbillon, Julien Chiquet, Timothée Tabouy

arXiv: 1906.12201 · 2021-05-28

## TL;DR

missSBM is an R package designed to fit stochastic block models to partially observed network data, accounting for missing values and external covariates, and includes methods for model selection and missing data imputation.

## Contribution

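The paper introduces missSBM, an R package that fits the binary stochastic block model to partially observed networks. It performs variational inference adapted to several observation processes, optionally with external covariates, and selects the number of blocks with the Integrated Classification Likelihood (ICL) criterion.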

## Key findings

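- Imputation of missing entries in the adjacency matrix from the fitted model.
- Automatic selection of the number of blocks using the ICL criterion, which can also help determine which observation process better matches a dataset.
- Illustration on a network of interactions between political blogs sampled during the 2007 French presidential election.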
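
To make the imputation idea concrete, here is a minimal Python sketch. It is not the package's R interface and not its variational procedure: it assumes block labels are already known and that dyads are missing at random, then fills each missing dyad with the observed edge frequency of its block pair. The function names `block_rates` and `impute` and the toy 6-node matrix are hypothetical, chosen only for illustration.

```python
# Illustrative assumption: labels are known and dyads are missing at random.
# This is a plug-in estimate, not the variational inference used by missSBM.

def block_rates(obs, labels, n_blocks):
    """Edge frequency per block pair, computed from observed dyads only."""
    edges = [[0] * n_blocks for _ in range(n_blocks)]
    seen = [[0] * n_blocks for _ in range(n_blocks)]
    n = len(obs)
    for i in range(n):
        for j in range(i + 1, n):
            if obs[i][j] is None:  # missing dyad (NA in the R encoding)
                continue
            k, l = sorted((labels[i], labels[j]))
            edges[k][l] += obs[i][j]
            seen[k][l] += 1
    rates = [[None] * n_blocks for _ in range(n_blocks)]
    for k in range(n_blocks):
        for l in range(k, n_blocks):
            if seen[k][l] > 0:
                rates[k][l] = rates[l][k] = edges[k][l] / seen[k][l]
    return rates

def impute(obs, labels, rates):
    """Replace each missing off-diagonal entry by its block-pair edge frequency."""
    n = len(obs)
    filled = [row[:] for row in obs]
    for i in range(n):
        for j in range(n):
            if i != j and filled[i][j] is None:
                filled[i][j] = rates[labels[i]][labels[j]]
    return filled

NA = None  # stands in for R's NA
labels = [0, 0, 0, 1, 1, 1]
obs = [
    [0, 1, 1, 0, 0, NA],
    [1, 0, NA, 0, 1, 0],
    [1, NA, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, NA],
    [0, 1, 0, 1, 0, 1],
    [NA, 0, 0, NA, 1, 0],
]

rates = block_rates(obs, labels, 2)
filled = impute(obs, labels, rates)
print(rates)         # [[1.0, 0.125], [0.125, 1.0]]
print(filled[0][5])  # 0.125: between-block dyad
print(filled[1][2])  # 1.0: within-block dyad
```

The imputed values are probabilities rather than 0/1 decisions. A model fitted with soft block memberships would instead average these rates over each node's membership probabilities.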

## Abstract

The Stochastic Block Model (SBM) is a popular probabilistic model for random graphs. It is commonly used for clustering network data by aggregating nodes that share similar connectivity patterns into blocks. When fitting an SBM to a network which is partially observed, it is important to take into account the underlying process that generates the missing values, otherwise the inference may be biased. This paper introduces missSBM, an R-package fitting the SBM when the network is partially observed, i.e., the adjacency matrix contains not only 1's or 0's encoding presence or absence of edges but also NA's encoding missing information between pairs of nodes. This package implements a set of algorithms for fitting the binary SBM, possibly in the presence of external covariates, by performing variational inference adapted to several observation processes. Our implementation automatically explores different block numbers to select the most relevant model according to the Integrated Classification Likelihood (ICL) criterion. The ICL criterion can also help determine which observation process better corresponds to a given dataset. Finally, missSBM can be used to perform imputation of missing entries in the adjacency matrix. We illustrate the package on a network data set consisting of interactions between political blogs sampled during the French presidential election in 2007.
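
The claim that ignoring the observation process can bias inference can be illustrated with a small simulation. The sketch below is written in Python rather than R and does not use the package. It draws a two-block binary SBM, encodes unobserved dyads as `None` (playing the role of `NA`), and compares the naive edge density under two masking schemes: one where every dyad is kept with the same probability, and one where edges are kept more often than non-edges. All parameter values, the second masking scheme, and the function names are assumptions made for illustration.

```python
import random

def simulate_sbm(n, block_weights, connect, rng):
    """Draw block labels, then a symmetric binary adjacency matrix without self-loops."""
    labels = rng.choices(range(len(block_weights)), weights=block_weights, k=n)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            edge = int(rng.random() < connect[labels[i]][labels[j]])
            adj[i][j] = adj[j][i] = edge
    return labels, adj

def observe(adj, p_keep_edge, p_keep_non_edge, rng):
    """Mask dyads with None; the keep probability may depend on the dyad's value."""
    n = len(adj)
    obs = [row[:] for row in adj]
    for i in range(n):
        for j in range(i + 1, n):
            p_keep = p_keep_edge if adj[i][j] == 1 else p_keep_non_edge
            if rng.random() >= p_keep:
                obs[i][j] = obs[j][i] = None
    return obs

def density(matrix):
    """Naive edge density over the dyads that are not missing."""
    n = len(matrix)
    values = [matrix[i][j] for i in range(n) for j in range(i + 1, n)
              if matrix[i][j] is not None]
    return sum(values) / len(values)

rng = random.Random(42)  # fixed seed so the run is reproducible
labels, adj = simulate_sbm(80, [0.5, 0.5], [[0.5, 0.1], [0.1, 0.5]], rng)

# Hypothetical scheme 1: every dyad kept with probability 0.5, whatever its value.
obs_mar = observe(adj, 0.5, 0.5, rng)
# Hypothetical scheme 2: edges kept with probability 0.9, non-edges with 0.3.
obs_biased = observe(adj, 0.9, 0.3, rng)

true_density = density(adj)
mar_density = density(obs_mar)
biased_density = density(obs_biased)
print(f"full network: {true_density:.3f}")
print(f"value-independent masking: {mar_density:.3f}")
print(f"value-dependent masking: {biased_density:.3f}")
```

Under the value-independent scheme the naive density stays close to that of the full network, while the value-dependent scheme inflates it because edges are over-represented among the observed dyads. This is the situation in which the observation process has to be modelled alongside the SBM.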

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.12201/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1906.12201/full.md

---
Source: https://tomesphere.com/paper/1906.12201