# From Relational Data to Graphs: Inferring Significant Links using Generalized Hypergeometric Ensembles

**Authors:** Giona Casiraghi, Vahan Nanumyan, Ingo Scholtes, Frank Schweitzer

arXiv: 1706.04370 · 2021-02-24

## TL;DR

This paper introduces a statistical framework based on generalized hypergeometric ensembles for inferring significant links in noisy relational data, applicable to social, biological, and semantic networks.

## Contribution

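The paper proposes a statistical modeling framework built on generalized hypergeometric ensembles, a class of generative stochastic models that yield analytically tractable probability spaces of directed, multi-edge graphs. It uses this framework to assess the significance of links in noisy relational data.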

## Key findings

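- The method is illustrated on two data sets capturing spatio-temporal proximity relations between actors in a social system
- Generalized hypergeometric ensembles give rise to analytically tractable probability spaces of directed, multi-edge graphs
- The framework provides a way to assess which links in noisy relational data are statistically significant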

## Abstract

The inference of network topologies from relational data is an important problem in data analysis. Exemplary applications include the reconstruction of social ties from data on human interactions, the inference of gene co-expression networks from DNA microarray data, or the learning of semantic relationships based on co-occurrences of words in documents. Solving these problems requires techniques to infer significant links in noisy relational data. In this short paper, we propose a new statistical modeling framework to address this challenge. It builds on generalized hypergeometric ensembles, a class of generative stochastic models that give rise to analytically tractable probability spaces of directed, multi-edge graphs. We show how this framework can be used to assess the significance of links in noisy relational data. We illustrate our method in two data sets capturing spatio-temporal proximity relations between actors in a social system. The results show that our analytical framework provides a new approach to infer significant links from relational data, with interesting perspectives for the mining of data on social systems.
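To make the idea of link significance concrete, here is a minimal sketch, not the authors' implementation. It assumes a deliberately simplified null model in which the observed edges are drawn without replacement from a pool of possible edges, so that the count on each link follows an ordinary hypergeometric distribution. The pool size for a pair (out-degree of the source times in-degree of the target), the inclusion of self-loops, and the function names `hypergeom_sf` and `link_p_values` are all assumptions made for illustration; the abstract does not specify them.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """Return P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    total = comb(population, draws)
    upper = min(successes, draws)
    tail = sum(
        comb(successes, x) * comb(population - successes, draws - x)
        for x in range(max(k, 0), upper + 1)
    )
    return tail / total


def link_p_values(adj):
    """Map each observed link (i, j) to a p-value under the simplified null.

    adj is a square list of lists holding non-negative integer edge counts.
    Assumption: the number of possible edges for pair (i, j) is
    out_degree(i) * in_degree(j), and all pairs (self-loops included)
    form the pool that the observed edges are sampled from.
    """
    n = len(adj)
    out_deg = [sum(row) for row in adj]
    in_deg = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    m = sum(out_deg)  # number of observed edges
    pool = sum(out_deg[i] * in_deg[j] for i in range(n) for j in range(n))
    p_values = {}
    for i in range(n):
        for j in range(n):
            if adj[i][j] > 0:
                possible = out_deg[i] * in_deg[j]
                # Probability of seeing at least this many edges by chance.
                p_values[(i, j)] = hypergeom_sf(adj[i][j], pool, possible, m)
    return p_values


# Hypothetical interaction counts between three actors.
counts = [
    [0, 5, 0],
    [1, 0, 1],
    [0, 1, 0],
]
for link, p in sorted(link_p_values(counts).items()):
    print(link, round(p, 4))
```

A small p-value means the link carries more interactions than the simplified null would produce by chance, so it is a candidate significant link. Exact integer arithmetic via `math.comb` keeps the sketch free of third-party dependencies, at the cost of speed on large data sets.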

---
Source: https://tomesphere.com/paper/1706.04370