Exchangeable modelling of relational data: checking sparsity, train-test splitting, and sparse exchangeable Poisson matrix factorization
Victor Veitch, Ekansh Sharma, Zacharie Naulet, and Daniel M. Roy

TL;DR
This paper explores practical aspects of sparse exchangeable graph models: how to detect sparsity in data, what the models imply for train-test splitting, and how to perform scalable inference, demonstrated through Poisson matrix factorization.
Contribution
The paper introduces a subsampling-based method for checking whether sparsity is relevant to a given dataset, discusses train-test splitting strategies that avoid biased results, and adapts variational inference to large-scale sparse exchangeable models.
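As a rough illustration of checking for sparsity by subsampling, the Python sketch below p-samples a fixed graph at several values of p and fits the slope of log(edge count) against log(vertex count). p-sampling here means keeping each vertex independently with probability p, taking the induced subgraph, and dropping vertices left without edges. The assumption is that the relevant signature is this growth rate: under a dense graphon model the edge count grows roughly quadratically in the vertex count, so a slope near 2 is expected, while a slope clearly below 2 points towards sparse behaviour. The paper's actual diagnostic may differ, and the function names (p_sample, vertex_edge_counts, scaling_exponent) are hypothetical.

```python
import math
import random


def p_sample(edges, p, rng):
    """p-sample a graph given as a list of (u, v) edges: keep each vertex
    independently with probability p and return the induced edges.
    Vertices left without edges disappear, since only edges are returned."""
    # Sorted so that a fixed seed gives a reproducible sample.
    vertices = sorted({v for e in edges for v in e})
    kept = {v for v in vertices if rng.random() < p}
    return [(u, v) for (u, v) in edges if u in kept and v in kept]


def vertex_edge_counts(edges):
    """Number of non-isolated vertices and number of edges."""
    return len({v for e in edges for v in e}), len(edges)


def scaling_exponent(edges, ps=(0.2, 0.4, 0.6, 0.8, 1.0), seed=0):
    """Least-squares slope of log(#edges) vs log(#vertices) over p-samples.
    Heuristic (assumption): near 2 suggests dense, well below 2 suggests sparse."""
    rng = random.Random(seed)
    xs, ys = [], []
    for p in ps:
        n_v, n_e = vertex_edge_counts(p_sample(edges, p, rng))
        if n_e > 0:  # skip empty subsamples, whose logarithm is undefined
            xs.append(math.log(n_v))
            ys.append(math.log(n_e))
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var
```

For example, an Erdős–Rényi graph with a fixed edge probability should give a slope close to 2, whereas a long path graph, whose edge count grows only linearly in its vertex count, gives a slope well below 2.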
Findings
Network subsampling reveals signatures of sparsity in data.
Common train-test splits can bias results under exchangeable models.
Scalable inference for sparse exchangeable Poisson matrix factorization is achievable.
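The sparse exchangeable Poisson matrix factorization model and its variational inference are not reproduced here. As a point of reference only, the sketch below fits ordinary Poisson matrix factorization, Y[u][i] ~ Poisson(sum_k theta[u][k] * beta[i][k]), by maximum likelihood using the Lee-Seung multiplicative updates for KL-divergence non-negative matrix factorization. This is a swapped-in technique, not the authors' method; the names fit_pmf, rates and poisson_loglik, the random initialisation and the iteration count are assumptions for illustration.

```python
import math
import random


def rates(theta, beta, eps=1e-10):
    """Poisson rate matrix R[u][i] = sum_k theta[u][k] * beta[i][k] (+ eps)."""
    return [[sum(tu * bi for tu, bi in zip(theta_u, beta_i)) + eps
             for beta_i in beta] for theta_u in theta]


def poisson_loglik(Y, theta, beta):
    """Log-likelihood of the count matrix Y under the Poisson model."""
    R = rates(theta, beta)
    return sum(y * math.log(r) - r - math.lgamma(y + 1)
               for Y_u, R_u in zip(Y, R) for y, r in zip(Y_u, R_u))


def fit_pmf(Y, K, iters=200, seed=0, eps=1e-10):
    """Maximum-likelihood Poisson MF via Lee-Seung multiplicative updates,
    which never decrease the Poisson log-likelihood (up to the eps guard)."""
    rng = random.Random(seed)
    U, I = len(Y), len(Y[0])
    theta = [[rng.uniform(0.5, 1.5) for _ in range(K)] for _ in range(U)]
    beta = [[rng.uniform(0.5, 1.5) for _ in range(K)] for _ in range(I)]
    for _ in range(iters):
        # User factors: theta_uk *= sum_i beta_ik * Y_ui / R_ui / sum_i beta_ik
        R = rates(theta, beta, eps)
        beta_sums = [sum(beta[i][k] for i in range(I)) + eps for k in range(K)]
        theta = [[theta[u][k] * sum(beta[i][k] * Y[u][i] / R[u][i] for i in range(I))
                  / beta_sums[k] for k in range(K)] for u in range(U)]
        # Item factors: the symmetric update, using the new theta.
        R = rates(theta, beta, eps)
        theta_sums = [sum(theta[u][k] for u in range(U)) + eps for k in range(K)]
        beta = [[beta[i][k] * sum(theta[u][k] * Y[u][i] / R[u][i] for u in range(U))
                 / theta_sums[k] for k in range(K)] for i in range(I)]
    return theta, beta
```

The dense loops keep the sketch short; a scalable version would iterate only over the nonzero entries of Y, since zero counts contribute nothing to the numerators and the denominators (the factor column sums) do not depend on the data.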
Abstract
A variety of machine learning tasks (e.g., matrix factorization, topic modelling, and feature allocation) can be viewed as learning the parameters of a probability distribution over bipartite graphs. Recently, a new class of models for networks, the sparse exchangeable graphs, has been introduced to resolve some important pathologies of traditional approaches to statistical network modelling, most notably the inability to model sparsity (in the asymptotic sense). The present paper explains some practical insights arising from this work. We first show how to check if sparsity is relevant for modelling a given (fixed-size) dataset by using network subsampling to identify a simple signature of sparsity. We discuss the implications of the (sparse) exchangeable subsampling theory for test-train dataset splitting; we argue common approaches can lead to biased results, and we propose a…
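The abstract's concrete splitting proposal is cut off, so the sketch below does not reproduce it. It only contrasts two ways of splitting a bipartite (user, item) edge list: holding out a uniformly random fraction of edges, a common approach, and splitting by p-sampling, where each user and each item is kept independently with probability p, the training set is the induced subgraph, and every other edge is held out. The motivation assumed here is that, under the sparse exchangeable subsampling theory, a p-sample of a graph drawn from such a model is again distributed as a draw from the same model at a smaller size, a property uniform edge holdout is not assumed to share. Both function names and the held-out-set definition are hypothetical.

```python
import random


def random_edge_split(edges, test_frac, seed=0):
    """Common approach: send each edge to the test set with probability test_frac."""
    rng = random.Random(seed)
    train, test = [], []
    for e in edges:
        (test if rng.random() < test_frac else train).append(e)
    return train, test


def p_sampling_split(edges, p, seed=0):
    """Keep each user and each item independently with probability p.
    Training set: edges whose user and item were both kept (the p-sample,
    with isolated vertices implicitly dropped). Held-out set: all other edges."""
    rng = random.Random(seed)
    # Sorted so that a fixed seed gives a reproducible split.
    users = sorted({u for u, _ in edges})
    items = sorted({i for _, i in edges})
    kept_users = {u for u in users if rng.random() < p}
    kept_items = {i for i in items if rng.random() < p}
    train, test = [], []
    for u, i in edges:
        (train if u in kept_users and i in kept_items else test).append((u, i))
    return train, test
```

Because an edge survives p-sampling only if both of its endpoints are kept, roughly a fraction p^2 of the edges ends up in training; holding out about 20% of the edges therefore corresponds to p ≈ 0.89.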
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Bayesian Methods and Mixture Models · Advanced Clustering Algorithms Research
