Fitting Probabilistic Index Models on Large Datasets

Han Bossier; Gustavo Amorim; Jan De Neve; Olivier Thas

arXiv:1808.05868·stat.CO·August 20, 2018

TL;DR

This paper introduces scalable algorithms for fitting probabilistic index models on large datasets, addressing computational challenges by partitioning data and subsampling, and demonstrates their effectiveness on real and simulated data.

Contribution

It proposes two novel algorithms, partitioning and subsampling, for efficiently estimating probabilistic index models on large datasets.

Findings

01. Partitioning algorithm outperforms subsampling in simulations

02. Applied method to large adolescent smartphone usage data

03. Moderate usage linked to higher mental well-being, excessive use linked to lower well-being

Abstract

Recently, Thas et al. (2012) introduced a new statistical model for the probability index. This index is defined as $P(Y \leq Y^{*} \mid X, X^{*})$, where $Y$ and $Y^{*}$ are independent random response variables associated with covariates $X$ and $X^{*}$ [...] Crucially, to estimate the parameters of the model, a set of pseudo-observations is constructed. For a sample size $n$, a total of $n(n-1)/2$ pairwise comparisons between observations is considered. Consequently, for large sample sizes it becomes computationally infeasible or even impossible to fit the model, as the set of pseudo-observations grows nearly quadratically in $n$. In this dissertation, we provide two solutions to fit a probabilistic index model. The first algorithm consists of splitting the entire data set into unique partitions. On each of these, we fit the model and then aggregate the estimates. A second algorithm is a subsampling scheme in…
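To make the quadratic growth and the partitioning idea concrete, here is a minimal Python sketch. It is not the authors' implementation: the helper names (`n_pseudo_observations`, `split_into_partitions`, `toy_estimator`, `partition_and_aggregate`) are hypothetical, the per-partition "fit" is a toy pairwise proportion standing in for a real probabilistic index model fit, and aggregating by a simple average is an assumption, since the abstract does not say how the estimates are combined.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Observation = Tuple[float, float]  # (covariate x, response y)


def n_pseudo_observations(n: int) -> int:
    """Number of pairwise comparisons, n(n-1)/2, for a sample of size n."""
    return n * (n - 1) // 2


def split_into_partitions(
    data: Sequence[Observation], k: int, seed: int = 0
) -> List[List[Observation]]:
    """Shuffle the data and split it into k disjoint partitions."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def toy_estimator(part: Sequence[Observation]) -> float:
    """Toy stand-in for a model fit (NOT a PIM): the share of pairs in which
    the observation with the smaller covariate also has the smaller-or-equal
    response. Pairs with tied covariates are skipped."""
    hits = total = 0
    for i in range(len(part)):
        for j in range(i + 1, len(part)):
            (xi, yi), (xj, yj) = part[i], part[j]
            if xi == xj:
                continue
            lo_y, hi_y = (yi, yj) if xi < xj else (yj, yi)
            hits += lo_y <= hi_y
            total += 1
    return hits / total


def partition_and_aggregate(
    data: Sequence[Observation],
    k: int,
    fit: Callable[[Sequence[Observation]], float],
) -> float:
    """Fit on each partition, then aggregate (here: a plain average, assumed)."""
    return mean(fit(part) for part in split_into_partitions(data, k))


# Pair counts: the full sample versus 10 partitions of 1,000 observations each.
full = n_pseudo_observations(10_000)          # 49,995,000 pairs
split = 10 * n_pseudo_observations(1_000)     # 4,995,000 pairs in total

# Toy data in which the response increases with the covariate, plus noise.
rng = random.Random(1)
xs = [rng.random() for _ in range(400)]
sample = [(x, x + rng.gauss(0, 0.3)) for x in xs]
estimate = partition_and_aggregate(sample, k=4, fit=toy_estimator)
```

With equal-sized partitions, the total number of pairs falls by roughly a factor of k, which is the source of the computational saving. The price is that comparisons between observations in different partitions are never used.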

Code & Models

Repositories

HBossier/BigDataPIM
none · Official

Taxonomy

Topics: Statistical Methods and Bayesian Inference · Mental Health Research Topics · Statistical Methods and Inference