Fitting Probabilistic Index Models on Large Datasets
Han Bossier, Gustavo Amorim, Jan De Neve, Olivier Thas

TL;DR
This paper introduces scalable algorithms for fitting probabilistic index models on large datasets, addressing computational challenges by partitioning data and subsampling, and demonstrates their effectiveness on real and simulated data.
Contribution
The paper proposes two novel algorithms, partitioning and subsampling, for efficiently estimating probabilistic index models on large datasets.
Findings
The partitioning algorithm outperforms subsampling in simulations.
The method is applied to a large dataset on adolescent smartphone usage.
Moderate usage is linked to higher mental well-being, while excessive use is linked to lower well-being.
Abstract
Recently, Thas et al. (2012) introduced a new statistical model for the probability index. This index is defined as P(Y ≤ Y* | X, X*), where Y and Y* are independent random response variables associated with covariates X and X* [...] Crucially, to estimate the parameters of the model, a set of pseudo-observations is constructed. For a sample size n, a total of n(n − 1)/2 pairwise comparisons between observations is considered. Consequently, for large sample sizes, it becomes computationally infeasible or even impossible to fit the model, as the set of pseudo-observations increases nearly quadratically. In this dissertation, we provide two solutions to fit a probabilistic index model. The first algorithm consists of splitting the entire data set into unique partitions. On each of these, we fit the model and then aggregate the estimates. A second algorithm is a subsampling scheme in…
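The partitioning idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Everything here is an assumption made for illustration: a single scalar covariate, a logit link, pseudo-observations built from all unordered pairs i < j (with ties counted as 0.5), a plain Newton solver for the point estimate only (no standard errors), and a simple average as the aggregation rule. The function names `pseudo_observations`, `fit_pim`, and `fit_pim_partitioned` are hypothetical. Under the all-pairs assumption, n = 100,000 observations would give roughly 5 × 10^9 pseudo-observations, whereas 100 partitions of 1,000 observations give about 5 × 10^7 in total.

```python
import math
import random


def pseudo_observations(x, y):
    """Build one pseudo-observation per unordered pair (i < j).

    Returns (covariate difference, pseudo-response) tuples, where the
    pseudo-response is I(y_i < y_j) + 0.5 * I(y_i == y_j).
    """
    pairs = []
    n = len(y)
    for i in range(n):
        for j in range(i + 1, n):
            if y[i] < y[j]:
                resp = 1.0
            elif y[i] == y[j]:
                resp = 0.5
            else:
                resp = 0.0
            pairs.append((x[j] - x[i], resp))
    return pairs


def fit_pim(x, y, iters=25, tol=1e-10):
    """Point estimate of beta in logit P(Y <= Y*) = beta * (X* - X).

    Assumption: solves sum z * (resp - expit(z * beta)) = 0 by Newton's
    method; a real fit would also need a sandwich variance estimator.
    """
    pairs = pseudo_observations(x, y)
    beta = 0.0
    for _ in range(iters):
        score, info = 0.0, 0.0
        for z, resp in pairs:
            p = 1.0 / (1.0 + math.exp(-z * beta))
            score += z * (resp - p)
            info += z * z * p * (1.0 - p)
        if info == 0.0:
            break
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


def fit_pim_partitioned(x, y, n_partitions, seed=0):
    """Shuffle, split into disjoint partitions, fit each, average the estimates.

    Averaging is an illustrative choice of aggregation, not taken from the paper.
    """
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    estimates = []
    for k in range(n_partitions):
        part = idx[k::n_partitions]  # every n_partitions-th index: disjoint sets
        estimates.append(fit_pim([x[i] for i in part], [y[i] for i in part]))
    return sum(estimates) / len(estimates)


if __name__ == "__main__":
    rng = random.Random(42)
    x = [rng.uniform(-2, 2) for _ in range(400)]
    y = [xi + rng.gauss(0, 1) for xi in x]
    print("full fit:       ", fit_pim(x, y))
    print("partitioned fit:", fit_pim_partitioned(x, y, n_partitions=4))
```

Each partition only ever materializes its own pairs, so memory and time per fit depend on the partition size rather than on n; the partitions are independent and could be fitted in parallel.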
Taxonomy
Topics: Statistical Methods and Bayesian Inference · Mental Health Research Topics · Statistical Methods and Inference
