Fitting Semiparametric Cumulative Probability Models for Big Data

Chun Li; Guo Chen; Bryan E. Shepherd

arXiv:2207.06562 · stat.CO · July 15, 2022 · 1 citation

Open Access

TL;DR

This paper introduces three scalable methods for fitting cumulative probability models to large datasets, improving computational efficiency while maintaining accuracy, demonstrated through simulations and a large-scale application.

Contribution

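The paper proposes divide-and-combine, binning, and rounding approaches that make cumulative probability models (CPMs) feasible for big data. Simulations indicate that the resulting parameter estimates are consistent, and a large-scale application evaluates practical performance.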

Findings

01 Methods perform well in simulations

02 Parameter estimates are consistent

03 Approaches reduce running time and memory usage

Abstract

Cumulative probability models (CPMs) are a robust alternative to linear models for continuous outcomes. However, they are not feasible for very large datasets due to elevated running time and memory usage, which depend on the sample size, the number of predictors, and the number of distinct outcomes. We describe three approaches to address this problem. In the divide-and-combine approach, we divide the data into subsets, fit a CPM to each subset, and then aggregate the information. In the binning and rounding approaches, the outcome variable is redefined to have a greatly reduced number of distinct values. We consider rounding to a decimal place and rounding to significant digits, both with a refinement step to help achieve the desired number of distinct outcomes. We show with simulations that these approaches perform well and their parameter estimates are consistent. We investigate how…
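
The outcome-reduction and aggregation ideas named in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, the use of the bin median as the representative value is an assumption, the refinement step mentioned in the abstract is not shown, and inverse-variance weighting is only one plausible way to "aggregate the information" (the abstract does not say which rule the paper uses).

```python
import math
import statistics


def round_to_decimal(y, places):
    """Round each outcome to a fixed number of decimal places."""
    return [round(v, places) for v in y]


def round_to_significant(y, digits):
    """Round each outcome to a fixed number of significant digits."""
    out = []
    for v in y:
        if v == 0:
            out.append(0.0)
            continue
        # Position of the leading digit, e.g. 123456 -> 5, 0.0012 -> -3.
        exponent = math.floor(math.log10(abs(v)))
        out.append(round(v, digits - 1 - exponent))
    return out


def bin_equal_count(y, n_bins):
    """Replace each outcome by the median of its equal-count bin.

    Assumption: the bin median is the representative value; ties may be
    split across neighbouring bins in this simple version.
    """
    n = len(y)
    order = sorted(range(n), key=lambda i: y[i])
    binned = [0.0] * n
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not idx:
            continue
        representative = statistics.median([y[i] for i in idx])
        for i in idx:
            binned[i] = representative
    return binned


def combine_inverse_variance(estimates, variances):
    """Pool per-subset estimates of one coefficient (assumed pooling rule).

    Returns the inverse-variance weighted estimate and its variance.
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    return pooled, 1.0 / total
```

In each case the aim is the same: the number of distinct outcome values (or the size of each fitted subset) shrinks, so the CPM has far fewer intercepts to estimate. A reduced outcome such as `bin_equal_count(y, 1000)` would then be passed to whatever CPM fitting routine is in use.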

Taxonomy

Topics: Bayesian Modeling and Causal Inference · Complex Network Analysis Techniques · Statistical Methods and Bayesian Inference