Random Bits Regression: a Strong General Predictor for Big Data

Yi Wang; Yi Li; Momiao Xiong; Li Jin

arXiv:1501.02990·stat.ML·November 4, 2016


TL;DR

Random Bits Regression (RBR) is a fast, robust, and accurate prediction method for big data. It generates a large number of random binary features and applies regularized regression to them, and it is reported to outperform other popular methods on simulated, UCI, and GWAS datasets.

Contribution

The paper introduces RBR, a novel prediction approach that combines random binary feature generation with regularized regression, enhancing accuracy and speed for large-scale data.

Findings

01 RBR outperforms popular methods in accuracy and robustness.

02 RBR is computationally fast and memory-efficient.

03 RBR is effective across diverse datasets, including a simulated dataset, UCI machine learning repository datasets, and a GWAS dataset.

Abstract

To improve the accuracy and speed of regression and classification, we present a data-based prediction method, Random Bits Regression (RBR). This method first generates a large number of random binary intermediate/derived features based on the original input matrix, and then performs regularized linear/logistic regression on those intermediate/derived features to predict the outcome. Benchmark analyses on a simulated dataset, UCI machine learning repository datasets, and a GWAS dataset showed that RBR outperforms other popular methods in accuracy and robustness. RBR (available at https://sourceforge.net/projects/rbr/) is very fast and requires a reasonable amount of memory, and therefore provides a strong, robust, and fast predictor for the big data era.
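The two-stage idea in the abstract (random binary features, then regularized regression) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the abstract does not say how the bits are generated, so this sketch thresholds random weighted sums of a few randomly chosen input columns, and it uses closed-form ridge regression as the regularized linear model. The names `make_random_bits`, `apply_bits`, `fit_ridge`, `subset_size`, and `lam` are hypothetical.

```python
import numpy as np


def make_random_bits(X, n_bits, rng, subset_size=3):
    """Draw specifications for random binary features (assumed scheme).

    Each bit is defined by a random subset of columns, random weights,
    and a threshold taken from a randomly chosen training row.
    """
    n_samples, n_features = X.shape
    k = min(subset_size, n_features)
    specs = []
    for _ in range(n_bits):
        cols = rng.choice(n_features, size=k, replace=False)
        weights = rng.standard_normal(k)
        anchor = rng.integers(n_samples)
        threshold = X[anchor, cols] @ weights
        specs.append((cols, weights, threshold))
    return specs


def apply_bits(X, specs):
    """Map an input matrix to its 0/1 feature matrix using stored specs."""
    bits = np.empty((X.shape[0], len(specs)))
    for j, (cols, weights, threshold) in enumerate(specs):
        bits[:, j] = X[:, cols] @ weights > threshold
    return bits


def fit_ridge(B, y, lam=1.0):
    """Closed-form ridge regression on the bits; the intercept is unpenalized."""
    b_mean = B.mean(axis=0)
    y_mean = y.mean()
    Bc = B - b_mean
    gram = Bc.T @ Bc + lam * np.eye(B.shape[1])
    coef = np.linalg.solve(gram, Bc.T @ (y - y_mean))
    intercept = y_mean - b_mean @ coef
    return coef, intercept


# Toy data with a nonlinear target (illustrative only, not from the paper).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = np.sin(X[:, 0]) + X[:, 1] * X[:, 2]

specs = make_random_bits(X, n_bits=300, rng=rng)
B = apply_bits(X, specs)
coef, intercept = fit_ridge(B, y, lam=1.0)
pred = B @ coef + intercept
train_mse = float(np.mean((pred - y) ** 2))
```

The same `specs` must be reused via `apply_bits` on new data so that training and prediction see identical features. For classification, the ridge step would be swapped for a regularized logistic regression, as the abstract indicates.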

