An Aggregation Method for Sparse Logistic Regression

Zhe Liu

arXiv:1410.6959·stat.ML·February 12, 2015·Int. J. Data Min. Bioinform.

Open Access

TL;DR

This paper introduces an aggregation method for sparse logistic regression that combines multiple models to improve feature selection and predictive accuracy in high-dimensional data, demonstrated through simulations and real genome data.

Contribution

It proposes a novel aggregation approach for sparse logistic regression that balances prediction and interpretability, addressing false positives in feature selection.

Findings

01 Improved feature selection accuracy in high-dimensional settings

02 Enhanced predictive performance over traditional L1 regularization

03 Effective application to genome-wide association data

Abstract

$L_{1}$ regularized logistic regression has become a workhorse of data mining and bioinformatics: it is widely used for many classification problems, particularly ones with many features. However, $L_{1}$ regularization typically selects too many features, and so-called false positives are unavoidable. In this paper, we demonstrate and analyze an aggregation method for sparse logistic regression in high dimensions. This approach linearly combines the estimators from a suitable set of logistic models with different underlying sparsity patterns and can balance predictive ability and model interpretability. The numerical performance of our proposed aggregation method is then investigated using simulation studies. We also analyze a published genome-wide case-control dataset to further evaluate the usefulness of the aggregation method in multilocus association mapping.
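The idea of linearly combining estimators from logistic models with different sparsity patterns can be illustrated with a minimal sketch. The abstract does not say how the combination weights are chosen, so everything specific here is an assumption made for illustration: the exponential weighting on a penalized log-likelihood, the `penalty` parameter, the hand-picked candidate supports, the synthetic data, and all function names (`fit_logistic`, `log_likelihood`, `aggregate`). It should not be read as the paper's actual procedure.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, support, steps=300, lr=0.5):
    """Fit a logistic model by gradient ascent, using only features in `support`.

    Returns (intercept, full-length coefficient list with zeros off the support).
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(steps):
        g0, g = 0.0, {j: 0.0 for j in support}
        for xi, yi in zip(X, y):
            resid = yi - sigmoid(b0 + sum(beta[j] * xi[j] for j in support))
            g0 += resid
            for j in support:
                g[j] += resid * xi[j]
        b0 += lr * g0 / n
        for j in support:
            beta[j] += lr * g[j] / n
    return b0, beta


def log_likelihood(X, y, b0, beta):
    # Bernoulli log-likelihood of the fitted model on (X, y).
    ll = 0.0
    for xi, yi in zip(X, y):
        prob = sigmoid(b0 + sum(b * x for b, x in zip(beta, xi)))
        prob = min(max(prob, 1e-12), 1.0 - 1e-12)
        ll += yi * math.log(prob) + (1 - yi) * math.log(1.0 - prob)
    return ll


def aggregate(X, y, supports, penalty=1.0):
    """Linearly combine per-support estimators.

    ASSUMPTION: weight_k is proportional to exp(loglik_k - penalty * |support_k|).
    This weighting is an illustrative choice, not taken from the paper.
    """
    fits = [fit_logistic(X, y, s) for s in supports]
    scores = [
        log_likelihood(X, y, b0, beta) - penalty * len(s)
        for (b0, beta), s in zip(fits, supports)
    ]
    top = max(scores)  # subtract the max before exponentiating for stability
    raw = [math.exp(s - top) for s in scores]
    total = sum(raw)
    weights = [r / total for r in raw]
    p = len(X[0])
    b0_agg = sum(w * f[0] for w, f in zip(weights, fits))
    beta_agg = [sum(w * f[1][j] for w, f in zip(weights, fits)) for j in range(p)]
    return weights, b0_agg, beta_agg


# Synthetic data (hypothetical): only feature 0 carries signal.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [1 if rng.random() < sigmoid(2.0 * xi[0]) else 0 for xi in X]

# Candidate sparsity patterns; in practice these might come from an L1 path.
supports = [[0], [1], [0, 1], [0, 1, 2]]
weights, b0_agg, beta_agg = aggregate(X, y, supports)
print("weights:", [round(w, 3) for w in weights])
print("aggregated coefficients:", [round(b, 3) for b in beta_agg])
```

In this sketch, the size penalty in the weights pushes mass toward smaller supports that still fit well, which is one simple way to trade predictive fit against interpretability. Coefficients of features that appear only in low-weight models shrink toward zero in the aggregate.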

Taxonomy

Topics: Statistical Methods and Inference · Gene expression and cancer classification · Liver Disease Diagnosis and Treatment

Methods: Logistic Regression