Markov subsampling based Huber Criterion

Tieliang Gong; Yuxin Dong; Hong Chen; Bo Dong; Chen Li

arXiv:2112.06134 · stat.ML · March 7, 2022

Open Access

TL;DR

This paper introduces HMS, a Markov subsampling method based on the Huber criterion that selects an informative subset from large, noisy datasets while limiting the influence of outliers; the resulting estimator is robust and statistically consistent under mild conditions.

Contribution

The paper proposes a novel Markov subsampling strategy using the Huber criterion. It targets a weakness of importance-sampling-based subsampling in big data analysis: when the noise level is high, such procedures tend to select many outliers.

Findings

01. HMS achieves statistical consistency under mild conditions.

02. HMS demonstrates robust performance in large-scale simulations.

03. HMS effectively reduces outlier influence in real data applications.

Abstract

Subsampling is an important technique for tackling the computational challenges brought by big data. Many subsampling procedures fall within the framework of importance sampling, which assigns high sampling probabilities to the samples that appear to have a large impact. When the noise level is high, those sampling procedures tend to pick many outliers and thus often do not perform satisfactorily in practice. To tackle this issue, we design a new Markov subsampling strategy based on the Huber criterion (HMS) to construct an informative subset from the noisy full data; the constructed subset then serves as refined working data for efficient processing. HMS is built upon a Metropolis-Hastings procedure, where the inclusion probability of each sampling unit is determined using the Huber criterion to prevent overscoring the outliers. Under mild conditions, we show that the estimator based on the…
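To make the idea in the abstract concrete, here is a minimal sketch of a Metropolis-Hastings-style subsampler that scores each unit with the Huber loss. It is an illustration under stated assumptions, not the paper's algorithm: the linear model, the pilot estimate `beta_pilot`, the threshold `delta=1.345`, the uniform proposal, and the acceptance rule `min(1, exp(loss_current - loss_candidate))` are all hypothetical choices, since the abstract does not give the exact inclusion probability. The function names `huber_loss` and `huber_markov_subsample` are likewise introduced here for illustration only.

```python
import numpy as np


def huber_loss(residual, delta=1.345):
    """Huber loss: quadratic for |r| <= delta, linear beyond it."""
    r = np.abs(residual)
    return np.where(r <= delta, 0.5 * r**2, delta * (r - 0.5 * delta))


def huber_markov_subsample(X, y, beta_pilot, n_sub, delta=1.345, seed=0):
    """Draw n_sub indices with a Markov chain driven by Huber losses.

    Assumption (not from the paper): a candidate index is proposed
    uniformly at random and accepted with probability
    min(1, exp(loss[current] - loss[candidate])), so units with a large
    Huber loss under the pilot fit (likely outliers) are rarely accepted.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Score every unit once using the pilot estimate.
    losses = huber_loss(y - X @ beta_pilot, delta)

    current = rng.integers(n)  # arbitrary starting unit
    chosen = [current]
    while len(chosen) < n_sub:
        candidate = rng.integers(n)
        accept_prob = min(1.0, np.exp(losses[current] - losses[candidate]))
        if rng.random() < accept_prob:
            current = candidate
            chosen.append(current)  # only accepted states are recorded
    return np.array(chosen)
```

Unlike a textbook Metropolis-Hastings chain, this sketch records only accepted states, so the returned indices (which may contain repeats) form a subsample rather than a draw from a target distribution. Because the Huber loss grows only linearly for large residuals, the scores of extreme points stay bounded relative to a squared-error score, which is the property the abstract appeals to.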

Taxonomy

Topics: Bayesian Methods and Mixture Models · Statistical Methods and Inference · Statistical Methods and Bayesian Inference