Learning with Sparsely Permuted Data: A Robust Bayesian Approach

Abhisek Chakraborty; Saptati Datta

arXiv:2409.10678 · math.ST · September 18, 2024 · IEEE Big Data

TL;DR

This paper introduces a robust Bayesian method for regression when the identifiers of the predictors or the responses have been scrambled by an unknown permutation. It provides theoretical guarantees and an efficient posterior sampling scheme for the sparsely permuted case.

Contribution

It presents a novel generalized Bayesian framework and sampling scheme for sparse permutation problems, with theoretical guarantees and practical efficiency improvements.

Findings

01

Effective posterior sampling scheme developed

02

Theoretical posterior contraction guarantees established

03

Demonstrated superior performance in numerical experiments

Abstract

Data dispersed across multiple files are commonly integrated through probabilistic linkage methods, where even small error rates in record matching can substantially contaminate subsequent statistical analyses. In regression problems, we examine scenarios where the identifiers of the predictors or the responses are subject to an unknown permutation, violating the assumed correspondence between each response and its own predictors. Many emerging approaches in the literature focus on sparsely permuted data, where only a small subset of pairs ($k \ll n$) is affected by the permutation; they treat these permuted entries as outliers in order to restore the original correspondence and obtain consistent estimates of the regression parameters. In this article, we complement the existing literature by introducing a novel generalized robust Bayesian formulation of the problem. We develop an efficient posterior sampling scheme by adapting the fractional…
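To make the "permuted pairs as outliers" idea concrete, here is a minimal NumPy sketch, assuming a linear model with Gaussian noise and a permutation that mismatches exactly k of the n responses. It is not the authors' generalized Bayesian (fractional posterior) procedure: it swaps in a classical Huber M-estimator fitted by iteratively reweighted least squares (IRLS) to illustrate why downweighting large residuals limits the damage done by mismatched pairs. The helper names (`sparse_permutation`, `simulate`, `huber_irls`) and all parameter values are hypothetical choices for illustration.

```python
import numpy as np


def sparse_permutation(n, k, rng):
    """Hypothetical helper: identity on n - k indices; the k chosen indices are
    cyclically shifted so that none of them stays in place (requires k >= 2)."""
    perm = np.arange(n)
    idx = rng.choice(n, size=k, replace=False)
    perm[idx] = np.roll(idx, 1)
    return perm


def simulate(n=500, p=3, k=25, noise=0.5, seed=0):
    """Hypothetical simulation: y = X beta + eps, then k responses are shuffled."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
    beta = np.arange(1, p + 1, dtype=float)  # assumed true coefficients
    y = X @ beta + noise * rng.normal(size=n)
    perm = sparse_permutation(n, k, rng)
    # Row i now carries the response that belongs to row perm[i].
    return X, y[perm], beta, perm


def huber_irls(X, y, c=1.345, n_iter=100, tol=1e-8):
    """Huber regression via IRLS with a MAD scale estimate (a classical
    robust estimator, used here only as a stand-in)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start from OLS
    for _ in range(n_iter):
        r = y - X @ beta
        scale = np.median(np.abs(r - np.median(r))) / 0.6745
        u = np.abs(r) / max(scale, 1e-12)
        # Huber weights: 1 for small residuals, c/u for large ones.
        w = np.where(u <= c, 1.0, c / np.maximum(u, 1e-12))
        sw = np.sqrt(w)
        new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        if np.max(np.abs(new - beta)) < tol:
            return new
        beta = new
    return beta


if __name__ == "__main__":
    X, y_obs, beta_true, perm = simulate()
    b_ols = np.linalg.lstsq(X, y_obs, rcond=None)[0]
    b_rob = huber_irls(X, y_obs)
    print("mismatched pairs:", int((perm != np.arange(len(perm))).sum()))
    print("OLS error:  ", np.linalg.norm(b_ols - beta_true))
    print("Huber error:", np.linalg.norm(b_rob - beta_true))
```

In this setup the mismatched responses carry no information about their own row's predictors, so ordinary least squares tends to shrink the slopes toward zero, while the bounded Huber weights keep the few large residuals from dominating the fit.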

Taxonomy

Topics: Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification