Regularization for Shuffled Data Problems via Exponential Family Priors on the Permutation Group

Zhenbang Wang; Emanuel Ben-David; Martin Slawski

arXiv:2111.01767·stat.ML·November 3, 2021

TL;DR

This paper introduces a regularization method for shuffled data problems using an exponential family prior on permutations, improving inference accuracy in record linkage scenarios with mismatched pairs.

Contribution

It proposes a conjugate exponential family prior on the permutation group, enabling regularized inference for shuffled data with various structural constraints.

Findings

01

The method performs well on synthetic data.

02

It outperforms competing approaches on real data.

03

The EM algorithm efficiently handles large datasets.

Abstract

In the analysis of data sets consisting of (X, Y)-pairs, a tacit assumption is that each pair corresponds to the same observation unit. If, however, such pairs are obtained via record linkage of two files, this assumption can be violated as a result of mismatch error stemming, for example, from the lack of reliable identifiers in the two files. Recently, there has been a surge of interest in this setting under the term "shuffled data", in which the underlying correct pairing of (X, Y)-pairs is represented via an unknown index permutation. Explicit modeling of the permutation tends to be associated with substantial overfitting, prompting the need for suitable methods of regularization. In this paper, we propose a flexible exponential family prior on the permutation group for this purpose that can be used to integrate various structures such as sparse and locally constrained shuffling. This…
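To make the idea of regularizing the permutation concrete, the following is a minimal sketch, not the paper's implementation. It assumes one simple exponential-family-style prior, p(π) ∝ exp(−γ · d_H(π, id)), where d_H counts the indices that π moves, so larger γ favors sparse shuffling. The function names, the fixed slope, the toy data, and the brute-force search over all permutations are illustrative assumptions that only work for very small n.

```python
import itertools
import math


def hamming_to_identity(perm):
    """Number of indices the permutation moves (i.e. mismatched pairs)."""
    return sum(1 for i, p in enumerate(perm) if p != i)


def sparse_shuffle_prior(n, gamma):
    """Assumed prior: p(perm) proportional to exp(-gamma * d_H(perm, id)).

    Normalized by brute force over all n! permutations (tiny n only).
    """
    perms = list(itertools.permutations(range(n)))
    weights = [math.exp(-gamma * hamming_to_identity(p)) for p in perms]
    z = sum(weights)
    return {p: w / z for p, w in zip(perms, weights)}


def penalized_best_permutation(x, y, slope, gamma):
    """Brute-force minimizer of squared error plus gamma * d_H(perm, id).

    Pairs y[i] with x[perm[i]] under a hypothetical known-slope model
    y = slope * x + noise; gamma = 0 gives the unregularized fit.
    """
    n = len(x)

    def cost(perm):
        rss = sum((y[i] - slope * x[perm[i]]) ** 2 for i in range(n))
        return rss + gamma * hamming_to_identity(perm)

    return min(itertools.permutations(range(n)), key=cost)


# Toy data (made up for illustration): the pairs are correctly matched,
# but noise makes a swap of the first two records fit slightly better.
x = [1.0, 1.1, 3.0, 4.0]
y = [2.3, 2.0, 6.0, 8.0]

prior = sparse_shuffle_prior(n=4, gamma=1.5)
unregularized = penalized_best_permutation(x, y, slope=2.0, gamma=0.0)
regularized = penalized_best_permutation(x, y, slope=2.0, gamma=0.1)
```

On this toy data the unregularized fit swaps the first two records to chase noise, while the penalized fit keeps the identity pairing, which is the overfitting behavior the abstract describes. For realistic n, exhaustive enumeration is infeasible and a different computational strategy would be needed.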

Taxonomy

Topics: Statistical Methods and Inference · Distributed Sensor Networks and Detection Algorithms · Sparse and Compressive Sensing Techniques