Exploiting non-i.i.d. data towards more robust machine learning algorithms

Wim Casteels; Peter Hellinckx

arXiv:2010.03429·cs.LG·October 8, 2020

Open Access

TL;DR

This paper introduces a regularization scheme for machine learning that enhances robustness by favoring causal correlations and handling non-i.i.d. data, improving out-of-distribution generalization.

Contribution

A novel regularization method that promotes universal causal correlations by leveraging non-i.i.d. data clustering, improving model robustness and generalization.

Findings

1. Better performance on out-of-distribution test sets.

2. Regularization favors invariant, causal correlations.

3. Improved robustness over traditional l2-regularization.

Abstract

In the field of machine learning there is a growing interest in more robust and generalizable algorithms. This is important, for example, to bridge the gap between the environment in which the training data was collected and the environment where the algorithm is deployed. Machine learning algorithms have increasingly been shown to excel at finding patterns and correlations in data. Determining the consistency of these patterns, for example distinguishing causal correlations from nonsensical spurious relations, has proven to be much more difficult. In this paper a regularization scheme is introduced that prefers universal causal correlations. This approach is based on 1) the robustness of causal correlations and 2) the data not being independently and identically distributed (i.i.d.). The scheme is demonstrated with a classification task by clustering the (non-i.i.d.)…
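The premise behind the abstract, that causal correlations stay stable across non-i.i.d. subsets of the data while spurious ones do not, can be illustrated with a small synthetic sketch. This is not the paper's regularizer: the data generator `make_cluster`, the helper `correlation_spread`, the cluster strengths, and the assumption that cluster membership is already known are all hypothetical choices made for illustration only.

```python
import numpy as np

def make_cluster(n, spurious_strength, rng):
    """Hypothetical cluster: y depends causally on x_causal only."""
    x_causal = rng.normal(size=n)
    y = 2.0 * x_causal + rng.normal(size=n)
    # The spurious feature tracks y with a strength that differs per cluster.
    x_spurious = spurious_strength * y + rng.normal(size=n)
    return np.column_stack([x_causal, x_spurious]), y

def correlation_spread(clusters):
    """Std. dev. across clusters of each feature's correlation with y."""
    per_cluster = np.array([
        [np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])]
        for X, y in clusters
    ])
    return per_cluster.std(axis=0)

rng = np.random.default_rng(0)
# Three assumed clusters; the spurious relation changes strength and sign.
clusters = [make_cluster(2000, s, rng) for s in (1.0, -1.0, 0.5)]
spread = correlation_spread(clusters)
print("spread (causal, spurious):", spread)
```

In this toy setting the causal feature's correlation with the target barely moves between clusters, while the spurious feature's correlation changes sign. A per-feature spread of this kind is one conceivable signal a regularizer could use to down-weight unstable features; how the paper actually constructs its penalty is not stated in the text above.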

Taxonomy

Topics: Machine Learning and Algorithms · Fault Detection and Control Systems · Machine Learning and Data Classification