Transforming variables to central normality
Jakob Raymaekers, Peter J. Rousseeuw

TL;DR
This paper introduces a robust modification of the Box-Cox and Yeo-Johnson transformations, along with an outlier-robust estimator of the transformation parameter, so that the central part of the data is transformed to approximate normality even when outliers are present.
Contribution
It proposes a robust transformation and estimator that maintain central normality despite outliers, enhancing data preprocessing methods.
Findings
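In an extensive simulation study, the robust method compares favorably to existing techniques.
The new estimator of the transformation parameter is robust to outliers: the transformed data are approximately normal in the center, while a few outliers may deviate.
The method also compares favorably to existing techniques on real data.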
Abstract
Many real data sets contain numerical features (variables) whose distribution is far from normal (Gaussian). Instead, their distribution is often skewed. In order to handle such data it is customary to preprocess the variables to make them more normal. The Box-Cox and Yeo-Johnson transformations are well-known tools for this. However, the standard maximum likelihood estimator of their transformation parameter is highly sensitive to outliers, and will often try to move outliers inward at the expense of the normality of the central part of the data. We propose a modification of these transformations as well as an estimator of the transformation parameter that is robust to outliers, so the transformed data can be approximately normal in the center and a few outliers may deviate from it. The proposed method compares favorably to existing techniques in an extensive simulation study and on real data.
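The sensitivity of the maximum likelihood estimator described in the abstract can be illustrated with a small sketch. It uses the classical Box-Cox transformation and its Gaussian profile likelihood, not the robust method proposed in the paper. The data are synthetic and chosen only for illustration: log-normal-shaped values built from normal quantiles (so a parameter near 0, the log transform, fits the clean data), plus ten identical large values standing in for outliers. The helper names `box_cox`, `profile_loglik`, and `ml_lambda`, and the grid search on [-2, 2], are assumptions of this sketch.

```python
import math
from statistics import NormalDist


def box_cox(x, lam):
    """Classical Box-Cox transform of a positive value x."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def profile_loglik(xs, lam):
    """Gaussian profile log-likelihood of lam, up to an additive constant."""
    n = len(xs)
    ts = [box_cox(x, lam) for x in xs]
    mean = sum(ts) / n
    var = sum((t - mean) ** 2 for t in ts) / n  # ML variance (divides by n)
    log_jacobian = (lam - 1.0) * sum(math.log(x) for x in xs)
    return -0.5 * n * math.log(var) + log_jacobian


def ml_lambda(xs):
    """Grid-search ML estimate of lam on [-2, 2] in steps of 0.01."""
    grid = [k / 100 for k in range(-200, 201)]
    return max(grid, key=lambda lam: profile_loglik(xs, lam))


# Synthetic clean data: exponentials of standard normal quantiles,
# so the log transform (lam = 0) makes them normal-shaped.
n = 200
z = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
clean = [math.exp(v) for v in z]

# Hypothetical contamination: ten identical, very large values.
contaminated = clean + [math.exp(6.0)] * 10

lam_clean = ml_lambda(clean)
lam_out = ml_lambda(contaminated)
print("ML lambda, clean data:       ", lam_clean)
print("ML lambda, contaminated data:", lam_out)
```

On the clean data the estimate stays near 0, while the added outliers pull it toward negative values: the fit compresses the upper tail to bring the outliers inward and in doing so distorts the central part of the data. This is the behavior that the robust estimator in the paper is designed to avoid.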
