# On the usage of the probability integral transform to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems

**Authors:** Mikel Elkano, Mikel Uriz, Humberto Bustince, Mikel Galar

arXiv: 1903.00345 · 2019-03-04

## TL;DR

This paper introduces a distributed fuzzy partitioning method using the probability integral transform to simplify multi-way fuzzy decision trees in Big Data classification, maintaining accuracy with fewer leaves.

## Contribution

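It combines the probability integral transform with a fixed-size fuzzy partition. A fixed number of triangular fuzzy sets per attribute adapts to the distribution of the training data, which reduces the complexity of multi-way fuzzy decision trees in Big Data classification problems.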
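The combination rests on two standard facts, written out below in our own notation (the symbols $F_X$, $\hat{F}_X$, $c_i$, $\mu_i$ and $n$ are illustrative and may not match the paper's). First, the probability integral transform maps any continuous attribute to a standard uniform variable. Second, equally spaced triangles on $[0,1]$ form a Ruspini strong fuzzy partition, meaning the membership degrees sum to 1 everywhere.

```latex
% Probability integral transform for a continuous attribute X with CDF F_X
U = F_X(X) \sim \mathcal{U}(0,1), \qquad X \overset{d}{=} F_X^{-1}(U)

% Ruspini strong fuzzy partition of [0,1] with n >= 2 equally spaced triangles
c_i = \frac{i-1}{n-1}, \qquad
\mu_i(u) = \max\bigl(0,\; 1 - (n-1)\,\lvert u - c_i \rvert\bigr), \qquad
\sum_{i=1}^{n} \mu_i(u) = 1 \quad \forall\, u \in [0,1]

% Fuzzy set i read back in the original space through the estimated CDF \hat{F}_X
\tilde{\mu}_i(x) = \mu_i\bigl(\hat{F}_X(x)\bigr)
```

Because $F_X$ is unknown in practice, $\hat{F}_X$ stands for the quantile-based estimate that the abstract describes.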

## Key findings

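- Lets the state-of-the-art multi-way fuzzy decision tree (FMDT) induction algorithm maintain classification accuracy with up to 6 million fewer leaves.
- Adapts the shape and position of a fixed number of fuzzy sets to the real distribution of each attribute by partitioning a transformed, standard-uniform attribute space.
- Is designed as a distributed method for Big Data classification problems.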

## Abstract

We present a new distributed fuzzy partitioning method to reduce the complexity of multi-way fuzzy decision trees in Big Data classification problems. The proposed algorithm builds a fixed number of fuzzy sets for all variables and adjusts their shape and position to the real distribution of training data. A two-step process is applied: 1) transformation of the original distribution into a standard uniform distribution by means of the probability integral transform. Since the original distribution is generally unknown, the cumulative distribution function is approximated by computing the q-quantiles of the training set; 2) construction of a Ruspini strong fuzzy partition in the transformed attribute space using a fixed number of equally distributed triangular membership functions. Despite the aforementioned transformation, the definition of every fuzzy set in the original space can be recovered by applying the inverse cumulative distribution function (also known as the quantile function). The experimental results reveal that the proposed methodology allows the state-of-the-art multi-way fuzzy decision tree (FMDT) induction algorithm to maintain classification accuracy with up to 6 million fewer leaves.
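The two-step process in the abstract can be illustrated with the single-machine NumPy sketch below. It approximates one attribute's CDF from its q-quantiles, maps values into [0, 1], and builds an equally spaced triangular Ruspini partition there. It then maps each triangle's breakpoints back through the approximate quantile function. This is not the authors' distributed implementation. The function names, the use of piecewise-linear interpolation between quantiles, and the parameters `q` and `n_sets` are all illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only: names and the linear interpolation between
# quantiles are assumptions, not the paper's code.


def fit_quantile_cdf(values, q):
    """Approximate one attribute's CDF by its q-quantiles.

    Returns matching arrays (probs, cuts) with cuts[k] = quantile at probs[k].
    Assumes continuous data, so the cut points are strictly increasing.
    """
    probs = np.linspace(0.0, 1.0, q + 1)   # 0, 1/q, ..., 1
    cuts = np.quantile(values, probs)      # cuts[0] = min, cuts[-1] = max
    return probs, cuts


def to_uniform(x, probs, cuts):
    """Probability integral transform using the piecewise-linear CDF estimate."""
    return np.interp(x, cuts, probs)       # values outside [min, max] clip to 0 or 1


def from_uniform(u, probs, cuts):
    """Approximate quantile function: maps [0, 1] back to the original space."""
    return np.interp(u, probs, cuts)


def ruspini_memberships(u, n_sets):
    """Membership of u in n_sets (>= 2) equally spaced triangles on [0, 1]."""
    centers = np.linspace(0.0, 1.0, n_sets)
    width = centers[1] - centers[0]        # each triangle hits 0 at its neighbours' peaks
    u = np.atleast_1d(u)
    return np.clip(1.0 - np.abs(u[:, None] - centers[None, :]) / width, 0.0, 1.0)


def original_space_breakpoints(n_sets, probs, cuts):
    """(left, peak, right) of every triangle, mapped back through the quantile function."""
    centers = np.linspace(0.0, 1.0, n_sets)
    width = centers[1] - centers[0]
    left = np.clip(centers - width, 0.0, 1.0)
    right = np.clip(centers + width, 0.0, 1.0)
    return np.stack([from_uniform(p, probs, cuts) for p in (left, centers, right)], axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.exponential(scale=2.0, size=10_000)   # a skewed attribute
    probs, cuts = fit_quantile_cdf(x, q=100)
    mu = ruspini_memberships(to_uniform(x, probs, cuts), n_sets=5)
    # Strong partition: every row sums to 1 up to floating-point rounding
    print(mu.sum(axis=1).min(), mu.sum(axis=1).max())
    print(original_space_breakpoints(5, probs, cuts))
```

The transformed values are close to uniform, so equally spaced triangles each cover roughly the same share of training examples. In the original space, the same fuzzy sets become narrower where data are dense and wider where they are sparse. Under this interpolation scheme, they are piecewise linear rather than strictly triangular.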

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1903.00345/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1903.00345/full.md

## References

24 references — full list in the complete paper: https://tomesphere.com/paper/1903.00345/full.md

---
Source: https://tomesphere.com/paper/1903.00345