# Decision Stream: Cultivating Deep Decision Trees

**Authors:** Dmitry Ignatov, Andrey Ignatov

arXiv: 1704.07657 · 2017-09-05

## TL;DR

The paper introduces Decision Stream, a deep directed acyclic graph of decision rules that merges statistically similar nodes from different branches to curb the overfitting and model complexity of traditional decision trees, reducing prediction error by up to 35% on classification and regression tasks.

## Contribution

The paper proposes an architecture that merges nodes from different branches when two-sample test statistics indicate they are similar, producing a deep directed acyclic graph instead of a tree and improving on standard decision tree learning methods.
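The merging step can be sketched as follows, assuming similarity between two nodes is judged by a two-sample test on the target values that reach them. The abstract names two-sample test statistics, but this summary does not say which ones or how candidates are paired, so the Kolmogorov-Smirnov statistic, the greedy pooling loop, and the names `ks_statistic`, `should_merge`, and `merge_similar_nodes` are all illustrative assumptions rather than the authors' procedure.

```python
import math
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def should_merge(a, b, alpha=0.05):
    """Assumed rule: merge when the test cannot tell the two samples apart."""
    n, m = len(a), len(b)
    # Large-sample critical value of the two-sample KS test at level alpha.
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return ks_statistic(a, b) <= critical


def merge_similar_nodes(nodes, alpha=0.05):
    """Greedily pool nodes (lists of target values) that look statistically alike."""
    merged = []
    for samples in nodes:
        for group in merged:
            if should_merge(group, samples, alpha):
                group.extend(samples)  # the two branches now share one node
                break
        else:
            merged.append(list(samples))
    return merged


if __name__ == "__main__":
    # Hypothetical target values reaching three nodes on different branches.
    nodes = [
        [1.0, 1.1, 0.9, 1.2, 1.0, 0.8, 1.1, 0.95],
        [1.05, 0.9, 1.0, 1.15, 0.85, 1.1, 0.98, 1.02],
        [5.0, 5.2, 4.9, 5.1, 5.3, 4.8, 5.0, 5.1],
    ]
    print([len(group) for group in merge_similar_nodes(nodes)])
```

Pooling nodes this way restores sample counts that splitting had divided, which is the property that lets a graph keep growing deeper where a tree would run out of data.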

## Key findings

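- Outperforms standard decision tree learning methods, reducing prediction error by up to 35%
- Evaluated on both classification and regression: credit scoring, Twitter sentiment analysis, aircraft flight control, MNIST and CIFAR image classification, and synthetic data
- Produces deep directed acyclic graphs of decision rules that can consist of hundreds of levels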

## Abstract

Various modifications of decision trees have been extensively used during the past years due to their high efficiency and interpretability. Tree node splitting based on relevant feature selection is a key step of decision tree learning, at the same time being their major shortcoming: recursive node partitioning leads to a geometric reduction of data quantity in the leaf nodes, which causes excessive model complexity and data overfitting. In this paper, we present a novel architecture, the Decision Stream, aimed at overcoming this problem. Instead of building a tree structure during the learning process, we propose merging nodes from different branches based on their similarity, which is estimated with two-sample test statistics; this leads to the generation of a deep directed acyclic graph of decision rules that can consist of hundreds of levels. To evaluate the proposed solution, we test it on several common machine learning problems: credit scoring, Twitter sentiment analysis, aircraft flight control, MNIST and CIFAR image classification, and synthetic data classification and regression. Our experimental results reveal that the proposed approach significantly outperforms the standard decision tree learning methods on both regression and classification tasks, yielding a prediction error decrease of up to 35%.
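The geometric reduction of data mentioned in the abstract can be made concrete with a little arithmetic. The sketch below assumes perfectly balanced binary splits, which real trees only approximate, and the dataset size and minimum node size are illustrative numbers, not figures from the paper; `samples_per_node` and `max_useful_depth` are hypothetical helper names.

```python
def samples_per_node(n_samples, depth, branching=2):
    """Average samples reaching a node at `depth` under perfectly balanced splits."""
    return n_samples / branching ** depth


def max_useful_depth(n_samples, min_samples, branching=2):
    """Deepest level at which a balanced tree still has `min_samples` per node."""
    depth = 0
    while samples_per_node(n_samples, depth + 1, branching) >= min_samples:
        depth += 1
    return depth


if __name__ == "__main__":
    n = 60_000  # illustrative training set size
    for depth in (0, 4, 8, 12):
        print(depth, samples_per_node(n, depth))
    # With at least 10 samples per node, a balanced binary tree stops near depth 12.
    print(max_useful_depth(n, min_samples=10))
```

Under these assumptions a balanced tree exhausts its data after roughly a dozen levels, whereas the abstract reports graphs with hundreds of levels once nodes are merged.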

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1704.07657/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/1704.07657/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/1704.07657/full.md

---
Source: https://tomesphere.com/paper/1704.07657