Big Data Classification Using Augmented Decision Trees

Rajiv Sambasivan; Sourish Das

arXiv:1710.09567 · stat.ML · October 27, 2017 · 2 cites

TL;DR

This paper introduces an interpretable classification algorithm for big data that combines decision trees with local classifiers, achieving accuracy comparable to ensemble methods while maintaining interpretability.

Contribution

The paper proposes a divide-and-conquer algorithm that uses a decision tree to segment the data and local classifiers to label the non-homogeneous leaves, giving scalable, interpretable classification for big data.

Findings

01 Algorithm achieves accuracy similar to ensemble methods

02 Models are easily interpretable

03 Effective on large datasets

Abstract

We present an algorithm for classification tasks on big data. Experiments conducted as part of this study indicate that the algorithm can be as accurate as ensemble methods such as random forests or gradient boosted trees. Unlike ensemble methods, the models produced by the algorithm can be easily interpreted. The algorithm is based on a divide and conquer strategy and consists of two steps. The first step consists of using a decision tree to segment the large dataset. By construction, decision trees attempt to create homogeneous class distributions in their leaf nodes. However, non-homogeneous leaf nodes are usually produced. The second step of the algorithm consists of using a suitable classifier to determine the class labels for the non-homogeneous leaf nodes. The decision tree segment provides a coarse segment profile while the leaf level classifier can provide information about the…
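The two steps described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`gini`, `best_split`, `fit_leaf`, `fit`, `predict`), the Gini-based split search, the `max_depth` parameter, and the use of a 1-nearest-neighbour rule as the "suitable classifier" for impure leaves are all hypothetical choices made for the example. The toy data is invented for the demonstration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        # Skip the largest value so both sides of the split are non-empty.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def fit_leaf(X, y):
    """Step 2: a pure leaf stores its label; an impure leaf keeps its
    samples for a local classifier (assumed here: 1-nearest-neighbour)."""
    if len(set(y)) == 1:
        return {"kind": "pure", "label": y[0]}
    return {"kind": "local", "X": X, "y": y}

def fit(X, y, max_depth=1):
    """Step 1: segment the data with a shallow decision tree."""
    split = best_split(X, y) if max_depth > 0 and len(set(y)) > 1 else None
    if split is None:
        return fit_leaf(X, y)
    f, t = split
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return {
        "kind": "split", "feature": f, "threshold": t,
        "left": fit([X[i] for i in left], [y[i] for i in left], max_depth - 1),
        "right": fit([X[i] for i in right], [y[i] for i in right], max_depth - 1),
    }

def predict(node, row):
    """Route the row to its leaf, then apply that leaf's model."""
    while node["kind"] == "split":
        go_left = row[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    if node["kind"] == "pure":
        return node["label"]
    # Local 1-NN: label of the closest training sample within this leaf.
    def sq_dist(a):
        return sum((p - q) ** 2 for p, q in zip(a, row))
    nearest = min(range(len(node["X"])), key=lambda i: sq_dist(node["X"][i]))
    return node["y"][nearest]

# Toy data: low values are all "a"; high values alternate between classes,
# so a depth-1 tree leaves one pure leaf and one non-homogeneous leaf.
X = [[0], [1], [2], [3], [10], [11], [12], [13]]
y = ["a", "a", "a", "a", "b", "a", "b", "a"]
model = fit(X, y, max_depth=1)
print(model["left"]["kind"], model["right"]["kind"])
print(predict(model, [1.5]), predict(model, [10.2]), predict(model, [11.1]))
```

In this sketch the tree path to a leaf plays the role of the coarse segment profile, and only the non-homogeneous leaf carries a local model. Any classifier could be substituted in `fit_leaf`; the nearest-neighbour rule is used only because it needs no extra dependencies.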

Taxonomy

Topics: Face and Expression Recognition · Data Mining Algorithms and Applications · Machine Learning and Data Classification