Cost-sensitive C4.5 with post-pruning and competition
Zilong Xu, Fan Min, William Zhu

TL;DR
This paper introduces a cost-sensitive decision tree algorithm for numeric data, based on C4.5. It combines a test cost weighted information gain ratio heuristic, post-pruning, and a competition strategy to reduce the total cost (test costs plus misclassification costs) of classification.
Contribution
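It extends cost-sensitive decision tree learning, which earlier ID3-based algorithms applied only to symbolic data, to numeric data. It does so with a new attribute-selection heuristic and a post-pruning step, plus a competition strategy aimed at obtaining low-cost decision trees.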
Findings
The algorithm is stable and effective.
Post-pruning significantly reduces total costs.
The competition strategy improves cost sensitivity.
Abstract
Decision trees are an effective classification approach in data mining and machine learning. In applications, test costs and misclassification costs should be considered when inducing decision trees. Recently, several cost-sensitive learning algorithms based on ID3, such as CS-ID3, IDX, and λ-ID3, have been proposed to address this issue. These algorithms handle only symbolic data. In this paper, we develop a decision tree algorithm inspired by C4.5 for numeric data. Our algorithm addresses two major issues. First, we develop the test cost weighted information gain ratio as the heuristic information. Under this heuristic, at each selection the algorithm picks the attribute that provides more gain ratio and costs less. Second, we design a post-pruning strategy that considers the tradeoff between test costs and misclassification costs of the generated…
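As a rough illustration of the heuristic described in the abstract, the sketch below scores a binary threshold split on a numeric attribute by its gain ratio, then weights that score by the attribute's test cost. The weighting form `gain_ratio * test_cost ** lam` with a non-positive exponent `lam` is an assumption modeled on λ-ID3-style weighting; the paper's exact formula is not given in this summary, and all function and parameter names here are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """C4.5-style gain ratio of the binary split `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    n = len(labels)
    gain = (entropy(labels)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))
    # Split information: entropy of the partition sizes themselves.
    split_info = entropy(["L"] * len(left) + ["R"] * len(right))
    return gain / split_info


def weighted_gain_ratio(values, labels, threshold, test_cost, lam=-1.0):
    """Assumed weighting: gain ratio scaled by test_cost ** lam, lam <= 0.

    test_cost must be positive. With lam = 0 the cost is ignored; more
    negative lam penalizes expensive tests more strongly.
    """
    return gain_ratio(values, labels, threshold) * (test_cost ** lam)


if __name__ == "__main__":
    values = [1.0, 2.0, 3.0, 4.0]
    labels = ["a", "a", "b", "b"]
    # Same split quality, different test costs: the cheaper test scores higher.
    print(weighted_gain_ratio(values, labels, 2.0, test_cost=2.0))
    print(weighted_gain_ratio(values, labels, 2.0, test_cost=4.0))
```

Under this assumed form, two attributes that split the data equally well are ranked by cost alone, which matches the abstract's description of preferring the attribute that "provides more gain ratio and costs less".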
Taxonomy
Topics: Imbalanced Data Classification Techniques · Rough Sets and Fuzzy Logic · Data Mining Algorithms and Applications
