Modeling Text with Decision Forests using Categorical-Set Splits

Mathieu Guillame-Bert; Sebastian Bruch; Petr Mitrichev; Petr Mikheev; Jan Pfeifer

arXiv:2009.09991·cs.LG·February 8, 2021

Open Access

TL;DR

This paper introduces a novel categorical-set split condition for decision forests, enabling direct modeling of textual features without prior transformation, and demonstrates its effectiveness on text classification tasks.

Contribution

The work presents a new categorical-set split condition and an efficient learning algorithm, allowing decision forests to directly handle text features.

Findings

01. Effective on benchmark text classification datasets

02. Fast evaluation with extended QuickScorer inference

03. Bridges the gap for modeling textual features in decision forests

Abstract

Decision forest algorithms typically model data by learning a binary tree structure recursively where every node splits the feature space into two sub-regions, sending examples into the left or right branch as a result. In axis-aligned decision forests, the "decision" to route an input example is the result of the evaluation of a condition on a single dimension in the feature space. Such conditions are learned using efficient, often greedy algorithms that optimize a local loss function. For example, a node's condition may be a threshold function applied to a numerical feature, and its parameter may be learned by sweeping over the set of values available at that node and choosing a threshold that maximizes some measure of purity. Crucially, whether an algorithm exists to learn and evaluate conditions for a feature type determines whether a decision forest algorithm can model that feature…
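The threshold-learning step described in the abstract can be illustrated with a minimal sketch. It sweeps over the numerical values present at a node and keeps the threshold that gives the purest pair of children. Several details are assumptions made for illustration and do not come from the paper: Gini impurity as the purity measure, midpoints between consecutive distinct values as candidate thresholds, and the hypothetical helper names `gini` and `best_threshold`. The naive recomputation of impurity at each candidate is chosen for readability, not efficiency.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_impurity) for the best split `value >= threshold`.

    Assumption: candidate thresholds are midpoints between consecutive
    distinct sorted values. Returns (None, node_impurity) if no split helps.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # baseline: impurity of the unsplit node
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # identical values cannot be separated by a threshold
        threshold = (lo + hi) / 2.0
        left = [label for _, label in pairs[:i]]   # examples below the threshold
        right = [label for _, label in pairs[i:]]  # examples at or above it
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["a", "a", "a", "b", "b", "b"]
threshold, impurity = best_threshold(values, labels)
print(threshold, impurity)  # 6.5 0.0
```

On this toy node the sweep picks the midpoint between 3.0 and 10.0, which separates the two classes exactly. A production learner would typically update class counts incrementally while scanning the sorted values instead of rebuilding both children at every candidate.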

Taxonomy

Topics: Data Mining Algorithms and Applications · Topic Modeling · Rough Sets and Fuzzy Logic