# Embedding Feature Selection for Large-scale Hierarchical Classification

**Authors:** Azad Naik, Huzefa Rangwala

arXiv: 1706.01581 · 2017-06-07

## TL;DR

This paper explores filter-based feature selection methods to improve large-scale hierarchical classification by reducing training time and memory usage while maintaining or enhancing accuracy across text and image datasets.

## Contribution

The paper systematically evaluates various filter-based feature selection techniques for large-scale hierarchical classification (HC), demonstrating speed-ups of up to 3x and memory reductions of up to 45% without significant loss in accuracy.

## Key findings

- Up to 3x speed-up on massive datasets
- Up to 45% reduction in memory requirements
- No significant loss in classification accuracy

## Abstract

Large-scale Hierarchical Classification (HC) involves datasets consisting of thousands of classes and millions of training instances with high-dimensional features, posing several big data challenges. Feature selection, which aims to select a subset of discriminant features, is an effective strategy for dealing with the large-scale HC problem. It speeds up the training process, reduces the prediction time, and minimizes the memory requirements by compressing the total size of the learned model weight vectors. The majority of studies have also shown feature selection to be competent and successful in improving classification accuracy by removing irrelevant features. In this work, we investigate various filter-based feature selection methods for dimensionality reduction to solve the large-scale HC problem. Our experimental evaluation on text and image datasets with varying distributions of features, classes, and instances shows up to a 3x speed-up on massive datasets and up to 45% lower memory requirements for storing the weight vectors of the learned model, without any significant loss (and with improvement for some datasets) in classification accuracy. Source Code: https://cs.gmu.edu/~mlbio/featureselection.
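
To make the idea of a filter-based method concrete, here is a minimal sketch: each feature is scored independently of any classifier, and only the top-scoring features are kept before training. This summary does not say which scoring criteria the paper uses, so mutual information between feature presence and class label is an assumption chosen for illustration. The function names (`mutual_information`, `select_top_k`, `project`) and the toy data are hypothetical and are not taken from the authors' source code.

```python
import math
from collections import Counter


def mutual_information(feature_present, labels):
    """Mutual information (in nats) between a binary feature and the class labels."""
    n = len(labels)
    joint = Counter(zip(feature_present, labels))
    f_counts = Counter(feature_present)
    y_counts = Counter(labels)
    mi = 0.0
    for (f, y), c in joint.items():
        p_fy = c / n
        # p(f, y) / (p(f) * p(y)) written with raw counts
        mi += p_fy * math.log(p_fy * n * n / (f_counts[f] * y_counts[y]))
    return mi


def select_top_k(X, y, k):
    """Score every feature on its own (filter approach) and keep the k best."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        present = [row[j] > 0 for row in X]  # assumption: binarize by presence
        scores.append(mutual_information(present, y))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores


def project(X, keep):
    """Drop all columns except the selected feature indices."""
    return [[row[j] for j in keep] for row in X]


# Toy data (hypothetical): feature 0 tracks the class, feature 1 is
# unrelated to it, and feature 2 is constant.
X = [[2, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 1]]
y = ["a", "a", "b", "b"]

keep, scores = select_top_k(X, y, k=1)
X_reduced = project(X, keep)
```

A classifier trained on `X_reduced` needs one weight per kept feature rather than one per original feature, which is where the memory saving described in the abstract would come from. In a hierarchical setting the selection could be run separately at each internal node of the class hierarchy; that is likewise an assumption here, not a description of the paper's procedure.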

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1706.01581/full.md

## Figures

38 figures with captions in the complete paper: https://tomesphere.com/paper/1706.01581/full.md

## References

33 references — full list in the complete paper: https://tomesphere.com/paper/1706.01581/full.md

---
Source: https://tomesphere.com/paper/1706.01581