Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning

Guillaume Lemaitre; Fernando Nogueira; Christos K. Aridas

arXiv:1609.06570 · cs.LG · September 22, 2016 · 1.6k cites



TL;DR

Imbalanced-learn is a Python toolbox that offers under-sampling, over-sampling, combined over- and under-sampling, and ensemble methods for learning from imbalanced datasets, and it is fully compatible with scikit-learn.
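To make the simplest of these strategies concrete, here is a minimal pure-Python sketch of random under-sampling and of one way an ensemble can reuse it, loosely in the spirit of balanced bagging: each ensemble member would be trained on a different balanced subset, and their predictions combined by majority vote. The function names `random_under_sample`, `balanced_subsets`, and `majority_vote` are hypothetical and are not the library's API; imbalanced-learn itself ships `RandomUnderSampler` in `imblearn.under_sampling` and ensemble classifiers such as `BalancedBaggingClassifier` in `imblearn.ensemble`.

```python
import random
from collections import Counter


def random_under_sample(X, y, rng):
    """Keep a random subset of each class so every class matches the minority count."""
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    # Sample indices without replacement within each class, then restore original order.
    keep = sorted(i for idx in by_class.values() for i in rng.sample(idx, n_min))
    return [X[i] for i in keep], [y[i] for i in keep]


def balanced_subsets(X, y, n_subsets=5, seed=0):
    """Draw several independent under-sampled subsets, one per (hypothetical) ensemble member."""
    rng = random.Random(seed)
    return [random_under_sample(X, y, rng) for _ in range(n_subsets)]


def majority_vote(predictions):
    """Combine per-member label lists (one list per member) by majority vote per sample."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]
```

Because each subset discards a different random part of the majority class, the ensemble as a whole still sees most of the majority data, which is the usual argument for combining under-sampling with ensembles instead of under-sampling once.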

Contribution

The paper introduces a comprehensive, easy-to-use Python toolbox of state-of-the-art methods for handling imbalanced datasets, integrated with scikit-learn.

Findings

01. Provides a wide range of methods for handling imbalanced data

02. Ensures compatibility with the scikit-learn ecosystem

03. Open source, with extensive documentation and testing
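One way to read the scikit-learn compatibility claim: a sampler exposes a `fit_resample(X, y)` method (the name used by current imbalanced-learn releases; earlier releases called it `fit_sample`), and the library's `imblearn.pipeline.Pipeline` applies samplers only while fitting, so predictions are made on unresampled data. The classes below are hypothetical pure-Python stand-ins that illustrate that pattern under those assumptions; they are not the library's implementation.

```python
import random
from collections import Counter


class RandomOverSamplerSketch:
    """Hypothetical sampler with fit_resample(X, y): duplicates samples until classes are equal."""

    def __init__(self, random_state=0):
        self.random_state = random_state

    def fit_resample(self, X, y):
        rng = random.Random(self.random_state)
        by_class = {}
        for i, label in enumerate(y):
            by_class.setdefault(label, []).append(i)
        n_max = max(len(idx) for idx in by_class.values())
        X_res, y_res = list(X), list(y)
        for label, idx in by_class.items():
            # Draw with replacement to top each class up to the majority count.
            for _ in range(n_max - len(idx)):
                i = rng.choice(idx)
                X_res.append(X[i])
                y_res.append(label)
        return X_res, y_res


class CentroidClassifier:
    """Hypothetical stand-in classifier: predicts the class whose mean vector is closest."""

    def fit(self, X, y):
        counts = Counter(y)
        sums = {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, v in enumerate(row):
                acc[j] += v
        self.centroids_ = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids_, key=lambda c: sq_dist(row, self.centroids_[c])) for row in X]


class SamplerThenClassifier:
    """Resample only inside fit(); predict() passes data to the classifier unchanged."""

    def __init__(self, sampler, classifier):
        self.sampler = sampler
        self.classifier = classifier

    def fit(self, X, y):
        X_res, y_res = self.sampler.fit_resample(X, y)
        self.classifier.fit(X_res, y_res)
        return self

    def predict(self, X):
        return self.classifier.predict(X)
```

Keeping resampling out of `predict` matters because test data must keep its real class distribution; resampling it would distort any evaluation, including cross-validation scores.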

Abstract

Imbalanced-learn is an open-source Python toolbox that provides a wide range of methods for coping with imbalanced datasets, a problem frequently encountered in machine learning and pattern recognition. The implemented state-of-the-art methods fall into four groups: (i) under-sampling, (ii) over-sampling, (iii) combinations of over- and under-sampling, and (iv) ensemble learning methods. The toolbox depends only on numpy, scipy, and scikit-learn and is distributed under the MIT license. It is fully compatible with scikit-learn and is one of the scikit-learn-contrib supported projects. Documentation, unit tests, and integration tests are provided to ease usage and contribution. The toolbox is publicly available on GitHub: https://github.com/scikit-learn-contrib/imbalanced-learn.
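Of the four groups, over-sampling by generating synthetic samples is the least obvious. Below is a minimal pure-Python sketch of the SMOTE idea: each new minority sample is placed at a random point on the segment between a minority sample and one of its k nearest minority neighbours. The function `smote_sketch` and its parameters are hypothetical; the library's own implementation is `imblearn.over_sampling.SMOTE`, which supports many more options and differs in details.

```python
import math
import random


def smote_sketch(X_min, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    X_min: feature vectors (lists of floats) belonging to the minority class only.
    """
    if len(X_min) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(X_min) - 1)
    # Precompute the k nearest minority neighbours (Euclidean) of every minority sample.
    neighbours = []
    for i, a in enumerate(X_min):
        others = sorted((math.dist(a, b), j) for j, b in enumerate(X_min) if j != i)
        neighbours.append([j for _, j in others[:k]])
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(X_min))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # uniform in [0, 1)
        a, b = X_min[i], X_min[j]
        # The new point lies on the segment from a towards its neighbour b.
        synthetic.append([ai + gap * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic
```

The synthetic points are appended to the original training set. Methods in the third group then clean the result, for example by removing Tomek links or applying edited nearest neighbours after SMOTE; imbalanced-learn provides these combinations as `SMOTETomek` and `SMOTEENN` in `imblearn.combine`.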


Taxonomy

Topics: Imbalanced Data Classification Techniques · Machine Learning and Data Classification · Artificial Intelligence in Healthcare