# Khiops: An End-to-End, Frugal AutoML and XAI Machine Learning Solution for Large, Multi-Table Databases

**Authors:** Marc Boullé, Nicolas Voisine, Bruno Guerraz, Carine Hue, Felipe Olmos, Vladimir Popescu, Stéphane Gouache, Stéphane Bouget, Alexis Bondu, Luc Aurelien Gauthier, Yassine Nair Benrekia, Fabrice Clérot, Vincent Lemaire

arXiv: 2508.20519 · 2025-11-04

## TL;DR

Khiops is an open-source, scalable AutoML and explainability tool for large multi-table databases. It uses a Bayesian approach for variable selection, classification, and propositionalisation.

## Contribution

The paper introduces a Bayesian AutoML framework tailored to large, complex multi-table databases, combining variable selection, propositionalisation, and explainability.

## Key findings

- Handles databases with millions of individuals, tens of thousands of variables, and hundreds of millions of records in secondary tables.
- Provides variable importance measures and automatic feature aggregation.
- Achieves scalable performance on large multi-table datasets.
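
To make "automatic feature aggregation" concrete, here is a minimal sketch of propositionalisation in plain Python. It is not Khiops code: the tables, the column names, and the `propositionalise` helper are hypothetical, and the three aggregates (count, mean, max) are picked by hand, whereas Khiops constructs its aggregates automatically.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical root table: one row per individual (here, a customer).
customers = [
    {"customer_id": 1, "churned": 0},
    {"customer_id": 2, "churned": 1},
    {"customer_id": 3, "churned": 0},
]

# Hypothetical secondary table: zero or more records per individual.
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.0},
    {"customer_id": 2, "amount": 5.0},
]


def propositionalise(root_rows, secondary_rows, key, value):
    """Flatten a one-to-many relation into per-individual aggregate columns."""
    # Group the secondary values by the join key.
    groups = defaultdict(list)
    for row in secondary_rows:
        groups[row[key]].append(row[value])

    flat = []
    for row in root_rows:
        values = groups.get(row[key], [])
        flat.append({
            **row,
            f"count_{value}": len(values),
            # Individuals without secondary records get None for mean and max.
            f"mean_{value}": mean(values) if values else None,
            f"max_{value}": max(values) if values else None,
        })
    return flat


flat_table = propositionalise(customers, orders, "customer_id", "amount")
for row in flat_table:
    print(row)
```

The result is a single flat table with one row per individual, which is the form a standard classifier can consume.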

## Abstract

Khiops is an open-source machine learning tool designed for mining large multi-table databases. Khiops is based on a unique Bayesian approach that has attracted academic interest with more than 20 publications on topics such as variable selection, classification, decision trees and co-clustering. It provides a predictive measure of variable importance using discretisation models for numerical data and value clustering for categorical data. The proposed classification/regression model is a naive Bayesian classifier incorporating variable selection and weight learning. In the case of multi-table databases, it provides propositionalisation by automatically constructing aggregates. Khiops is adapted to the analysis of large databases with millions of individuals, tens of thousands of variables and hundreds of millions of records in secondary tables. It is available in many environments, both as a Python library and via a user interface.
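
The classifier described above can be pictured as a naive Bayes model over discretised variables, with one weight per variable. The sketch below is an illustration under stated assumptions, not the Khiops implementation: the cut points and weights are fixed by hand (Khiops learns both with its Bayesian criteria), Laplace smoothing is added for simplicity, and the data, `fit`, and `predict_proba` are hypothetical names.

```python
import math
from bisect import bisect_right
from collections import Counter


def fit(rows, labels, cut_points):
    """Count classes and (interval, class) pairs for each discretised variable."""
    classes = sorted(set(labels))
    class_counts = Counter(labels)
    counts = {var: Counter() for var in cut_points}
    for row, cls in zip(rows, labels):
        for var, cuts in cut_points.items():
            # bisect_right maps a numerical value to the index of its interval.
            counts[var][(bisect_right(cuts, row[var]), cls)] += 1
    return classes, class_counts, counts


def predict_proba(row, model, cut_points, weights):
    """Weighted naive Bayes: log P(y) + sum_k w_k * log P(interval_k | y)."""
    classes, class_counts, counts = model
    total = sum(class_counts.values())
    scores = {}
    for cls in classes:
        score = math.log(class_counts[cls] / total)
        for var, cuts in cut_points.items():
            n_intervals = len(cuts) + 1
            interval = bisect_right(cuts, row[var])
            # Laplace-smoothed conditional probability (an assumption of this sketch).
            p = (counts[var][(interval, cls)] + 1) / (class_counts[cls] + n_intervals)
            score += weights[var] * math.log(p)
        scores[cls] = score
    # Normalise the log-scores into probabilities.
    top = max(scores.values())
    exp_scores = {cls: math.exp(s - top) for cls, s in scores.items()}
    z = sum(exp_scores.values())
    return {cls: v / z for cls, v in exp_scores.items()}


# Hypothetical toy data.
rows = [
    {"age": 22, "income": 18},
    {"age": 25, "income": 20},
    {"age": 47, "income": 52},
    {"age": 52, "income": 61},
]
labels = ["no", "no", "yes", "yes"]
cut_points = {"age": [35], "income": [40]}  # hand-picked, one cut per variable
weights = {"age": 1.0, "income": 0.5}       # hand-picked; 0.0 would drop a variable

model = fit(rows, labels, cut_points)
proba = predict_proba({"age": 50, "income": 10}, model, cut_points, weights)
print(proba)
```

A weight of 0 removes a variable from the score entirely, which is how variable selection and weight learning fit into the same formula.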

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20519/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20519/full.md

## References

26 references — full list in the complete paper: https://tomesphere.com/paper/2508.20519/full.md

---
Source: https://tomesphere.com/paper/2508.20519