Algebraic Machine Learning for Small-to-Medium Datasets Is Competitive against Strong Standard Baselines

David Mendez; Fernando Martin-Maroto; Gonzalo G. de Polavieja

arXiv:2605.22155 · cs.LG · May 22, 2026

TL;DR

Algebraic Machine Learning (AML), a structure-based symbolic method, outperforms cross-validated baselines, including CNNs, on small-to-medium image datasets and is comparable to LightGBM and random forests on tabular data, all without validation data or hyperparameter tuning.

Contribution

This work evaluates AML, an algebraic framework that learns through subdirect decomposition of algebraic structure rather than numerical optimization, and shows that it competes with standard machine learning methods using a generic inductive bias and no hyperparameter tuning.

Findings

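01. AML outperforms cross-validated baselines, including CNNs, on small to medium image datasets (50 to 2000 training examples).

02. On tabular data, AML is comparable to LightGBM and random forests, although XGBoost is the best performing method overall.

03. AML requires no cross-validation or task-specific hyperparameters.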

Abstract

Symbolic methods are generally not considered competitive with strong modern learners on realistic supervised tasks. We evaluate Algebraic Machine Learning (AML), a framework that learns through subdirect decomposition of algebraic structure rather than numerical optimization, against standard baselines on image and tabular classification across varying training-set sizes. We find that AML, trained only on training data without using validation or cross-validation, outperforms a family of cross-validated baseline methods, including CNNs, on small to medium image datasets (50 to 2000 training examples). On tabular datasets in the same size range, XGBoost is overall the best performing method, but AML is nonetheless comparable to methods incorporating task-specific biases such as LightGBM and random forests. AML achieves this competitive performance across two very different types of datasets…
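
The comparison described in the abstract pits AML, trained with no validation step, against baselines whose hyperparameters are chosen by cross-validation at each training-set size. The baseline side of that protocol can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' experimental code: the k-nearest-neighbour classifier is a stand-in baseline that does not appear in the paper, the two-blob toy data and the names `make_blobs`, `cv_select_k`, and `run_protocol` are hypothetical, and AML itself is not implemented here.

```python
import random

def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    nearest = sorted(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

def accuracy(train, test, k):
    """Fraction of test points classified correctly by k-NN fitted on train."""
    return sum(knn_predict(train, x, k) == y for x, y in test) / len(test)

def cv_select_k(train, candidates, folds=5):
    """Pick k by plain k-fold cross-validation, using the training subset only."""
    best_k, best_score = None, -1.0
    for k in candidates:
        scores = []
        for f in range(folds):
            held_out = train[f::folds]  # every folds-th point, starting at f
            rest = [p for i, p in enumerate(train) if i % folds != f]
            scores.append(accuracy(rest, held_out, k))
        mean = sum(scores) / folds
        if mean > best_score:
            best_k, best_score = k, mean
    return best_k

def make_blobs(n, rng):
    """Hypothetical toy data: two Gaussian blobs in 2D with labels 0 and 1."""
    data = []
    for i in range(n):
        label = i % 2
        centre = 0.0 if label == 0 else 3.0
        data.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    return data

def run_protocol(sizes=(50, 100, 200), seed=0):
    """For each training-set size: tune by CV on the subset, then score on test."""
    rng = random.Random(seed)
    pool = make_blobs(max(sizes), rng)
    test = make_blobs(200, rng)  # held-out test set, never used for tuning
    results = {}
    for n in sizes:
        subset = pool[:n]
        k = cv_select_k(subset, candidates=(1, 3, 5))
        results[n] = (k, accuracy(subset, test, k))
    return results

results = run_protocol()
for n, (k, acc) in results.items():
    print(f"n={n}: selected k={k}, test accuracy={acc:.3f}")
```

`results` maps each training-set size to the selected `k` and the resulting test accuracy. A method that needs no tuning, as the abstract reports for AML, would skip the `cv_select_k` step and fit once on each subset.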
