MacrOData: New Benchmarks of Thousands of Datasets for Tabular Outlier Detection

Xueying Ding; Simon Klüttermann; Haomin Wen; Yilong Chen; Leman Akoglu

arXiv:2602.09329·cs.LG·April 27, 2026

TL;DR

MacrOData is a large-scale benchmark suite of 2,446 datasets (790 + 856 + 800 across its three components) for evaluating tabular outlier detection methods. It targets the small scale and limited diversity of earlier benchmarks such as AdBench, which comprises only 57 datasets.

Contribution

The paper presents MacrOData, a benchmark suite with three components (OddBench, OvrBench, and SynBench) that provides diverse datasets, standardized splits, and metadata, enabling robust evaluation of outlier detection techniques.

Findings

01

The paper evaluates classical, deep, and foundation-model detectors across all three benchmarks.

02

MacrOData's scale and diversity improve the robustness of outlier detection evaluation.

03

A public leaderboard and open datasets support future research and benchmarking.

Abstract

Quality benchmarks are essential for fairly and accurately tracking scientific progress and enabling practitioners to make informed methodological choices. Outlier detection (OD) on tabular data underpins numerous real-world applications, yet existing OD benchmarks remain limited. The prominent OD benchmark AdBench is the de facto standard in the literature, yet comprises only 57 datasets. In addition to other shortcomings discussed in this work, its small scale severely restricts diversity and statistical power. We introduce MacrOData, a large-scale benchmark suite for tabular OD comprising three carefully curated components: OddBench, with 790 datasets containing real-world semantic anomalies; OvrBench, with 856 datasets featuring real-world statistical outliers; and SynBench, with 800 synthetically generated datasets spanning diverse data priors and outlier archetypes. Owing to its…
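To make the benchmarking idea concrete, here is a minimal sketch of scoring one detector across several labeled tabular datasets and averaging the result. Everything in it is an assumption for illustration: the toy Gaussian datasets stand in for real benchmark datasets, the z-score detector stands in for a real OD method, and the helper names (`auroc`, `zscore_detector`, `make_toy_dataset`) are hypothetical. It does not use MacrOData's data format, splits, or evaluation protocol, which are not described here.

```python
import random
import statistics


def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U); tied scores share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def zscore_detector(rows):
    """Hypothetical baseline: score a row by its largest absolute column z-score."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [max(abs(x - m) / s for x, m, s in zip(row, means, stds)) for row in rows]


def make_toy_dataset(n_inliers=200, n_outliers=10, n_features=4, seed=0):
    """Toy stand-in for a benchmark dataset: Gaussian inliers, shifted outliers."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n_inliers):
        rows.append([rng.gauss(0, 1) for _ in range(n_features)])
        labels.append(0)
    for _ in range(n_outliers):
        rows.append([rng.gauss(6, 1) for _ in range(n_features)])
        labels.append(1)
    return rows, labels


# Evaluate the detector on every dataset, then aggregate across datasets.
datasets = {f"toy_{s}": make_toy_dataset(seed=s) for s in range(5)}
results = {name: auroc(zscore_detector(X), y) for name, (X, y) in datasets.items()}
mean_auroc = statistics.fmean(results.values())
```

With only five datasets the average is a noisy estimate of detector quality; the same loop over thousands of datasets is what gives a benchmark the statistical power the abstract refers to.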

Code & Models

Repositories

https://huggingface.co/MacrOData-CMU
