Causality can systematically address the monsters under the bench(marks)

Felix Leeb; Zhijing Jin; Bernhard Schölkopf

arXiv:2502.05085 · cs.LG · February 10, 2025

TL;DR

This paper advocates using causality as a framework to improve the systematic evaluation of machine learning models, addressing biases, failures, and reproducibility issues through explicit causal modeling and analysis.

Contribution

It introduces causal modeling as a tool for systematic evaluation, identifies common causal graph structures, and demonstrates their application through case studies in machine learning.

Findings

01. Causal assumptions clarify model strengths and limitations.

02. Causal graph topologies aid in understanding reasoning abilities.

03. Case studies show causality improves evaluation and inspires new methods.
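
To make the idea of an explicit causal graph concrete, here is a minimal sketch in Python. It is not taken from the paper: the variable names (`training_data`, `benchmark_leakage`, and so on) and the helper functions are hypothetical, chosen only to illustrate how writing down assumed cause-effect edges lets one check for a common cause (a potential confounder) of two quantities in an evaluation.

```python
from typing import Dict, List, Set

# Hypothetical causal graph for a benchmark evaluation (illustrative only,
# not from the paper). Each key is a cause; its values are direct effects.
GRAPH: Dict[str, List[str]] = {
    "training_data": ["model_ability", "benchmark_leakage"],
    "model_ability": ["benchmark_score"],
    "benchmark_leakage": ["benchmark_score"],
    "prompt_format": ["benchmark_score"],
}


def ancestors(graph: Dict[str, List[str]], node: str) -> Set[str]:
    """Return every direct or indirect cause of `node` (assumes an acyclic graph)."""
    parents = {cause for cause, effects in graph.items() if node in effects}
    result = set(parents)
    for parent in parents:
        result |= ancestors(graph, parent)
    return result


def common_causes(graph: Dict[str, List[str]], a: str, b: str) -> Set[str]:
    """Return the variables that are causes of both `a` and `b`."""
    return ancestors(graph, a) & ancestors(graph, b)


if __name__ == "__main__":
    # Under these assumed edges, training data influences the score both
    # through model ability and through leakage, so it shows up as a
    # common cause of the two.
    print(common_causes(GRAPH, "model_ability", "benchmark_score"))
```

In this toy graph, `training_data` is a common cause of `model_ability` and `benchmark_score`, so a raw correlation between ability and score could partly reflect leakage rather than ability alone. The sketch only makes the assumptions inspectable; it does not estimate any effect.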

Abstract

Effective and reliable evaluation is essential for advancing empirical machine learning. However, the increasing accessibility of generalist models and the progress towards ever more complex, high-level tasks make systematic evaluation more challenging. Benchmarks are plagued by various biases, artifacts, or leakage, while models may behave unreliably due to poorly explored failure modes. Haphazard treatments and inconsistent formulations of such "monsters" can contribute to a duplication of efforts, a lack of trust in results, and unsupported inferences. In this position paper, we argue causality offers an ideal framework to systematically address these challenges. By making causal assumptions in an approach explicit, we can faithfully model phenomena, formulate testable hypotheses with explanatory power, and leverage principled tools for analysis. To make causal model design more…

Taxonomy

Topics: Computational Physics and Python Applications