Towards a Benchmark for Scientific Understanding in Humans and Machines
Kristian Gonzalez Barman, Sascha Caron, Tom Claassen, Henk de Regt

TL;DR
This paper proposes a new benchmark, SUB, to measure and compare scientific understanding in humans and AI, using behavioral tasks inspired by philosophy of science.
Contribution
It introduces a framework and set of tests for evaluating scientific understanding, bridging human and machine capabilities.
Findings
SUB enables comparison of scientific understanding levels
Benchmarking improves trust and quality in AI understanding
Aligning human and machine understanding advances scientific discovery
Abstract
Scientific understanding is a fundamental goal of science, allowing us to explain the world. There is currently no good way to measure the scientific understanding of agents, whether these be humans or Artificial Intelligence systems. Without a clear benchmark, it is challenging to evaluate and compare different levels of and approaches to scientific understanding. In this Roadmap, we propose a framework to create a benchmark for scientific understanding, utilizing tools from philosophy of science. We adopt a behavioral notion according to which genuine understanding should be recognized as an ability to perform certain tasks. We extend this notion by considering a set of questions that can gauge different levels of scientific understanding, covering information retrieval, the capability to arrange information to produce an explanation, and the ability to infer how things would be…
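As a rough illustration of the behavioral notion described in the abstract, the sketch below tallies an agent's answers by question level. It is only a minimal sketch under stated assumptions: the level labels (`retrieval`, `explanation`, `inference`), the `Response` record, and the `score_by_level` function are hypothetical names paraphrasing the abstract, not part of the paper's benchmark, and the accuracy-per-level score is an assumed metric rather than one the authors specify.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical level labels, paraphrasing the question types named in the abstract.
LEVELS = ("retrieval", "explanation", "inference")


@dataclass
class Response:
    level: str     # one of LEVELS
    correct: bool  # whether the agent's answer was judged correct


def score_by_level(responses):
    """Return the fraction of correct answers for each level that has responses."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in responses:
        if r.level not in LEVELS:
            raise ValueError(f"unknown level: {r.level}")
        totals[r.level] += 1
        hits[r.level] += int(r.correct)
    # Levels with no responses are omitted rather than reported as zero.
    return {level: hits[level] / totals[level] for level in LEVELS if totals[level]}
```

Scoring a human and an AI system with the same function on the same question set would give per-level numbers that can be compared directly, which is the kind of comparison the abstract says a benchmark should enable.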
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning
