Towards a Benchmark for Scientific Understanding in Humans and Machines

Kristian Gonzalez Barman; Sascha Caron; Tom Claassen; Henk de Regt

arXiv:2304.10327 · cs.AI · May 7, 2024 · 1 citation

Open Access

TL;DR

This paper proposes a new benchmark, SUB, to measure and compare scientific understanding in humans and AI, using behavioral tasks inspired by philosophy of science.

Contribution

It introduces a framework and set of tests for evaluating scientific understanding, bridging human and machine capabilities.

Findings

01. SUB enables comparison of scientific understanding levels
02. Benchmarking improves trust and quality in AI understanding
03. Aligning human and machine understanding advances scientific discovery

Abstract

Scientific understanding is a fundamental goal of science, allowing us to explain the world. There is currently no good way to measure the scientific understanding of agents, whether these be humans or Artificial Intelligence systems. Without a clear benchmark, it is challenging to evaluate and compare different levels of and approaches to scientific understanding. In this Roadmap, we propose a framework to create a benchmark for scientific understanding, utilizing tools from philosophy of science. We adopt a behavioral notion according to which genuine understanding should be recognized as an ability to perform certain tasks. We extend this notion by considering a set of questions that can gauge different levels of scientific understanding, covering information retrieval, the capability to arrange information to produce an explanation, and the ability to infer how things would be…

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning