Emerging categories in scientific explanations

Giacomo Magnifico; Eduard Barbu

arXiv:2505.17832·cs.CL·May 26, 2025

TL;DR

This paper introduces a new dataset of human-like scientific explanations from biomedical literature, categorizing them into emerging explanation types to support AI understanding and generation of explanations.

Contribution

The paper provides a large-scale, openly available dataset of scientific explanation sentences annotated with multi-class categories, addressing the lack of datasets of human-like, human-generated explanations in AI research.

Findings

01

Achieved a Krippendorff's alpha of 0.667 for the 3-class annotation

02

Extracted explanation sentences from biomedical literature

03

Organized explanations into multi-class categories
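
For readers unfamiliar with the agreement figure in the first finding, the following is a minimal sketch of Krippendorff's alpha for nominal labels, computed from a coincidence matrix. It is not the authors' code: the page does not say which tool they used, and the function name, the input layout, and the labels "A" and "B" are assumptions made for illustration only.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list of lists (hypothetical layout): each inner list holds
    the labels that different annotators gave to one sentence, with None
    marking a missing annotation.
    """
    # Coincidence matrix: every ordered pair of labels within a unit
    # contributes 1 / (m - 1), where m is the number of labels in that unit.
    coincidences = Counter()
    for labels in units:
        present = [label for label in labels if label is not None]
        m = len(present)
        if m < 2:
            continue  # a unit with fewer than two labels cannot be paired
        for a, b in permutations(present, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per category and the overall number of pairable values.
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least one unit with two or more labels")

    # Observed and expected disagreement (nominal metric: 1 if labels differ).
    observed = sum(w for (a, b), w in coincidences.items() if a != b) / n
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n * (n - 1))
    if expected == 0:
        raise ValueError("alpha is undefined when only one category is used")

    return 1.0 - observed / expected


# Made-up example: two annotators, four sentences, one disagreement.
sample = [["A", "A"], ["A", "B"], ["B", "B"], ["B", "B"]]
print(round(krippendorff_alpha_nominal(sample), 3))  # 0.533
```

An alpha of 1 means perfect agreement and 0 means agreement no better than chance, so the reported 0.667 indicates moderate consensus among annotators on the three classes.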

Abstract

Clear and effective explanations are essential for human understanding and knowledge dissemination. The scope of scientific research aiming to understand the essence of explanations has recently expanded from the social sciences to machine learning and artificial intelligence. Explanations for machine learning decisions must be impactful and human-like, and there is a lack of large-scale datasets focusing on human-like and human-generated explanations. This work aims to provide such a dataset by: extracting sentences that indicate explanations from scientific literature among various sources in the biotechnology and biophysics topic domains (e.g. PubMed's PMC Open Access subset); providing a multi-class notation derived inductively from the data; evaluating annotator consensus on the emerging categories. The sentences are organized in an openly-available dataset, with two different…
