Laplace Sample Information: Data Informativeness Through a Bayesian Lens

Johannes Kaiser; Kristian Schwethelm; Daniel Rueckert; Georgios Kaissis

arXiv:2505.15303 · cs.LG · May 22, 2025

TL;DR

This paper introduces Laplace Sample Information (LSI), a Bayesian measure of how informative each individual sample in a dataset is. It is intended to guide data selection, which can make model training more efficient across a range of architectures and learning settings.

Contribution

The paper presents LSI, a novel, model-agnostic informativeness measure based on Bayesian approximation and information theory, applicable to diverse data types and learning scenarios.

Findings

01 LSI effectively ranks data by typicality and detects mislabeled samples.

02 LSI measures class-wise informativeness and dataset difficulty accurately.

03 LSI transfers efficiently to large model training.

Abstract

Accurately estimating the informativeness of individual samples in a dataset is an important objective in deep learning, as it can guide sample selection, which in turn can improve model efficiency and accuracy by removing redundant or potentially harmful samples. We propose Laplace Sample Information (LSI), a measure of sample informativeness that is grounded in information theory and widely applicable across model architectures and learning settings. LSI leverages a Bayesian approximation to the weight posterior and the KL divergence to measure the change in the parameter distribution induced by a sample of interest from the dataset. We experimentally show that LSI is effective in ordering the data with respect to typicality, detecting mislabeled samples, measuring class-wise informativeness, and assessing dataset difficulty. We demonstrate these capabilities of LSI on image and text data in supervised and…
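To make the idea in the abstract concrete, the following is a minimal sketch of the kind of quantity involved: a KL divergence between two Gaussian approximations to a weight posterior, one fitted without the sample of interest and one fitted with it. The diagonal covariance, the direction of the KL divergence, the function name `diag_gaussian_kl`, and the toy numbers are all assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import math


def diag_gaussian_kl(mean_p, var_p, mean_q, var_q):
    """Return KL( N(mean_p, diag(var_p)) || N(mean_q, diag(var_q)) ) in nats.

    Hypothetical helper: assumes both posteriors are Gaussians with
    diagonal covariance, which the abstract does not state.
    """
    if not (len(mean_p) == len(var_p) == len(mean_q) == len(var_q)):
        raise ValueError("all inputs must have the same length")
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        if vp <= 0 or vq <= 0:
            raise ValueError("variances must be positive")
        # Per-dimension closed form for two univariate Gaussians.
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


# Toy (made-up) posteriors over two weights: (means, variances).
posterior_without_sample = ([0.0, 1.0], [1.0, 1.0])
posterior_with_sample = ([0.5, 1.0], [1.0, 1.0])

# A larger divergence would indicate that the sample moved the weight
# distribution more, i.e. that it is more informative under this sketch.
score = diag_gaussian_kl(*posterior_without_sample, *posterior_with_sample)
print(score)
```

In this sketch only the first weight's mean shifts, so the score reflects that single change; identical posteriors would give a score of zero.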

Code & Models

Repositories

TUM-AIMED/LSI (JAX, official)

Videos

Laplace Sample Information: Data Informativeness Through a Bayesian Lens (SlidesLive)

Taxonomy

Topics: Statistics Education and Methodologies · Forecasting Techniques and Applications