Laplace Sample Information: Data Informativeness Through a Bayesian Lens
Johannes Kaiser, Kristian Schwethelm, Daniel Rueckert, Georgios Kaissis

TL;DR
This paper introduces Laplace Sample Information (LSI), a Bayesian measure of the informativeness of individual samples that aids data selection and can improve model training efficiency across various settings.
Contribution
The paper presents LSI, a novel, model-agnostic informativeness measure that combines a Bayesian approximation of the weight posterior with information theory and applies to diverse data types and learning scenarios.
Findings
LSI effectively ranks data by typicality and detects mislabeled samples.
LSI measures class-wise informativeness and dataset difficulty accurately.
LSI transfers efficiently to large model training.
Abstract
Accurately estimating the informativeness of individual samples in a dataset is an important objective in deep learning. It can guide sample selection, which can improve model efficiency and accuracy by removing redundant or potentially harmful samples. We propose Laplace Sample Information (LSI), a measure of sample informativeness that is grounded in information theory and widely applicable across model architectures and learning settings. LSI uses a Bayesian approximation to the weight posterior and the KL divergence to measure the change in the parameter distribution induced by a sample of interest from the dataset. We show experimentally that LSI is effective at ordering the data by typicality, detecting mislabeled samples, measuring class-wise informativeness, and assessing dataset difficulty. We demonstrate these capabilities of LSI on image and text data in supervised and…
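The core idea of comparing the weight posterior with and without a given sample through a KL divergence can be illustrated with a toy model. The following is a minimal sketch built on several assumptions, none of which is taken from the paper. It uses a conjugate Bayesian linear regression with a single scalar weight, for which the posterior is exactly Gaussian, so the Laplace approximation is exact. It recomputes the leave-one-out posterior in closed form, and it takes the KL direction as KL(full posterior || posterior without sample i). The function names and hyperparameters (`alpha`, `beta`) are hypothetical. The paper's actual procedure for neural networks (how the posterior is approximated, how the sample's influence is obtained, and the KL direction) may differ.

```python
import math

def posterior(xs, ys, alpha=1.0, beta=1.0):
    """Gaussian posterior N(mean, var) over a scalar weight w for y = w*x + noise.

    Assumed prior: w ~ N(0, 1/alpha); assumed noise precision: beta.
    The model is conjugate, so the Laplace approximation is exact here.
    """
    precision = alpha + beta * sum(x * x for x in xs)
    mean = beta * sum(x * y for x, y in zip(xs, ys)) / precision
    return mean, 1.0 / precision

def kl_gauss(m_a, v_a, m_b, v_b):
    """KL( N(m_a, v_a) || N(m_b, v_b) ) for univariate Gaussians."""
    return 0.5 * (v_a / v_b + (m_b - m_a) ** 2 / v_b - 1.0 + math.log(v_b / v_a))

def leave_one_out_scores(xs, ys, alpha=1.0, beta=1.0):
    """Hypothetical LSI-style score for each sample: KL between the full-data
    posterior and the posterior with that sample removed (direction assumed)."""
    m_full, v_full = posterior(xs, ys, alpha, beta)
    scores = []
    for i in range(len(xs)):
        xs_i = xs[:i] + xs[i + 1:]   # drop sample i
        ys_i = ys[:i] + ys[i + 1:]
        m_i, v_i = posterior(xs_i, ys_i, alpha, beta)
        scores.append(kl_gauss(m_full, v_full, m_i, v_i))
    return scores

# Five points on y = 2x plus one deliberately "mislabeled" point at index 5.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 3.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, -6.0]
scores = leave_one_out_scores(xs, ys)
for i, s in enumerate(scores):
    print(f"sample {i}: score = {s:.4f}")
```

In this toy setting the mislabeled point at index 5 shifts the posterior the most and therefore receives the largest score. Typical points near the bulk of the data score close to zero. The high-leverage point at x = 5 also scores noticeably, because removing it changes both the posterior mean and the posterior variance.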
