# Analyzing Data Selection Techniques with Tools from the Theory of Information Losses

**Authors:** Brandon Foggo, Nanpeng Yu

arXiv: 1902.09602 · 2022-01-21

## TL;DR

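This paper introduces new information-theoretic tools for analyzing training data selection methods and uses them to prove that Facility Location Selection and Transductive Experimental Design reduce the information loss incurred when sampling data. Both analyses make the methods more interpretable, and the analysis of Transductive Experimental Design also broadens that method's scope.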

## Contribution

The paper develops a rigorous information-theoretic framework for analyzing data selection techniques and applies it to Facility Location Selection and Transductive Experimental Design as generalizable worked examples.

## Key findings

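- Facility Location Selection is proven to reduce the information-theoretic losses that occur when sampling training data.
- Transductive Experimental Design is proven to reduce the same losses, and the analysis greatly expands the method's scope.
- Both analyses yield insight into their respective methods and increase their interpretability.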
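For readers unfamiliar with the first method, the following is a minimal sketch of greedy facility location selection. It assumes the standard facility location objective, $f(S) = \sum_i \max_{j \in S} \mathrm{sim}(i, j)$ with a non-negative similarity matrix; this summary does not state the exact formulation the paper analyzes, and the function name `greedy_facility_location` is hypothetical.

```python
import numpy as np


def greedy_facility_location(similarity, k):
    """Greedily pick k column indices that approximately maximize
    sum_i max_{j in S} similarity[i, j].

    Assumes `similarity` is a square, non-negative (n, n) array.
    """
    similarity = np.asarray(similarity, dtype=float)
    n = similarity.shape[0]
    selected = []
    available = np.ones(n, dtype=bool)
    # best_cover[i] is the similarity of point i to its best selected point.
    best_cover = np.zeros(n)
    for _ in range(k):
        # Marginal gain of adding each candidate j to the current selection.
        gains = np.maximum(similarity, best_cover[:, None]).sum(axis=0)
        gains -= best_cover.sum()
        gains[~available] = -np.inf  # never pick the same point twice
        j = int(np.argmax(gains))
        selected.append(j)
        available[j] = False
        best_cover = np.maximum(best_cover, similarity[:, j])
    return selected


# Illustrative data: two well-separated clusters on a line.
points = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
rbf = np.exp(-(points[:, None] - points[None, :]) ** 2)
chosen = greedy_facility_location(rbf, 2)
```

With two well-separated clusters, the greedy rule picks one representative per cluster, which matches the intuition that the selected subset should "cover" the full data set.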

## Abstract

In this paper, we present and illustrate some new tools for rigorously analyzing training data selection methods. These tools focus on the information theoretic losses that occur when sampling data. We use this framework to prove that two methods, Facility Location Selection and Transductive Experimental Design, reduce these losses. These are meant to act as generalizable theoretical examples of applying the field of Information Theoretic Deep Learning Theory to the fields of data selection and active learning. Both analyses yield insight into their respective methods and increase their interpretability. In the case of Transductive Experimental Design, the provided analysis greatly increases the method's scope as well.
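As a rough illustration of the second method, the sketch below follows the commonly described sequential greedy variant of Transductive Experimental Design on a kernel matrix: pick the point whose kernel column has the largest regularized squared norm, then deflate the kernel by that column. This is an assumption about the baseline method, not the formulation or extension analyzed in the paper; the function name `sequential_ted` and the regularizer `mu` are hypothetical.

```python
import numpy as np


def sequential_ted(kernel, k, mu=0.1):
    """Greedy sequential selection on a symmetric PSD kernel matrix.

    At each step, choose j maximizing ||K[:, j]||^2 / (K[j, j] + mu),
    then deflate K by the chosen column.
    """
    K = np.array(kernel, dtype=float)  # work on a copy
    n = K.shape[0]
    selected = []
    available = np.ones(n, dtype=bool)
    for _ in range(k):
        scores = (K ** 2).sum(axis=0) / (np.diag(K) + mu)
        scores[~available] = -np.inf  # never pick the same point twice
        j = int(np.argmax(scores))
        selected.append(j)
        available[j] = False
        col = K[:, j].copy()
        # Remove the part of the kernel explained by the chosen point.
        K -= np.outer(col, col) / (col[j] + mu)
    return selected


# Illustrative data: two well-separated clusters on a line.
points = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
rbf = np.exp(-(points[:, None] - points[None, :]) ** 2)
picked = sequential_ted(rbf, 2)
```

On this toy data, the deflation step sharply lowers the scores of points near an already selected one, so the second pick comes from the other cluster.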

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1902.09602/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/1902.09602/full.md

## References

38 references — full list in the complete paper: https://tomesphere.com/paper/1902.09602/full.md

---
Source: https://tomesphere.com/paper/1902.09602