Model-specific Data Subsampling with Influence Functions

Anant Raj; Cameron Musco; Lester Mackey; Nicolo Fusi

arXiv:2010.10218 · cs.LG · October 21, 2020

TL;DR

This paper introduces a model-specific data subsampling method based on influence functions that selects high-quality models quickly, reducing the cost of model selection when models are expensive to evaluate and datasets are large.

Contribution

The authors propose an influence function-based subsampling strategy that improves over random sampling for model selection whenever training points have varying influence.

Findings

01. The method identifies high-performing models using a subsample of the data rather than the full dataset.

02. Evaluating candidate models on the subsample costs less computation than evaluating them on the full dataset.

03. Empirical results show that the approach quickly selects high-quality models.

Abstract

Model selection requires repeatedly evaluating models on a given dataset and measuring their relative performance. In modern applications of machine learning, the models being considered are increasingly expensive to evaluate and the datasets of interest are growing in size. As a result, model selection is time-consuming and computationally inefficient. In this work, we develop a model-specific data subsampling strategy that improves over random sampling whenever training points have varying influence. Specifically, we leverage influence functions to guide our selection strategy, proving theoretically and demonstrating empirically that our approach quickly selects high-quality models.
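To make the idea in the abstract concrete, here is a minimal sketch of influence-guided subsampling. It is not the paper's algorithm: it assumes a ridge regression model (chosen because its Hessian has a closed form), uses the classical parameter influence H^{-1} g_i of each training point, and samples points with probability proportional to the norm of that influence. The function names, the `lam` regularization parameter, and the 50/50 mixture with uniform sampling (added so that no point has zero probability) are all illustrative assumptions.

```python
import numpy as np

def influence_scores(X, y, lam=1e-3):
    """Norm of the parameter influence H^{-1} g_i for each training point.

    Assumes the ridge objective (1/2n) * ||X theta - y||^2 + (lam/2) * ||theta||^2.
    """
    n, d = X.shape
    H = X.T @ X / n + lam * np.eye(d)          # Hessian of the objective
    theta = np.linalg.solve(H, X.T @ y / n)    # closed-form ridge solution
    residuals = X @ theta - y
    grads = X * residuals[:, None]             # per-point loss gradients, shape (n, d)
    infl = np.linalg.solve(H, grads.T).T       # H^{-1} g_i (sign dropped: only norms are used)
    return np.linalg.norm(infl, axis=1)

def influence_subsample(X, y, m, lam=1e-3, seed=0):
    """Draw m indices with probability tied to influence, plus importance weights."""
    rng = np.random.default_rng(seed)
    n = len(y)
    scores = influence_scores(X, y, lam)
    total = scores.sum()
    # Assumption: mix with the uniform distribution so every probability is positive.
    base = scores / total if total > 0 else np.full(n, 1.0 / n)
    probs = 0.5 * base + 0.5 / n
    idx = rng.choice(n, size=m, replace=True, p=probs)
    # With these weights, the mean of weights * per-point loss over the subsample
    # is an unbiased estimate of the mean loss over the full dataset.
    weights = 1.0 / (n * probs[idx])
    return idx, weights
```

In a model selection loop, each candidate model would then be scored on `X[idx], y[idx]` with the returned weights instead of on all n points. When every point has similar influence, the probabilities are close to uniform and the scheme reduces to random sampling, which matches the abstract's condition that gains appear when training points have varying influence.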

Taxonomy

Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Advanced Statistical Process Monitoring