Model-specific Data Subsampling with Influence Functions
Anant Raj, Cameron Musco, Lester Mackey, Nicolo Fusi

TL;DR
This paper introduces a model-specific data subsampling method that uses influence functions to select high-quality models quickly, reducing the cost of model selection when models are expensive to evaluate and datasets are large.
Contribution
The authors propose an influence-function-based subsampling strategy for model selection that improves over random sampling whenever training points have varying influence.
Findings
The method identifies high-quality models using a subsample of the data rather than the full dataset.
Subsampling lowers the computational cost of model selection, which otherwise requires repeatedly evaluating expensive models on large datasets.
Theoretical analysis and empirical results both indicate that the approach quickly selects high-quality models.
Abstract
Model selection requires repeatedly evaluating models on a given dataset and measuring their relative performance. In modern applications of machine learning, the models being considered are increasingly expensive to evaluate and the datasets of interest are growing in size. As a result, the process of model selection is time-consuming and computationally inefficient. In this work, we develop a model-specific data subsampling strategy that improves over random sampling whenever training points have varying influence. Specifically, we leverage influence functions to guide our selection strategy, proving theoretically and demonstrating empirically that our approach quickly selects high-quality models.
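To make the idea of influence-guided subsampling concrete, here is a minimal sketch in Python. It is not the authors' algorithm, which this summary does not spell out. It assumes a ridge-regularized least-squares model, uses the classical influence-function estimate of each training point's effect on the fitted parameters (the inverse Hessian applied to the per-example gradient), and samples points with probability proportional to the norm of that estimate. The function names `influence_scores` and `influence_subsample`, the `ridge` and `mix` parameters, and the importance-weighting scheme are all illustrative assumptions.

```python
import numpy as np


def influence_scores(X, y, ridge=1e-3):
    """Norm of the classical influence estimate -H^{-1} grad_i for each point.

    Assumed model (illustrative only): ridge-regularized least squares with
    objective (1/2n) * sum_i (x_i . theta - y_i)^2 + (ridge/2) * ||theta||^2.
    """
    n, d = X.shape
    H = X.T @ X / n + ridge * np.eye(d)        # Hessian of the objective
    theta = np.linalg.solve(H, X.T @ y / n)    # closed-form minimizer
    residuals = X @ theta - y
    grads = X * residuals[:, None]             # per-example loss gradients
    influence = -np.linalg.solve(H, grads.T).T # effect on the parameters
    return np.linalg.norm(influence, axis=1)


def influence_subsample(scores, m, rng, mix=0.1):
    """Draw m indices with probability roughly proportional to `scores`.

    `mix` blends in a uniform distribution so every point keeps a nonzero
    probability. The returned weights 1 / (n * p_i) make the weighted sample
    mean of a per-example quantity unbiased for its full-data mean.
    """
    n = scores.shape[0]
    total = scores.sum()
    base = scores / total if total > 0 else np.full(n, 1.0 / n)
    p = (1.0 - mix) * base + mix / n
    idx = rng.choice(n, size=m, replace=True, p=p)
    weights = 1.0 / (n * p[idx])
    return idx, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 5))
    y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=500)
    scores = influence_scores(X, y)
    idx, w = influence_subsample(scores, m=50, rng=rng)
    # Weighted estimate of a candidate model's mean squared error on the subsample
    candidate = np.zeros(5)
    losses = (X[idx] @ candidate - y[idx]) ** 2
    print("estimated MSE:", np.mean(w * losses))
```

When all scores are equal, the sampling distribution is uniform and every weight is 1, so the sketch reduces to plain random sampling; any benefit comes only from points having unequal influence, which matches the condition stated in the abstract.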
Taxonomy
Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Advanced Statistical Process Monitoring
