Scaling Up Influence Functions

Andrea Schioppa; Polina Zablotskaia; David Vilar; Artem Sokolov

arXiv:2112.03052 · cs.LG · December 7, 2021

TL;DR

This paper introduces a scalable method for computing influence functions in large Transformer models, enabling analysis of training data impact on predictions for models with hundreds of millions of parameters.

Contribution

The paper proposes a new approach, based on Arnoldi iteration, to speeding up the inverse Hessian calculation, which lets influence functions scale to full-size language and vision Transformer models.
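The idea of using Arnoldi iteration to sidestep a full inverse Hessian can be illustrated on a toy problem. The sketch below is not the authors' implementation: the matrix `A` is a small synthetic stand-in for a Hessian, and the names `arnoldi`, `approx_inverse_hvp`, the problem size, and the number of retained eigenvalues are all hypothetical choices made for illustration. It builds a Krylov basis, eigendecomposes the small projected matrix, keeps the dominant eigenpairs, and uses them to approximate an inverse-Hessian-vector product.

```python
import numpy as np

def arnoldi(matvec, v0, num_steps):
    """Return an orthonormal Krylov basis Q (n x k) and the projection H = Q^T A Q (k x k)."""
    n = v0.shape[0]
    Q = np.zeros((n, num_steps + 1))
    H = np.zeros((num_steps + 1, num_steps))
    Q[:, 0] = v0 / np.linalg.norm(v0)
    for j in range(num_steps):
        w = matvec(Q[:, j])
        for i in range(j + 1):  # modified Gram-Schmidt against all previous vectors
            H[i, j] = Q[:, i] @ w
            w = w - H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:  # Krylov subspace is exhausted; stop early
            return Q[:, : j + 1], H[: j + 1, : j + 1]
        Q[:, j + 1] = w / H[j + 1, j]
    return Q[:, :num_steps], H[:num_steps, :num_steps]

# Synthetic symmetric "Hessian" (an assumption): three dominant eigenvalues, the rest small.
rng = np.random.default_rng(0)
n = 50
true_eigs = np.concatenate([[100.0, 50.0, 20.0], np.linspace(0.1, 1.0, n - 3)])
U, _ = np.linalg.qr(rng.normal(size=(n, n)))
A = (U * true_eigs) @ U.T  # equals U @ diag(true_eigs) @ U.T

# Only matrix-vector products with A are needed, never A itself.
Q, H = arnoldi(lambda v: A @ v, rng.normal(size=n), num_steps=20)

# Eigendecompose the small matrix (symmetrised, since A is symmetric) and keep the top 3.
ritz_vals, ritz_vecs = np.linalg.eigh((H + H.T) / 2)
top = np.argsort(-np.abs(ritz_vals))[:3]
lam = ritz_vals[top]
G = Q @ ritz_vecs[:, top]  # n x 3 basis of approximate dominant eigenvectors

def approx_inverse_hvp(u):
    """Approximate A^{-1} u restricted to the dominant eigen-subspace."""
    return G @ ((G.T @ u) / lam)

u = U[:, 0]  # a vector lying in the dominant subspace
approx = approx_inverse_hvp(u)
exact = np.linalg.solve(A, u)
```

The approximation is only meaningful for the component of `u` inside the retained subspace; anything orthogonal to it is dropped, which is the trade-off that makes the projection cheap.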

Findings

1. Successfully scaled influence functions to models with hundreds of millions of parameters.

2. Demonstrated effectiveness on image classification and sequence-to-sequence tasks.

3. Provided open-source code for implementation.

Abstract

We address efficient calculation of influence functions for tracking predictions back to the training data. We propose and analyze a new approach to speeding up the inverse Hessian calculation based on Arnoldi iteration. With this improvement, we achieve, to the best of our knowledge, the first successful implementation of influence functions that scales to full-size (language and vision) Transformer models with several hundred million parameters. We evaluate our approach on image classification and sequence-to-sequence tasks with tens of millions to a hundred million training examples. Our code will be available at https://github.com/google-research/jax-influence.
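To make "tracking predictions back to the training data" concrete, here is a toy sketch of an influence score on a model small enough that the Hessian can be solved against directly. It assumes one common convention, in which the influence of a training example on a test loss is the negative inner product of the test gradient with the inverse Hessian applied to the training gradient; sign conventions differ between papers. The ridge-regression setup, the `damping` value, and all variable names are hypothetical and are not taken from the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 40, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)
damping = 1e-3  # hypothetical ridge term that keeps the Hessian positive definite

# Objective: mean over examples of 0.5 * (x @ w - y)^2, plus 0.5 * damping * |w|^2.
# Its Hessian is constant in w, so it can be formed once.
hessian = X.T @ X / n + damping * np.eye(d)
w = np.linalg.solve(hessian, X.T @ y / n)  # exact minimiser of the objective

# Per-example loss gradients at the fitted parameters (n x d).
train_grads = (X @ w - y)[:, None] * X

# Gradient of the loss on a single made-up test example.
x_test, y_test = rng.normal(size=d), 0.0
g_test = (x_test @ w - y_test) * x_test

# Influence of every training example on the test loss (assumed sign convention).
influence = -train_grads @ np.linalg.solve(hessian, g_test)

# Self-influence: each training example scored against itself.
inv_h_grads = np.linalg.solve(hessian, train_grads.T).T
self_influence = -np.einsum("id,id->i", train_grads, inv_h_grads)
```

With hundreds of millions of parameters, the `np.linalg.solve` calls above are exactly what becomes infeasible, which is why an approximation to the inverse Hessian is needed at scale.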

Taxonomy

Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Softmax · Residual Connection · Adam · Dropout · Position-Wise Feed-Forward Layer · Layer Normalization · Dense Connections