TrAct: Making First-layer Pre-Activations Trainable

Felix Petersen; Christian Borgelt; Stefano Ermon

arXiv:2410.23970 · cs.LG · November 1, 2024

TL;DR

This paper introduces TrAct, a method that makes first-layer pre-activations trainable by performing gradient descent on those activations rather than directly on the first-layer weights. This leads to faster training of vision models with minimal overhead.

Contribution

The paper proposes a novel approach that optimizes first-layer activations directly, derives a closed-form solution for the resulting weight update, and demonstrates training speedups of 1.25x to 4x across various vision models.
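The two-step idea (a gradient step on the first-layer pre-activations, then a closed-form solve for the weights) can be sketched in plain Python under our own assumptions; this is not the authors' implementation. For a linear first layer Z = W X, with one input per column of X and G the loss gradient with respect to Z, step (i) forms the activation proposal Z* = W X - lr*G, and step (ii) picks the weights closest to Z* in least squares. We add a ridge term lam*||W' - W||^2 (our choice, to keep the solve well-posed), which gives W' = W - lr*G X^T (X X^T + lam*I)^-1, compared with W - lr*G X^T for plain gradient descent. The names `tract_style_update`, `lr`, and `lam` are hypothetical, and the paper's exact closed form, normalization, and adjustments for stochastic training may differ.

```python
# Hypothetical sketch of a TrAct-style first-layer update (our derivation, not the
# authors' code). Shapes: W is (m x d), X is (d x n) with one input per column,
# G is (m x n) = dL/dZ for the pre-activations Z = W X.


def matmul(A, B):
    """Plain-Python matrix product of lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inverse(M):
    """Gauss-Jordan inversion with partial pivoting (adequate for tiny matrices)."""
    n = len(M)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def tract_style_update(W, X, G, lr, lam):
    """Solve argmin_W' ||W' X - (W X - lr*G)||^2 + lam*||W' - W||^2,
    whose solution is W - lr * G X^T (X X^T + lam*I)^-1."""
    d = len(X)
    xxt = matmul(X, transpose(X))
    reg = [[xxt[i][j] + (lam if i == j else 0.0) for j in range(d)] for i in range(d)]
    step = matmul(matmul(G, transpose(X)), inverse(reg))
    return [[w - lr * s for w, s in zip(w_row, s_row)] for w_row, s_row in zip(W, step)]


if __name__ == "__main__":
    W = [[1.0, 2.0], [0.5, -1.0]]
    X = [[1.0, 0.0], [2.0, 3.0]]  # two inputs as columns; X is invertible here
    G = [[0.1, -0.2], [0.3, 0.4]]
    W_new = tract_style_update(W, X, G, lr=0.5, lam=0.0)
    # With lam = 0 and square invertible X, W_new X reproduces the proposal W X - lr*G.
    print(matmul(W_new, X))
```

With lam = 0 and a fixed G, the induced change in the pre-activations is lr*G X^T (X X^T)^-1 X, which does not change if X is rescaled by any nonzero factor, whereas the plain gradient step changes them by lr*G X^T X, growing with the square of the input scale.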

Findings

01

Training speed increased by 1.25x to 4x

02

Applicable to convolutional and transformer models

03

Minimal computational overhead

Abstract

We consider the training of the first layer of vision models and notice the clear relationship between pixel values and gradient update magnitudes: the gradients arriving at the weights of a first layer are by definition directly proportional to (normalized) input pixel values. Thus, an image with low contrast has a smaller impact on learning than an image with higher contrast, and a very bright or very dark image has a stronger impact on the weights than an image with moderate brightness. In this work, we propose performing gradient descent on the embeddings produced by the first layer of the model. However, switching to discrete inputs with an embedding layer is not a reasonable option for vision models. Thus, we propose the conceptual procedure of (i) a gradient descent step on first layer activations to construct an activation proposal, and (ii) finding the optimal weights of the…
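The opening observation, that first-layer weight gradients are directly proportional to the input pixel values, follows from the chain rule: for z = W x, the gradient is dL/dW = g x^T with g = dL/dz. A minimal plain-Python sketch, assuming a single linear layer stands in for a first convolution or patch embedding (the helper names are hypothetical), shows that scaling the normalized pixel values by a factor c, as a lower-contrast version of the image would, scales the weight gradient by the same c for a fixed upstream gradient g.

```python
# Hypothetical illustration: gradient of a linear first layer z = W x.
# dL/dW[i][j] = g[i] * x[j], where g = dL/dz is held fixed for the comparison.


def first_layer_weight_grad(g, x):
    """Outer product g x^T: gradient of the loss w.r.t. W for z = W x."""
    return [[gi * xj for xj in x] for gi in g]


def max_abs(M):
    """Largest absolute entry of a matrix given as a list of rows."""
    return max(abs(v) for row in M for v in row)


if __name__ == "__main__":
    g = [0.5, -1.0]             # upstream gradient dL/dz for two output units
    x = [0.8, -0.4, 0.2]        # normalized pixel values of a higher-contrast input
    c = 0.1                     # contrast factor
    x_low = [c * v for v in x]  # the same image with 10x lower contrast

    grad = first_layer_weight_grad(g, x)
    grad_low = first_layer_weight_grad(g, x_low)
    print(max_abs(grad_low) / max_abs(grad))  # ratio equals c up to rounding
```

Plain gradient descent on W therefore takes a proportionally smaller step for the low-contrast image, which is the dependence on pixel statistics that motivates TrAct.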

Videos

TrAct: Making First-layer Pre-Activations Trainable · SlidesLive

Taxonomy

Topics: Scientific Computing and Data Management