Plastic Learning with Deep Fourier Features

Alex Lewandowski; Dale Schuurmans; Marlos C. Machado

arXiv:2410.20634·cs.LG·October 29, 2024

TL;DR

This paper introduces deep Fourier features to enhance neural network plasticity, enabling better continual learning by balancing linearity and nonlinearity, with theoretical backing and extensive empirical validation.

Contribution

It proposes deep Fourier features as a novel activation method that maintains trainability and effectiveness in continual learning scenarios, supported by theoretical and experimental evidence.

Findings

01 Deep Fourier features improve continual learning performance.

02 Replacing ReLU with Fourier features enhances trainability.

03 Results are consistent across multiple datasets and scenarios.

Abstract

Deep neural networks can struggle to learn continually in the face of non-stationarity. This phenomenon is known as loss of plasticity. In this paper, we identify underlying principles that lead to plastic algorithms. In particular, we provide theoretical results showing that linear function approximation, as well as a special case of deep linear networks, do not suffer from loss of plasticity. We then propose deep Fourier features, which are the concatenation of a sine and cosine in every layer, and we show that this combination provides a dynamic balance between the trainability obtained through linearity and the effectiveness obtained through the nonlinearity of neural networks. Deep networks composed entirely of deep Fourier features are highly trainable and sustain their trainability over the course of learning. Our empirical results show that continual learning performance can be…
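The abstract describes deep Fourier features as the concatenation of a sine and a cosine in every layer. The following is a minimal NumPy sketch of that idea, not the authors' implementation. The helper names (`fourier_activation`, `init_mlp`, `forward`), the initialization scale, and the choice to halve each layer's pre-activation width so that the concatenated output keeps the nominal width are all assumptions made for illustration.

```python
import numpy as np


def fourier_activation(z):
    """Concatenate sin(z) and cos(z) along the last axis (doubles that axis)."""
    return np.concatenate([np.sin(z), np.cos(z)], axis=-1)


def init_mlp(rng, in_dim, width, depth, out_dim):
    """Hypothetical initializer for an MLP whose hidden layers all use
    fourier_activation.

    Assumption: each hidden layer produces width // 2 pre-activations, so the
    concatenated [sin, cos] output has the nominal width.
    """
    half = width // 2
    params = []
    fan_in = in_dim
    for _ in range(depth):
        # Assumed scaling: standard deviation 1 / sqrt(fan_in).
        W = rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=(fan_in, half))
        params.append((W, np.zeros(half)))
        fan_in = 2 * half  # the next layer sees both the sin and cos halves
    W_out = rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=(fan_in, out_dim))
    params.append((W_out, np.zeros(out_dim)))
    return params


def forward(params, x):
    """Apply every hidden layer with the Fourier activation, then a linear head."""
    h = x
    for W, b in params[:-1]:
        h = fourier_activation(h @ W + b)
    W_out, b_out = params[-1]
    return h @ W_out + b_out


rng = np.random.default_rng(0)
params = init_mlp(rng, in_dim=8, width=16, depth=3, out_dim=4)
x = rng.normal(size=(5, 8))
y = forward(params, x)
print(y.shape)  # (5, 4)
```

One property that follows directly from the definition: for every pre-activation $z$, $\sin^2(z) + \cos^2(z) = 1$, and the derivatives $\cos(z)$ and $-\sin(z)$ cannot both be zero. Each unit's pair of outputs therefore always passes some gradient, unlike a ReLU unit whose gradient is zero whenever its input is negative.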

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01·Rating 6·Confidence 3

Strengths

The proposed activation function may replace ReLU and become common practice when building continual learning models.

Weaknesses

1. While the authors establish that linear networks can more readily adapt to new data (Sec 3.1), they do not examine whether these networks are resistant to catastrophic forgetting. Without measuring anti-forgetting properties, it is unclear whether the proposed approach truly meets both requirements of continual learning (adaptability and retention).
2. The proposed activation function, $\mathrm{Fourier}(z) = [\sin(z), \cos(z)]$, lacks clear implementation details, which may hinder practical use and replication

Reviewer 02·Rating 8·Confidence 4

Strengths

The paper is well written. It discusses and analyzes an important problem, showing that linear networks do not suffer from plasticity loss and that a mix of linear and nonlinear components can retain trainability. Theoretical and empirical evidence support the paper's contribution.

Weaknesses

While I enjoyed reading the paper and its flow, the presentation can still be improved substantially, especially in the figures and equations. For example, equations are not numbered, which makes it hard to refer back to a specific equation. Figures are not well explained, and colours are really hard to distinguish among different baselines. While a new activation function is introduced, a proper analysis of its behaviour and the performance of the network should be demonstrated; here we only see r

Reviewer 03·Rating 5·Confidence 4

Strengths

+ The paper presents a novel method that integrates Fourier features into deep networks, offering a new view on balancing linearity and nonlinearity to sustain plasticity.
+ The authors provide a comprehensive theoretical basis for their deep Fourier features, proving that linear function approximation and certain linear networks maintain plasticity.
+ The experiments were conducted on benchmark datasets under various continual learning scenarios.

Weaknesses

+ Although the authors compare the proposed method with ReLU-based architectures, the paper could include comparisons with other advanced continual learning techniques to further validate its advantages.
+ There is limited discussion of the computational and memory costs of the model compared to standard activation functions.
+ The authors claim that the proposed method focuses on the loss of plasticity; however, there seems to be something confusing about the experiments on label noise and pi

Videos

Plastic Learning with Deep Fourier Features·SlidesLive

Taxonomy

Topics·Image Processing and 3D Reconstruction