Memory-Efficient Training with In-Place FFT Implementation

Xinyu Ding; Bangtian Liu; Siyu Liao; Zhongfeng Wang

arXiv:2511.01385·cs.LG·December 23, 2025

Open Access

TL;DR

This paper introduces a novel in-place real FFT framework that reduces memory usage in deep learning training by leveraging symmetry properties, enabling more efficient frequency-domain computations.

Contribution

The paper presents the first fully in-place real FFT framework (rdFFT) that maintains input-output memory consistency and eliminates intermediate cache usage.

Findings

01. Reduces training memory cost in NLP tasks

02. Maintains input-output memory space consistency

03. Eliminates intermediate cache usage

Abstract

Fast Fourier Transforms (FFT) are widely used to reduce memory and computational costs in deep learning. However, existing implementations, including standard FFT and real FFT (rFFT), cannot achieve true in-place computation. In particular, rFFT maps an input of size n to a complex output of size n/2+1, causing dimensional mismatch and requiring additional memory allocation. We propose the first real-domain, fully in-place FFT framework (rdFFT) that preserves input-output memory space consistency. By leveraging butterfly operation symmetry and conjugate properties in the frequency domain, we design an implicit complex encoding scheme that eliminates intermediate cache usage entirely. Experiments on multiple natural language understanding tasks demonstrate the method's effectiveness in reducing training memory cost, offering a promising direction for frequency-domain lightweight adaptation.
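
The dimensional mismatch described in the abstract can be made concrete with a small NumPy sketch. For an even-length real input of size n, rFFT returns n/2+1 complex bins, which occupy more memory than the input. Because the DC and Nyquist bins are purely real, those bins carry only n independent real values, so they fit in a real buffer the size of the input. This is not the authors' rdFFT: the sketch still allocates the complex rFFT output and only repacks it afterwards. The function names pack_halfcomplex and unpack_halfcomplex are hypothetical, and the layout is an assumption modeled on the halfcomplex format used by FFTW, not the encoding scheme from the paper.

```python
import numpy as np


def pack_halfcomplex(spec: np.ndarray, n: int) -> np.ndarray:
    """Pack the n/2+1 complex rFFT bins of an even-length real signal into n reals.

    Assumed layout (illustrative only):
    [Re X_0, ..., Re X_{n/2}, Im X_{n/2-1}, ..., Im X_1]
    """
    if n % 2 != 0:
        raise ValueError("this sketch only handles even n")
    half = n // 2
    packed = np.empty(n, dtype=spec.real.dtype)
    packed[: half + 1] = spec.real                  # real parts of bins 0..n/2
    packed[half + 1 :] = spec.imag[1:half][::-1]    # imaginary parts of bins n/2-1..1
    return packed


def unpack_halfcomplex(packed: np.ndarray, n: int) -> np.ndarray:
    """Rebuild the n/2+1 complex bins from the packed real buffer."""
    half = n // 2
    imag = np.zeros(half + 1, dtype=packed.dtype)   # Im X_0 and Im X_{n/2} are zero
    imag[1:half] = packed[half + 1 :][::-1]
    return packed[: half + 1] + 1j * imag


n = 8
x = np.random.default_rng(0).standard_normal(n)

spec = np.fft.rfft(x)                # n/2+1 complex bins, larger than the input
packed = pack_halfcomplex(spec, n)   # same shape and dtype as the input

# Round trip: the packed buffer holds enough information to recover x.
x_rec = np.fft.irfft(unpack_halfcomplex(packed, n), n=n)
```

With float64 input and n = 8, the input occupies 64 bytes while the rFFT output occupies 80 bytes; the packed buffer is back to 64 bytes. A truly in-place method, as the abstract describes, would have to produce such a real-valued encoding without ever materializing the larger complex array.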

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Natural Language Processing Techniques · Speech Recognition and Synthesis