Memory-Efficient Training with In-Place FFT Implementation
Xinyu Ding, Bangtian Liu, Siyu Liao, Zhongfeng Wang

TL;DR
This paper introduces a novel in-place real FFT framework that reduces memory usage in deep learning training by leveraging symmetry properties, enabling more efficient frequency-domain computations.
Contribution
The paper presents the first fully in-place real FFT framework (rdFFT) that maintains input-output memory consistency and eliminates intermediate cache usage.
Findings
Reduces training memory cost on multiple natural language understanding tasks
Maintains input-output memory space consistency
Eliminates intermediate cache usage
Abstract
Fast Fourier Transforms (FFT) are widely used to reduce memory and computational costs in deep learning. However, existing implementations, including standard FFT and real FFT (rFFT), cannot achieve true in-place computation. In particular, rFFT maps a real input of size n to a complex output of size n/2+1, causing a dimensional mismatch that requires additional memory allocation. We propose the first real-domain, fully in-place FFT framework (rdFFT) that preserves input-output memory space consistency. By leveraging butterfly operation symmetry and conjugate properties in the frequency domain, we design an implicit complex encoding scheme that eliminates intermediate cache usage entirely. Experiments on multiple natural language understanding tasks demonstrate the method's effectiveness in reducing training memory cost, offering a promising direction for frequency-domain lightweight adaptation.
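The storage argument in the abstract can be illustrated with a small NumPy sketch. It is not the authors' rdFFT and it is not in-place: it calls `np.fft.rfft`, which allocates the usual n/2+1 complex output, and then repacks that output into n real numbers to show that conjugate symmetry makes a same-size real layout possible. The helper names `pack_half_spectrum` and `unpack_half_spectrum`, and the particular slot ordering, are assumptions made for this illustration; the paper's implicit complex encoding may differ.

```python
import numpy as np

def pack_half_spectrum(x):
    """Store the spectrum of a real, even-length signal in n real numbers.

    Hypothetical layout (an assumption, not the paper's encoding):
    [Re X0, Re X1..Re X(n/2-1), Re X(n/2), Im X1..Im X(n/2-1)]
    """
    n = x.shape[0]
    assert n % 2 == 0, "this sketch assumes an even length"
    X = np.fft.rfft(x)                      # n//2 + 1 complex values
    packed = np.empty(n, dtype=np.float64)
    packed[0] = X[0].real                   # DC bin is purely real
    packed[1:n // 2] = X[1:n // 2].real     # real parts of interior bins
    packed[n // 2] = X[n // 2].real         # Nyquist bin is purely real
    packed[n // 2 + 1:] = X[1:n // 2].imag  # imaginary parts of interior bins
    return packed

def unpack_half_spectrum(packed):
    """Rebuild the n//2 + 1 complex rFFT bins from the packed real layout."""
    n = packed.shape[0]
    X = np.empty(n // 2 + 1, dtype=np.complex128)
    X[0] = packed[0]
    X[n // 2] = packed[n // 2]
    X[1:n // 2] = packed[1:n // 2] + 1j * packed[n // 2 + 1:]
    return X

x = np.random.default_rng(0).standard_normal(8)
packed = pack_half_spectrum(x)              # same shape as x
x_back = np.fft.irfft(unpack_half_spectrum(packed), n=x.shape[0])
```

Here `np.fft.rfft(x)` holds 5 complex values (10 real numbers) for an input of 8 reals, while `packed` holds exactly 8. The two extra numbers are the imaginary parts of the DC and Nyquist bins, which are always zero for real input and so need not be stored.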
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Natural Language Processing Techniques · Speech Recognition and Synthesis
