On Exact Bit-level Reversible Transformers Without Changing Architectures
Guoqiang Zhang, J.P. Lewis, W. B. Kleijn

TL;DR
This paper introduces BDIA-transformer, an exact bit-level reversible transformer that keeps the standard architecture for inference, reduces training memory, and improves validation accuracy by combining bidirectional integration approximation (BDIA) with activation quantization.
Contribution
The paper presents the first exact bit-level reversible transformer that uses a standard architecture without modification, combining an ODE-based interpretation of transformer blocks with stochastic ensemble training.
Findings
Outperforms conventional transformers in image classification and language translation.
Requires less training memory while achieving higher validation accuracy.
Achieves exact bit-level reversibility while leaving the standard architecture unchanged for inference.
Abstract
Various reversible deep neural network (DNN) models have been proposed to reduce memory consumption in the training process. However, almost all existing reversible DNNs either require special non-standard architectures or are constructed by modifying existing DNN architectures considerably to enable reversibility. In this work we present the BDIA-transformer, which is an exact bit-level reversible transformer that uses an unchanged standard architecture for inference. The basic idea is to first treat each transformer block as the Euler integration approximation for solving an ordinary differential equation (ODE) and then incorporate the technique of bidirectional integration approximation (BDIA) into the neural architecture, together with activation quantization to make it exactly bit-level reversible. In the training process, we let a hyper-parameter in BDIA-transformer…
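To make the role of activation quantization concrete, here is a minimal Python sketch of why a fixed-point grid allows exact inversion. It is not the paper's BDIA update rule, which the truncated abstract does not spell out. Instead it uses a simpler two-state reversible update, x_{k+1} = x_{k-1} + h_k(x_k), as a stand-in. The names `FRAC_BITS`, `quantize`, `residual`, `forward`, and `backward` are hypothetical, and `residual` is a toy scalar function standing in for a transformer block's residual branch.

```python
import math

FRAC_BITS = 9  # assumed fixed-point precision: activations are multiples of 2**-9


def quantize(x: float) -> int:
    """Represent a real value as an integer count of 2**-FRAC_BITS steps."""
    return round(x * (1 << FRAC_BITS))


def residual(x_int: int, k: int) -> int:
    """Toy stand-in for the residual branch h_k of block k, with quantized output."""
    x = x_int / (1 << FRAC_BITS)
    return quantize(math.tanh(x + 0.1 * k))


def forward(x0: int, x1: int, num_blocks: int) -> tuple[int, int]:
    """Apply x_{k+1} = x_{k-1} + h_k(x_k), keeping only the last two states."""
    prev, cur = x0, x1
    for k in range(1, num_blocks + 1):
        prev, cur = cur, prev + residual(cur, k)
    return prev, cur


def backward(prev: int, cur: int, num_blocks: int) -> tuple[int, int]:
    """Invert the update: x_{k-1} = x_{k+1} - h_k(x_k), block by block."""
    for k in range(num_blocks, 0, -1):
        prev, cur = cur - residual(prev, k), prev
    return prev, cur


x0, x1 = quantize(0.3), quantize(-1.7)
last_two = forward(x0, x1, num_blocks=12)
recovered = backward(*last_two, num_blocks=12)
print(recovered == (x0, x1))
```

Because every state is an integer multiple of the grid step, the addition in `forward` and the subtraction in `backward` are exact, so the earlier activations are recovered bit for bit from the last two states rather than being stored. With floating-point states the same subtraction can incur rounding error, which is the gap that quantization closes.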
Taxonomy
Topics: Neural Networks and Reservoir Computing · Quantum Computing Algorithms and Architecture · Quantum and electron transport phenomena
Methods: Sparse Evolutionary Training · Diffusion
