Learning long range dependencies through time reversal symmetry breaking
Guillaume Pourcel, Maxence Ernoult

TL;DR
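This paper introduces Recurrent Hamiltonian Echo Learning (RHEL), a physics-inspired algorithm for training Hamiltonian-based state space models on sequence data with long-range dependencies. It matches the performance of backpropagation through time (BPTT) while requiring only three forward passes, irrespective of model size.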
Contribution
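The paper presents RHEL, a gradient computation method that obtains loss gradients as finite differences of physical trajectories of non-dissipative Hamiltonian systems. It needs no explicit Jacobian computation and incurs no variance in the gradient estimate, which makes it a scalable, physically grounded approach to long-range sequence modeling.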
Findings
RHEL requires only three forward passes regardless of model size.
RHEL matches BPTT performance on various tasks.
RHEL scales to hierarchies of Hamiltonian recurrent units.
Abstract
Deep State Space Models (SSMs) reignite physics-grounded compute paradigms, as RNNs could natively be embodied into dynamical systems. This calls for dedicated learning algorithms obeying core physical principles, with efficient techniques to simulate these systems and guide their design. We propose Recurrent Hamiltonian Echo Learning (RHEL), an algorithm which provably computes loss gradients as finite differences of physical trajectories of non-dissipative, Hamiltonian systems. In ML terms, RHEL only requires three "forward passes" irrespective of model size, without explicit Jacobian computation, nor incurring any variance in the gradient estimation. Motivated by the physical realization of our algorithm, we first introduce RHEL in continuous time and demonstrate its formal equivalence with the continuous adjoint state method. To facilitate the simulation of Hamiltonian systems…
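The abstract relies on the time-reversal symmetry of non-dissipative Hamiltonian systems: if the momentum is flipped at the end of a trajectory, the system retraces its path back to the starting state. The sketch below illustrates only this "echo" property on a toy pendulum with a leapfrog integrator. It is not RHEL itself and computes no gradients; the pendulum Hamiltonian, the function names (`grad_potential`, `leapfrog`, `echo`), and the step sizes are all assumptions chosen for illustration, not taken from the paper.

```python
import math

def grad_potential(q):
    # dV/dq for the assumed pendulum potential V(q) = 1 - cos(q)
    return math.sin(q)

def leapfrog(q, p, dt, n_steps):
    # Symplectic, time-reversible integrator for H(q, p) = p^2 / 2 + V(q)
    for _ in range(n_steps):
        p -= 0.5 * dt * grad_potential(q)  # half kick
        q += dt * p                        # full drift
        p -= 0.5 * dt * grad_potential(q)  # half kick
    return q, p

def echo(q0, p0, dt=0.01, n_steps=1000):
    # Forward trajectory
    q1, p1 = leapfrog(q0, p0, dt, n_steps)
    # Flip the momentum and run the same dynamics again:
    # the system retraces its trajectory back to the start
    q2, p2 = leapfrog(q1, -p1, dt, n_steps)
    # Flip the momentum once more to compare with the initial state
    return q2, -p2

q0, p0 = 1.0, 0.5
q_back, p_back = echo(q0, p0)
print(abs(q_back - q0), abs(p_back - p0))  # both close to round-off level
```

Both runs use the same forward dynamics, so nothing is integrated "backward"; only the sign of the momentum changes. With friction added to the dynamics, the echo would no longer return to the initial state, which is why the method is stated for non-dissipative systems.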
Taxonomy
Topics: Quantum many-body systems · Model Reduction and Neural Networks · Neural Networks and Reservoir Computing
Methods: Self-Learning
