Speeding Up MACE: Low-Precision Tricks for Equivariant Force Fields
Alexandre Benoit

TL;DR
This paper profiles the computational bottlenecks of equivariant force fields such as MACE and evaluates mixed-precision policies. It shows that low-precision arithmetic and GPU-optimized kernels can significantly accelerate these models without compromising accuracy.
Contribution
It systematically evaluates low-precision execution policies and GPU kernels for MACE, providing practical guidelines for faster, cost-effective molecular dynamics simulations.
Findings
cuEquivariance reduces inference latency by about 3x.
Casting only the linear layers to BF16/FP16 within an FP32 model yields a roughly 4x additional speedup.
Energy and thermodynamic observables remain stable with mixed-precision inference.
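The policy in the Findings (reduced precision only in the linear layers, everything else in FP32) can be illustrated without a GPU. The sketch below is a NumPy emulation, not MACE, e3nn, or cuEquivariance code, and it measures accuracy only, not speed. Assumptions: a hypothetical two-layer MLP with random weights stands in for a MACE block; BF16 is emulated by rounding FP32 bit patterns, because NumPy has no bfloat16 dtype; and "low-precision operands with FP32 accumulation" is approximated by rounding the operands and then multiplying float32 arrays.

```python
import numpy as np


def to_fp64(a):
    return np.asarray(a, dtype=np.float64)


def to_fp32(a):
    return np.asarray(a, dtype=np.float32)


def to_fp16(a):
    # Round to IEEE half precision (10 mantissa bits), then widen back to FP32.
    return to_fp32(a).astype(np.float16).astype(np.float32)


def to_bf16(a):
    # Emulated bfloat16: keep the top 16 bits of the FP32 pattern
    # (8 exponent bits, 7 mantissa bits) with round-to-nearest-even.
    # Assumes finite inputs; NaN/Inf handling is omitted for brevity.
    bits = to_fp32(a).view(np.uint32)
    lsb = (bits >> 16) & 1
    rounded = (bits + np.uint32(0x7FFF) + lsb) & np.uint32(0xFFFF0000)
    return rounded.view(np.float32)


def linear(x, w, b, cast):
    # Operands are rounded by `cast`; for the low-precision casts the matmul
    # runs on float32 arrays, i.e. products are accumulated in FP32.
    y = cast(x) @ cast(w).T
    return y + b.astype(y.dtype)  # bias add stays at accumulation precision


def mlp(x, p, cast):
    # Hypothetical stand-in for a model block: only the linear layers are cast.
    h = linear(x, p["w1"], p["b1"], cast)
    h = h / (1.0 + np.exp(-h))  # SiLU, left at the precision of h (FP32 here)
    return linear(h, p["w2"], p["b2"], cast)


rng = np.random.default_rng(0)
x = rng.standard_normal((512, 32)).astype(np.float32)  # stand-in features
p = {  # an "FP32 model": master weights are stored in float32
    "w1": (rng.standard_normal((64, 32)) / np.sqrt(32)).astype(np.float32),
    "b1": (0.1 * rng.standard_normal(64)).astype(np.float32),
    "w2": (rng.standard_normal((16, 64)) / np.sqrt(64)).astype(np.float32),
    "b2": (0.1 * rng.standard_normal(16)).astype(np.float32),
}

ref = mlp(x, p, to_fp64)  # FP64 reference forward pass
errors = {}
for name, cast in [("fp32", to_fp32), ("bf16", to_bf16), ("fp16", to_fp16)]:
    out = mlp(x, p, cast)
    errors[name] = float(np.linalg.norm(out - ref) / np.linalg.norm(ref))
    print(f"{name}: relative error vs FP64 = {errors[name]:.2e}")
```

BF16 keeps FP32's exponent range but has three fewer mantissa bits than FP16, so in this emulation it should show the larger rounding error, while FP16 trades range for precision. Whether such errors matter for energies and thermodynamic observables is exactly what the paper tests in full MD runs; this toy only shows where the rounding enters.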
Abstract
Machine-learning force fields can deliver accurate molecular dynamics (MD), but at high computational cost. For SO(3)-equivariant models such as MACE, there is little systematic evidence about whether reduced-precision arithmetic and GPU-optimized kernels can cut this cost without harming physical fidelity. This thesis aims to make MACE cheaper and faster while preserving accuracy by identifying computational bottlenecks and evaluating low-precision execution policies. We profile MACE end-to-end and per block, compare the e3nn and NVIDIA cuEquivariance backends, and assess FP64/FP32/BF16/FP16 settings (with FP32 accumulation) for inference, short NVT and long NPT water simulations, and toy training runs under reproducible, steady-state timing. cuEquivariance reduces inference latency by about 3x. Casting only linear layers to BF16/FP16 within an FP32 model yields roughly 4x additional…
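The abstract stresses reproducible, steady-state timing. One common way to obtain it is to discard warm-up calls, which absorb one-off costs such as kernel compilation, autotuning, and cache population. The device is then synchronized before the clock is read, and the median of many repeats is reported. The harness below is a generic sketch of that idea, not the author's benchmarking code; the function name `steady_state_latency` and its parameters are hypothetical. With PyTorch on a GPU, `torch.cuda.synchronize` could be passed as the `sync` hook, because CUDA kernel launches return before the work finishes.

```python
import statistics
import time
from typing import Callable, Dict, Optional


def steady_state_latency(
    fn: Callable[[], object],
    warmup: int = 10,
    repeats: int = 50,
    sync: Optional[Callable[[], None]] = None,
) -> Dict[str, float]:
    """Return latency statistics (in seconds) for fn() after warm-up.

    Warm-up calls run but are not timed, so one-off costs do not skew the
    result. If `sync` is given, it is called after the warm-up and after every
    timed call, so asynchronous device work finishes before the clock is read.
    """
    if warmup < 0 or repeats < 2:
        raise ValueError("need warmup >= 0 and repeats >= 2")
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        samples.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(samples),  # robust to occasional outliers
        "min_s": min(samples),
        "stdev_s": statistics.stdev(samples),
    }


if __name__ == "__main__":
    import numpy as np

    # Illustrative CPU workload only; absolute numbers depend on the machine.
    a64 = np.random.default_rng(0).standard_normal((512, 512))
    a32 = a64.astype(np.float32)
    base = steady_state_latency(lambda: a64 @ a64)
    fast = steady_state_latency(lambda: a32 @ a32)
    print(f"FP64 median: {base['median_s'] * 1e3:.3f} ms")
    print(f"FP32 median: {fast['median_s'] * 1e3:.3f} ms")
    print(f"speedup: {base['median_s'] / fast['median_s']:.2f}x")
```

A speedup figure is then the ratio of two medians measured the same way. Reporting the median rather than the mean keeps a single slow iteration (for example, a background process or a late recompilation) from dominating the comparison.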
