Backward-Friendly Optimization: Training Large Language Models with Approximate Gradients under Memory Constraints