COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training