VFA: Relieving Vector Operations in Flash Attention with Global Maximum Pre-computation