Beyond First-Order: Training LLMs with Stochastic Conjugate Subgradients and AdamW