ELAS: Efficient Pre-Training of Low-Rank Large Language Models via 2:4 Activation Sparsity