Greedy Output Approximation: Towards Efficient Structured Pruning for LLMs Without Retraining