Tangram: Accelerating Serverless LLM Loading through GPU Memory Reuse and Affinity