S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs