Acceleration Multiple Heads Decoding for LLM via Dynamic Tree Attention