Hierarchical Kernel Transformer: Multi-Scale Attention with an Information-Theoretic Approximation Analysis | Tomesphere