Optimizing CUDA Code By Kernel Fusion---Application on BLAS