Loading paper
Multi-Layer Softmaxing during Training Neural Machine Translation for Flexible Decoding with Fewer Layers | Tomesphere