ST$^3$: Accelerating Multimodal Large Language Model by Spatial-Temporal Visual Token Trimming