Treat Visual Tokens as Text? But Your MLLM Only Needs Fewer Efforts to See