v1: Learning to Point Visual Tokens for Multimodal Grounded Reasoning