Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs