VGent: Visual Grounding via Modular Design for Disentangling Reasoning and Prediction