SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion