Direct Visual Grounding by Directing Attention of Visual Tokens