ClawMachine: Learning to Fetch Visual Tokens for Referential Comprehension