Rethinking Cross-modal Interaction from a Top-down Perspective for Referring Video Object Segmentation