Loading paper
CAST: Modeling Visual State Transitions for Consistent Video Retrieval | Tomesphere