DynImg: Key Frames with Visual Prompts are Good Representation for Multi-Modal Video Understanding