FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Vision Language Models