Interpolating Video-LLMs: Toward Longer-sequence LMMs in a Training-free Manner

Yuzhang Shang; Bingxin Xu; Weitai Kang; Mu Cai; Yuheng Li; Zehao Wen; Zhen Dong; Kurt Keutzer; Yong Jae Lee; Yan Yan

arXiv:2409.12963·cs.CV·October 3, 2024

TL;DR

This paper proposes a training-free interpolation method for Video-LLMs, enabling longer video sequence processing by rearranging video tokens and extending the LLM context window without additional training.

Contribution

It introduces a novel training-free interpolation technique that overcomes fixed encoder and limited context length constraints in Video-LLMs.

Findings

01 Enables processing of longer videos without retraining

02 Rearranges video tokens to bypass the limitations of the fixed video encoder

03 Extends the LLM context window without additional training
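The two ideas in the findings above can be illustrated with a small sketch. This summary does not describe the paper's actual mechanism, so everything here is an assumption made for illustration: the strided clip split, the temporal merge, and the use of linear position interpolation (a generic context-extension technique, not confirmed to be the authors' choice). All function names are hypothetical.

```python
from typing import List, Sequence


def split_into_strided_clips(num_frames: int, frames_per_clip: int) -> List[List[int]]:
    """Split frame indices into clips that each fit a fixed-size encoder.

    Assumption: strided sampling, so every clip spans the whole video.
    """
    if frames_per_clip <= 0 or num_frames % frames_per_clip != 0:
        raise ValueError("num_frames must be a positive multiple of frames_per_clip")
    num_clips = num_frames // frames_per_clip
    return [list(range(c, num_frames, num_clips)) for c in range(num_clips)]


def merge_in_temporal_order(
    clips: Sequence[Sequence[int]], clip_tokens: Sequence[Sequence[str]]
) -> List[str]:
    """Rearrange per-clip tokens so the final sequence follows frame order.

    clip_tokens[c][i] stands in for the token(s) produced for frame clips[c][i].
    """
    pairs = [
        (frame, token)
        for clip, tokens in zip(clips, clip_tokens)
        for frame, token in zip(clip, tokens)
    ]
    pairs.sort(key=lambda pair: pair[0])
    return [token for _, token in pairs]


def interpolated_positions(seq_len: int, trained_len: int) -> List[float]:
    """Linear position interpolation (generic technique, assumed here).

    Scales position indices so a longer sequence stays inside the
    position range the LLM saw during training.
    """
    scale = min(1.0, trained_len / seq_len)
    return [i * scale for i in range(seq_len)]


if __name__ == "__main__":
    clips = split_into_strided_clips(num_frames=8, frames_per_clip=4)
    # Placeholder "tokens": one string per frame instead of real encoder output.
    tokens = [[f"f{i}" for i in clip] for clip in clips]
    print(clips)
    print(merge_in_temporal_order(clips, tokens))
    print(interpolated_positions(seq_len=8, trained_len=4))
```

In this sketch the encoder is never asked for more frames than it was built for. The longer sequence is assembled after encoding, and the position indices are compressed so they stay within the trained range.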

Abstract

Advancements in Large Language Models (LLMs) inspire various strategies for integrating video modalities. A key approach is Video-LLMs, which incorporate an optimizable interface linking sophisticated video encoders to LLMs. However, due to computation and data limitations, these Video-LLMs are typically pre-trained to process only short videos, limiting their broader application for understanding longer video content. Additionally, fine-tuning Video-LLMs to handle longer videos is cost-prohibitive. Consequently, it becomes essential to explore the interpolation of Video-LLMs under a completely training-free setting. In this paper, we first identify the primary challenges in interpolating Video-LLMs: (1) the video encoder and modality alignment projector are fixed, preventing the integration of additional frames into Video-LLMs, and (2) the LLM backbone is limited in its content length…
