Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA
Jongwoo Park, Kanchana Ranasinghe, Kumara Kahatapitiya, Wonjeong Ryu, Donghyun Kim, Michael S. Ryoo

TL;DR
LVNet introduces a hierarchical keyframe selection method that efficiently identifies the most relevant frames for long-form video question answering, significantly reducing redundancy and improving performance without additional training.
Contribution
The paper presents LVNet, a modular, training-free framework with a novel Hierarchical Keyframe Selector for efficient, question-specific frame selection in long-form video QA.
Findings
Achieves state-of-the-art results on four LVQA datasets.
Reduces the number of frames needed for accurate QA.
Improves efficiency without additional training.
Abstract
Long-form videos that span wide temporal intervals are highly information-redundant and contain multiple distinct events or entities that are often loosely related. Therefore, when performing long-form video question answering (LVQA), all information necessary to generate a correct response can often be contained within a small subset of frames. Recent literature leverages large language models (LLMs) in LVQA benchmarks, achieving exceptional performance, while relying on vision language models (VLMs) to convert all visual content within videos into natural language. Such VLMs often independently caption a large number of frames uniformly sampled from long videos, which is inefficient and largely redundant. Motivated by this inefficiency, we propose LVNet, a modular and training-free framework featuring a novel Hierarchical Keyframe Selector (HKS) that efficiently selects…
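To make the contrast in the abstract concrete, here is a minimal Python sketch of the general idea: sample many candidate frames uniformly, score each one against the question, and keep only a small top-scoring subset for captioning. This is not LVNet's Hierarchical Keyframe Selector, whose stages are not described in this excerpt. The function names `uniform_sample` and `select_keyframes`, the `relevance` callback, and the toy scoring function are all hypothetical and exist only for illustration.

```python
from typing import Callable, List


def uniform_sample(num_frames: int, k: int) -> List[int]:
    """Return up to k frame indices spaced evenly over [0, num_frames)."""
    if num_frames <= 0 or k <= 0:
        return []
    k = min(k, num_frames)
    step = num_frames / k
    # Take the midpoint of each of the k equal-width segments.
    return [int(step * i + step / 2) for i in range(k)]


def select_keyframes(
    num_frames: int,
    relevance: Callable[[int], float],
    num_candidates: int,
    num_keep: int,
) -> List[int]:
    """Two-stage selection: uniform candidates, then the top-scoring subset.

    `relevance` is a hypothetical stand-in for any question-conditioned
    frame score; the source excerpt does not specify how LVNet scores frames.
    """
    candidates = uniform_sample(num_frames, num_candidates)
    # Rank candidates from most to least relevant to the question.
    ranked = sorted(candidates, key=relevance, reverse=True)
    # Restore temporal order so downstream captions read chronologically.
    return sorted(ranked[:num_keep])


if __name__ == "__main__":
    # Toy relevance (assumption): frames near index 700 matter most.
    def toy_score(idx: int) -> float:
        return -abs(idx - 700)

    print(select_keyframes(1000, toy_score, num_candidates=100, num_keep=4))
```

In this toy setup only 4 of the 100 candidate frames would be passed on to an expensive captioning model, which is the kind of saving the abstract argues for. A real selector would replace `toy_score` with a learned or model-based relevance signal.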
Taxonomy
Topics: Video Coding and Compression Technologies · Image and Video Quality Assessment · Advanced Data Compression Techniques
