LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video Question Answering
Jingjing Jiang, Ziyi Liu, and Nanning Zheng

TL;DR
LiVLR is a lightweight, flexible visual-linguistic reasoning framework for VideoQA that integrates multi-modal content at different semantic levels and outperforms existing methods on the MSRVTT-QA and KnowIT VQA benchmarks.
Contribution
The paper introduces LiVLR, a novel lightweight VideoQA framework with a diversity-aware reasoning module for flexible multi-modal content integration.
Findings
Outperforms existing methods on the MSRVTT-QA and KnowIT VQA datasets.
Graph-based Visual and Linguistic Encoders produce effective multi-grained visual and linguistic representations.
Key components validated through extensive ablation studies.
Abstract
Video Question Answering (VideoQA), aiming to correctly answer the given question based on understanding multi-modal video content, is challenging due to the rich video content. From the perspective of video understanding, a good VideoQA framework needs to understand the video content at different semantic levels and flexibly integrate the diverse video content to distill question-related content. To this end, we propose a Lightweight Visual-Linguistic Reasoning framework named LiVLR. Specifically, LiVLR first utilizes the graph-based Visual and Linguistic Encoders to obtain multi-grained visual and linguistic representations. Subsequently, the obtained representations are integrated with the devised Diversity-aware Visual-Linguistic Reasoning module (DaVL). The DaVL considers the difference between the different types of representations and can flexibly adjust the importance of…
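The idea of adjusting the importance of different representation types can be illustrated with a small sketch. This is not the paper's DaVL formulation, which the truncated abstract does not specify; it is a generic question-conditioned softmax weighting, assumed here purely for illustration. The function name `fuse_representations`, the type names `visual_object` and `linguistic_subtitle`, and the toy vectors are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fuse_representations(question, representations):
    """Weight each representation type by its similarity to the question.

    ASSUMPTION: dot-product scoring followed by softmax is a generic
    attention-style choice, not the mechanism described in the paper.

    question: list of floats (hypothetical question embedding)
    representations: dict mapping a type name to a vector of the same length
    Returns (weights_by_name, fused_vector).
    """
    names = list(representations)
    scores = [dot(question, representations[n]) for n in names]
    weights = softmax(scores)
    fused = [
        sum(w * representations[n][i] for w, n in zip(weights, names))
        for i in range(len(question))
    ]
    return dict(zip(names, weights)), fused


# Toy inputs (hypothetical): the question aligns with the first type.
question = [1.0, 0.0]
reps = {
    "visual_object": [2.0, 0.0],
    "linguistic_subtitle": [0.0, 2.0],
}
weights, fused = fuse_representations(question, reps)
```

In this toy case the representation that aligns with the question receives the larger weight, so the fused vector leans toward it. A learned model would replace the fixed dot-product scoring with trainable parameters.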
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition
