Foundational Question Generation for Video Question Answering via an Embedding-Integrated Approach

Ju-Young Oh

arXiv:2511.17618 · cs.CV · November 25, 2025


Open Access

TL;DR

This paper introduces FIQ, a novel framework that generates foundational Q&A pairs from videos to improve reasoning and generalization in video question answering models, achieving state-of-the-art results.

Contribution

The paper presents a new embedding-integrated approach for generating scene-level Q&A pairs and a VQ-CAlign module for better alignment of question embeddings with visual features.

Findings

01

FIQ outperforms baseline models on the SUTD-TrafficQA dataset.

02

Generated Q&A pairs enrich scene understanding and reasoning.

03

VQ-CAlign improves task-specific embedding alignment.

Abstract

Conventional VQA approaches primarily rely on question-answer (Q&A) pairs to learn the spatio-temporal dynamics of video content. However, most existing annotations are event-centric, which restricts the model's ability to capture the comprehensive context of a scene. The lack of fundamental information such as object categories, spatial configurations, and descriptive visual attributes prevents the model from forming a complete understanding of the environment, ultimately limiting its generalization and reasoning capability. In this paper, we introduce Foundational Question Generation for Video Question Answering via an Embedding-Integrated Approach (FIQ), a framework designed to enhance the reasoning capability of VQA models by improving their foundational comprehension of video content. FIQ generates Q&A pairs from descriptive information extracted directly from videos, thereby…


Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning