Video based Contextual Question Answering
Akash Ganesan, Divyansh Pal, Karthik Muthuraman, Shubham Dash

TL;DR
This paper introduces a novel video-based contextual question-answering model that uses a graphical representation to handle diverse queries across entire videos, including spatial and temporal relationships.
Contribution
It extends image-based question-answering techniques to videos by proposing a graphical model capable of understanding complex spatial and temporal queries.
Findings
Developed a graphical representation for videos.
Able to answer spatial and temporal queries.
Generalizes image QA to video content.
Abstract
The primary aim of this project is to build a contextual question-answering model for videos. Current methods provide robust models for image-based question answering; we aim to generalize this approach to videos. We propose a graphical representation of a video that can handle several types of queries across the whole video. For example, if a frame shows a man and a cat sitting, the model should be able to handle queries such as "Where is the cat sitting with respect to the man?" or "What is the man holding in his hand?" It should also be able to answer queries about temporal relationships.
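The abstract does not specify how the graphical representation is built, so the following is only a minimal sketch of one way such a structure could look, not the authors' model. It assumes each frame is stored as a set of (subject, relation, object) triples; the class `VideoSceneGraph`, its method names, and the relation labels (`left_of`, `holding`) are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


class VideoSceneGraph:
    """Toy per-frame relation graph (hypothetical; not the paper's model)."""

    def __init__(self):
        # Maps a frame index to a list of (subject, relation, object) triples.
        self.frames = defaultdict(list)

    def add_relation(self, frame, subject, relation, obj):
        """Record that `subject` stands in `relation` to `obj` in `frame`."""
        self.frames[frame].append((subject, relation, obj))

    def query_relation(self, frame, subject, obj):
        """Spatial query: how is `subject` related to `obj` in `frame`?"""
        return [r for s, r, o in self.frames.get(frame, [])
                if s == subject and o == obj]

    def query_object(self, frame, subject, relation):
        """Object query: what does `subject` stand in `relation` to?"""
        return [o for s, r, o in self.frames.get(frame, [])
                if s == subject and r == relation]

    def first_frame_with(self, subject, relation, obj):
        """Temporal query: earliest frame containing the triple, or None."""
        for frame in sorted(self.frames):
            if (subject, relation, obj) in self.frames[frame]:
                return frame
        return None


# Assumed toy annotations for the man-and-cat example in the abstract.
graph = VideoSceneGraph()
graph.add_relation(0, "cat", "left_of", "man")
graph.add_relation(0, "man", "holding", "cup")
graph.add_relation(5, "man", "holding", "phone")

where_is_cat = graph.query_relation(0, "cat", "man")
man_holds = graph.query_object(0, "man", "holding")
phone_frame = graph.first_frame_with("man", "holding", "phone")
```

In this sketch, "Where is the cat sitting with respect to the man?" maps to `query_relation`, "What is the man holding in his hand?" maps to `query_object`, and a temporal question such as "When does the man first hold the phone?" maps to `first_frame_with`. Turning natural-language questions into these calls, and extracting the triples from raw frames, are left out entirely.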
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Video Analysis and Summarization
