Video based Contextual Question Answering

Akash Ganesan; Divyansh Pal; Karthik Muthuraman; Shubham Dash

arXiv:1804.07399·cs.CL·April 23, 2018·1 cites

Open Access

TL;DR

This paper introduces a novel video-based contextual question-answering model that uses a graphical representation to handle diverse queries across entire videos, including spatial and temporal relationships.

Contribution

It extends image-based question-answering techniques to videos by proposing a graphical model capable of understanding complex spatial and temporal queries.

Findings

01

Developed a graphical representation for videos.

02

Able to answer spatial and temporal queries.

03

Generalizes image QA to video content.

Abstract

The primary aim of this project is to build a contextual Question-Answering model for videos. Current methodologies provide a robust model for image-based Question-Answering, but we aim to generalize this approach to videos. We propose a graphical representation of video that can handle several types of queries across the whole video. For example, if a frame shows a man and a cat sitting, it should be able to handle queries like "Where is the cat sitting with respect to the man?" or "What is the man holding in his hand?". It should also be able to answer queries relating to temporal relationships.
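
As a rough illustration of what a graphical representation of video could look like, the sketch below stores each observed relationship as a labeled edge tagged with a frame index, then answers one spatial and one temporal query against it. This is a minimal sketch under stated assumptions, not the authors' model: the abstract does not specify the graph structure, so the `Relation` and `VideoGraph` classes, their method names, the predicate labels, and the sample frames are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Relation:
    """One labeled edge (subject --predicate--> obj) observed in one frame."""
    frame: int
    subject: str
    predicate: str
    obj: str


@dataclass
class VideoGraph:
    """Hypothetical container: a flat list of per-frame relation edges."""
    relations: List[Relation] = field(default_factory=list)

    def add(self, frame: int, subject: str, predicate: str, obj: str) -> None:
        self.relations.append(Relation(frame, subject, predicate, obj))

    def spatial(self, frame: int, subject: str, obj: str) -> List[str]:
        """Predicates linking subject to obj within a single frame."""
        return [r.predicate for r in self.relations
                if r.frame == frame and r.subject == subject and r.obj == obj]

    def objects_of(self, frame: int, subject: str, predicate: str) -> List[str]:
        """Objects that subject is related to by predicate in a frame."""
        return [r.obj for r in self.relations
                if r.frame == frame and r.subject == subject
                and r.predicate == predicate]

    def first_frame(self, subject: str, predicate: str, obj: str) -> Optional[int]:
        """Earliest frame in which the relation is observed, or None."""
        frames = [r.frame for r in self.relations
                  if (r.subject, r.predicate, r.obj) == (subject, predicate, obj)]
        return min(frames) if frames else None

    def happens_before(self, a: Tuple[str, str, str], b: Tuple[str, str, str]) -> bool:
        """True if relation a is first observed strictly before relation b."""
        fa, fb = self.first_frame(*a), self.first_frame(*b)
        return fa is not None and fb is not None and fa < fb


# Invented sample data, loosely following the man-and-cat example.
graph = VideoGraph()
graph.add(0, "man", "holding", "cup")
graph.add(0, "cat", "left_of", "man")
graph.add(5, "cat", "sitting_on", "sofa")
graph.add(5, "man", "holding", "cup")

# Spatial: "Where is the cat with respect to the man?" (frame 0)
where_is_cat = graph.spatial(0, "cat", "man")
# Spatial: "What is the man holding in his hand?" (frame 0)
man_holds = graph.objects_of(0, "man", "holding")
# Temporal: "Was the man holding the cup before the cat sat on the sofa?"
held_before_sat = graph.happens_before(
    ("man", "holding", "cup"), ("cat", "sitting_on", "sofa"))
```

Keeping the frame index on every edge is one simple way to let the same structure serve both query types: spatial queries filter on a single frame, while temporal queries compare frame indices across edges.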

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Video Analysis and Summarization