Motion-Appearance Co-Memory Networks for Video Question Answering
Jiyang Gao, Runzhou Ge, Kan Chen, Ram Nevatia

TL;DR
This paper introduces a novel co-memory network for video question answering that leverages motion and appearance cues, multi-level contextual facts, and dynamic temporal representations, significantly improving performance on TGIF-QA.
Contribution
It proposes a co-memory attention mechanism, a temporal conv-deconv network for contextual facts, and a dynamic fact ensemble method, advancing video QA techniques.
Findings
Outperforms state-of-the-art on TGIF-QA dataset
Effective utilization of motion and appearance cues
Improved handling of varying question requirements
Abstract
Video Question Answering (QA) is an important task in understanding video temporal structure. We observe that there are three unique attributes of video QA compared with image QA: (1) it deals with long sequences of images containing richer information not only in quantity but also in variety; (2) motion and appearance information are usually correlated with each other and able to provide useful attention cues to the other; (3) different questions require different numbers of frames to infer the answer. Based on these observations, we propose a motion-appearance co-memory network for video QA. Our networks are built on concepts from the Dynamic Memory Network (DMN) and introduce new mechanisms for video QA. Specifically, there are three salient aspects: (1) a co-memory attention mechanism that utilizes cues from both motion and appearance to generate attention; (2) a temporal conv-deconv…
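The co-memory attention idea in the abstract, where each stream's attention is cued by the other stream's memory, can be illustrated with a toy sketch. This is not the authors' formulation: the additive dot-product scoring, the function names (co_attention, attend), and the toy vectors are all assumptions made for illustration, and the paper's learned attention and memory-update layers are not reproduced here.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def co_attention(facts, question, own_memory, other_memory):
    """Attention weights over one stream's facts.

    ASSUMPTION: each fact is scored by summing its dot products with the
    question, this stream's memory, and the other stream's memory. The
    paper's actual scoring function is learned and is not shown here.
    """
    scores = [
        dot(f, question) + dot(f, own_memory) + dot(f, other_memory)
        for f in facts
    ]
    return softmax(scores)

def attend(facts, weights):
    """Weighted sum of fact vectors."""
    dim = len(facts[0])
    return [sum(w * f[i] for w, f in zip(weights, facts)) for i in range(dim)]

# Hypothetical toy inputs: three time steps, two-dimensional features.
motion_facts = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
appearance_facts = [[0.2, 0.8], [0.9, 0.1], [0.4, 0.6]]
question = [1.0, 0.0]

# ASSUMPTION: both memories start as a copy of the question vector.
motion_memory = list(question)
appearance_memory = list(question)

# Each stream attends over its own facts, cued by both memories.
motion_weights = co_attention(
    motion_facts, question, motion_memory, appearance_memory)
appearance_weights = co_attention(
    appearance_facts, question, appearance_memory, motion_memory)

motion_context = attend(motion_facts, motion_weights)
appearance_context = attend(appearance_facts, appearance_weights)
```

In a full model, the attended contexts would feed a recurrent memory update, and the attention step would repeat over several passes. That part is omitted here.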
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
Methods: Softmax · Gated Recurrent Unit · Dynamic Memory Network · Memory Network
