Actions and Objects Pathways for Domain Adaptation in Video Question Answering
Safaa Abdullahi Moallim Mohamud, Ho-Young Jung

TL;DR
This paper presents AOPath, a novel approach for out-of-domain video question answering that leverages pretrained features and dissociates action and object pathways to improve generalization without extensive training.
Contribution
AOPath introduces a domain-agnostic feature conversion module and separate reasoning pathways inspired by the human brain, enhancing out-of-domain generalization in video QA.
Findings
Achieves 5% and 4% higher performance than conventional classifiers on out-of-domain and in-domain datasets, respectively.
Outperforms prior methods with fewer trainable parameters.
Validated on the TVQA dataset, partitioned into multiple genre-based subsets to assess generalizability.
Abstract
In this paper, we introduce the Actions and Objects Pathways (AOPath) for out-of-domain generalization in video question answering tasks. AOPath leverages features from a large pretrained model to enhance generalizability without the need for explicit training on the unseen domains. Inspired by the human brain, AOPath dissociates the pretrained features into action and object features, and subsequently processes them through separate reasoning pathways. It utilizes a novel module that converts out-of-domain features into domain-agnostic features without introducing any trainable weights. We validate the proposed approach on the TVQA dataset, which is partitioned into multiple subsets based on genre to facilitate the assessment of generalizability. The proposed approach demonstrates 5% and 4% superior performance over conventional classifiers on out-of-domain and in-domain datasets,…
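The genre-based evaluation described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the sample layout, the genre labels, and the helper names `split_by_genre` and `accuracy` are all hypothetical, and the abstract does not say which genres are held out. The idea shown is only that one genre is withheld entirely from training and used as the out-of-domain test set.

```python
def split_by_genre(samples, held_out_genre):
    """Partition QA samples into in-domain and out-of-domain lists.

    Samples whose genre equals `held_out_genre` are treated as the unseen
    (out-of-domain) set; everything else is in-domain. The dict layout is
    an assumption made for this sketch.
    """
    in_domain, out_of_domain = [], []
    for sample in samples:
        if sample["genre"] == held_out_genre:
            out_of_domain.append(sample)
        else:
            in_domain.append(sample)
    return in_domain, out_of_domain


def accuracy(predictions, samples):
    """Fraction of samples whose predicted answer index matches the label."""
    if not samples:
        return 0.0
    correct = sum(1 for p, s in zip(predictions, samples) if p == s["answer"])
    return correct / len(samples)


# Toy data with hypothetical genre labels and answer indices.
samples = [
    {"qid": 0, "genre": "sitcom", "answer": 1},
    {"qid": 1, "genre": "medical", "answer": 3},
    {"qid": 2, "genre": "crime", "answer": 0},
    {"qid": 3, "genre": "sitcom", "answer": 2},
]

in_domain, out_of_domain = split_by_genre(samples, held_out_genre="medical")
```

Repeating the split with each genre held out in turn would give one out-of-domain score per genre, which is one plausible way to read "multiple subsets based on genre"; the paper itself should be consulted for the actual protocol.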
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
