Adapting Visual Question Answering Models for Enhancing Multimodal Community Q&A Platforms
Avikalp Srivastava, Hsin Wen Liu, Sumio Fujita

TL;DR
This paper extends visual question answering models to multimodal community Q&A platforms, improving question categorization and expert retrieval by leveraging image data, and introduces novel attention augmentations for better performance.
Contribution
To the authors' knowledge, this is the first work to adapt VQA models for multimodal CQA tasks, addressing the challenge of integrating images into community question answering systems.
Findings
The model outperforms text-only baselines in category classification and expert retrieval.
Augmented attention methods improve grounding of visual information.
First application of VQA models to real-world multimodal CQA data.
Abstract
Question categorization and expert retrieval methods have been crucial for information organization and accessibility in community question & answering (CQA) platforms. Research in this area, however, has dealt with only the text modality. With the increasing multimodal nature of web content, we focus on extending these methods for CQA questions accompanied by images. Specifically, we leverage the success of representation learning for text and images in the visual question answering (VQA) domain, and adapt the underlying concept and architecture for automated category classification and expert retrieval on image-based questions posted on Yahoo! Chiebukuro, the Japanese counterpart of Yahoo! Answers. To the best of our knowledge, this is the first work to tackle the multimodality challenge in CQA, and to adapt VQA models for tasks on a more ecologically valid source of visual…
Taxonomy
Topics: Multimodal Machine Learning Applications · Expert finding and Q&A systems · Domain Adaptation and Few-Shot Learning
