Open-Ended Multi-Modal Relational Reasoning for Video Question Answering

Haozheng Luo, Ruiyang Qin, Chenwei Xu, Guo Ye, and Zening Luo

arXiv:2012.00822 · cs.AI · June 12, 2024

TL;DR

This paper presents a robotic agent that analyzes video-based environments and answers participants' questions by integrating video recognition with natural language processing; its experiments report a positive relationship between trust and interaction efficiency.

Contribution

It introduces a novel robotic agent that combines multi-modal reasoning for video question answering, enhancing performance and understanding in human-robot interactions.

Findings

01. Positive correlation between trust and interaction efficiency
02. 2-3% performance improvement over benchmark methods
03. Effective integration of video recognition and NLP models

Abstract

In this paper, we introduce a robotic agent specifically designed to analyze external environments and address participants' questions. The primary focus of this agent is to assist individuals using language-based interactions within video-based scenes. Our proposed method integrates video recognition technology and natural language processing models within the robotic agent. We investigate the crucial factors affecting human-robot interactions by examining pertinent issues arising between participants and robot agents. Methodologically, our experimental findings reveal a positive relationship between trust and interaction efficiency. Furthermore, our model demonstrates a 2% to 3% performance enhancement in comparison to other benchmark methods.

Code & Models

Repositories

robinzixuan/Video-Question-Answering-HRI (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Topic Modeling