Beyond Factual QA: Mentorship-Oriented Question Answering over Long-Form Multilingual Content

Parth Bhalerao; Diola Dsouza; Ruiwen Guan; Oana Ignat

arXiv:2601.17173 · cs.CL · January 27, 2026

TL;DR

This paper introduces MentorQA, a multilingual dataset and evaluation framework for mentorship-oriented question answering over long-form videos, emphasizing qualities beyond factual correctness, such as clarity and learning value.

Contribution

It presents the first multilingual dataset and evaluation framework for mentorship-focused QA from long-form videos, and compares four QA architectures (Single-Agent, Dual-Agent, RAG, and Multi-Agent) in this setting.

Findings

01. Multi-Agent pipelines produce higher-quality mentorship responses than the other architectures.

02. Complex topics and low-resource languages benefit most from Multi-Agent approaches.

03. Automated LLM-based evaluation varies substantially relative to human judgments.
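Finding 03 concerns how far automated LLM-based scores drift from human judgments. The paper's exact agreement statistics are not given here, so the following is only a minimal sketch of two common ways to quantify such a gap, assuming both judges rate the same answers on a shared numeric scale: Spearman rank correlation (do the judges order answers the same way?) and mean absolute difference (how far apart are their raw scores?). The function names and the rating lists are hypothetical.

```python
from statistics import mean


def average_ranks(values: list[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last index in this run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; raises if either list has zero variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (vx * vy) ** 0.5


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))


def mean_abs_diff(x: list[float], y: list[float]) -> float:
    """Average absolute gap between paired scores (same scale assumed)."""
    return mean(abs(a - b) for a, b in zip(x, y))


# Hypothetical 1-5 ratings for the same five answers (not from the paper).
human = [4, 3, 5, 2, 4]
llm_judge = [5, 3, 4, 3, 5]
print(f"Spearman rho: {spearman(human, llm_judge):.3f}")
print(f"Mean |LLM - human|: {mean_abs_diff(human, llm_judge):.2f}")
```

The two numbers capture different failure modes: a judge can rank answers much like humans do (high rho) while scoring systematically higher (large mean gap). On Python 3.12+, `statistics.correlation` also provides Spearman's rank correlation via `method="ranked"`.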

Abstract

Question answering systems are typically evaluated on factual correctness, yet many real-world applications, such as education and career guidance, require mentorship: responses that provide reflection and guidance. Existing QA benchmarks rarely capture this distinction, particularly in multilingual and long-form settings. We introduce MentorQA, the first multilingual dataset and evaluation framework for mentorship-focused question answering from long-form videos, comprising nearly 9,000 QA pairs from 180 hours of content across four languages. We define mentorship-focused evaluation dimensions that go beyond factual accuracy, capturing clarity, alignment, and learning value. Using MentorQA, we compare Single-Agent, Dual-Agent, RAG, and Multi-Agent QA architectures under controlled conditions. Multi-Agent pipelines consistently produce higher-quality mentorship responses, with especially…
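The abstract names three mentorship dimensions (clarity, alignment, learning value) and four architectures compared under controlled conditions. As a rough illustration of how such a comparison might be tabulated, here is a minimal sketch that averages each dimension separately per architecture. The record layout (`JudgedAnswer`), the function name, and the numeric scores are all hypothetical and do not reflect the MentorQA schema or the paper's results.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

# Dimension names follow the abstract; the numeric scale is an assumption.
DIMENSIONS = ("clarity", "alignment", "learning_value")


@dataclass
class JudgedAnswer:
    """One judged answer (hypothetical record layout, not the dataset schema)."""
    architecture: str         # e.g. "Single-Agent", "Dual-Agent", "RAG", "Multi-Agent"
    scores: dict[str, float]  # dimension name -> score from a human or LLM judge


def mean_scores_by_architecture(
    answers: list[JudgedAnswer],
) -> dict[str, dict[str, float]]:
    """Average each mentorship dimension separately for every architecture."""
    grouped: dict[str, list[JudgedAnswer]] = defaultdict(list)
    for answer in answers:
        # Reject records that lack any dimension, so averages stay comparable.
        missing = set(DIMENSIONS) - set(answer.scores)
        if missing:
            raise ValueError(f"missing dimensions: {sorted(missing)}")
        grouped[answer.architecture].append(answer)
    return {
        arch: {dim: mean(a.scores[dim] for a in items) for dim in DIMENSIONS}
        for arch, items in grouped.items()
    }


# Hypothetical scores, for illustration only.
answers = [
    JudgedAnswer("Single-Agent", {"clarity": 3, "alignment": 3, "learning_value": 2}),
    JudgedAnswer("Single-Agent", {"clarity": 4, "alignment": 3, "learning_value": 3}),
    JudgedAnswer("Multi-Agent", {"clarity": 5, "alignment": 4, "learning_value": 4}),
]
for arch, dims in mean_scores_by_architecture(answers).items():
    print(arch, dims)
```

Keeping the dimensions separate, rather than collapsing them into one number, preserves the point of a mentorship-focused framework: an answer can be clear yet offer little learning value.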

Datasets

AIM-SCU/MentorQA (dataset)

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Intelligent Tutoring Systems and Adaptive Learning