Beyond Retrieval: Joint Supervision and Multimodal Document Ranking for Textbook Question Answering
Hessa Alawwad, Usman Naseem, Areej Alhothali, Ali Alkhathlan, and Amani Jamal

TL;DR
This paper introduces JETRTQA, a multimodal learning framework for textbook question answering that improves the relevance of retrieved documents by jointly training embeddings with ranking supervision, which in turn improves answer accuracy.
Contribution
The paper presents a novel joint training approach with ranking supervision for multimodal document retrieval in textbook question answering (TQA) that outperforms previous methods.
Findings
Achieves 2.4% accuracy gain on validation set
Achieves 11.1% accuracy gain on test set
Improves discrimination between relevant and irrelevant documents
Abstract
Textbook question answering (TQA) is a complex task, requiring the interpretation of complex multimodal context. Although recent advances have improved overall performance, they often encounter difficulties in educational settings where accurate semantic alignment and task-specific document retrieval are essential. In this paper, we propose a novel approach to multimodal textbook question answering by introducing a mechanism for enhancing semantic representations through multi-objective joint training. Our model, Joint Embedding Training With Ranking Supervision for Textbook Question Answering (JETRTQA), is a multimodal learning framework built on a retriever–generator architecture that uses a retrieval-augmented generation setup, in which a multimodal large language model generates answers. JETRTQA is designed to improve the relevance of retrieved documents in complex educational…
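The abstract describes multi-objective joint training that combines embedding learning with ranking supervision, but the visible text does not give the loss functions. As a rough illustration only, the sketch below assumes an InfoNCE-style contrastive term plus a pairwise hinge ranking term, combined with a fixed weight. The function names, the choice of losses, and the `temperature`, `margin`, and `ranking_weight` parameters are all hypothetical and are not taken from the paper.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors given as lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(query, positive, negatives, temperature=0.1):
    """Contrastive term (assumed): negative log-softmax score of the positive document."""
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[0]


def margin_ranking_loss(query, positive, negatives, margin=0.2):
    """Ranking term (assumed): the positive should outscore each negative by `margin`."""
    if not negatives:
        raise ValueError("at least one negative document is required")
    pos_score = dot(query, positive)
    hinge = [max(0.0, margin - (pos_score - dot(query, n))) for n in negatives]
    return sum(hinge) / len(hinge)


def joint_loss(query, positive, negatives, ranking_weight=0.5):
    """Weighted sum of the two terms; the weighting scheme is an assumption."""
    contrastive = info_nce_loss(query, positive, negatives)
    ranking = margin_ranking_loss(query, positive, negatives)
    return contrastive + ranking_weight * ranking


# Toy 2-D embeddings: the query aligns with doc_a and is orthogonal to doc_b.
query, doc_a, doc_b = [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]
well_ranked = joint_loss(query, doc_a, [doc_b])   # relevant doc scored higher
badly_ranked = joint_loss(query, doc_b, [doc_a])  # relevant doc scored lower
```

In this toy setup the joint loss is small when the relevant document already outscores the irrelevant one and large otherwise, which is the kind of training signal that would push a retriever toward better discrimination between relevant and irrelevant documents.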
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Educational Assessment and Pedagogy · Reflective Practices in Education
Methods: Sparse Evolutionary Training
