From Query to Explanation: Uni-RAG for Multi-Modal Retrieval-Augmented Learning in STEM
Xinyi Wu, Yanhao Jia, Luwei Xiao, Shuai Zhao, Fengkuang Chiang, Erik Cambria

TL;DR
This paper introduces Uni-RAG, a multi-modal retrieval-augmented system that improves educational content retrieval and explanation generation in STEM, addressing diversity and ambiguity in queries with a scalable, knowledge-encoded framework.
Contribution
The paper presents Uni-Retrieval and Uni-RAG, novel modules that enhance multi-modal educational content retrieval and generation, incorporating domain-specific knowledge and adaptability to unseen query types.
Findings
Outperforms baseline systems in retrieval accuracy.
Generates higher-quality, explainable educational content.
Maintains low computational cost while scaling to diverse queries.
Abstract
In AI-facilitated teaching, leveraging various query styles to interpret abstract educational content is crucial for delivering effective and accessible learning experiences. However, existing retrieval systems predominantly focus on natural text-image matching and lack the capacity to address the diversity and ambiguity inherent in real-world educational scenarios. To address this limitation, we develop a lightweight and efficient multi-modal retrieval module, named Uni-Retrieval, which extracts query-style prototypes and dynamically matches them with tokens from a continually updated Prompt Bank. This Prompt Bank encodes and stores domain-specific knowledge by leveraging a Mixture-of-Expert Low-Rank Adaptation (MoE-LoRA) module and can be adapted to enhance Uni-Retrieval's capability to accommodate unseen query types at test time. To enable natural language educational content…
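The matching step the abstract describes, extracting a query-style prototype and matching it against a Prompt Bank, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not confirmed by the abstract: the prototype is taken as the mean of query embeddings, matching uses cosine similarity against one key vector per bank entry, and the entry names (`sketch`, `text`, `audio`), the `key` field, and the functions `style_prototype` and `match_prompts` are hypothetical. The MoE-LoRA component and the continual update of the bank are not modelled.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def style_prototype(embeddings):
    """Assumed prototype: the element-wise mean of embeddings for queries of one style."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def match_prompts(prototype, prompt_bank, top_k=2):
    """Return the names of the top_k bank entries whose key is most similar to the prototype."""
    ranked = sorted(
        prompt_bank.items(),
        key=lambda item: cosine(prototype, item[1]["key"]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]


# Hypothetical toy Prompt Bank: each entry holds a key vector used for matching.
prompt_bank = {
    "sketch": {"key": [1.0, 0.0, 0.0]},
    "text": {"key": [0.0, 1.0, 0.0]},
    "audio": {"key": [0.0, 0.0, 1.0]},
}

# Hypothetical embeddings of two queries that share a style.
query_embeddings = [[0.9, 0.1, 0.0], [0.8, 0.3, 0.1]]

prototype = style_prototype(query_embeddings)
selected = match_prompts(prototype, prompt_bank, top_k=2)
print(selected)
```

In a full system the selected entries would supply prompt tokens to the retrieval encoder; here they are only returned by name, to keep the sketch self-contained.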
