KGCE: Knowledge-Augmented Dual-Graph Evaluator for Cross-Platform Educational Agent Benchmarking with Multimodal Language Models
Zixian Liu, Sihao Liu, Yuqi Zhao

TL;DR
This paper introduces KGCE, a benchmarking platform for educational agents that pairs a dual-graph evaluation framework with knowledge-base augmentation to improve cross-platform task performance and enable fine-grained assessment in educational settings.
Contribution
The paper presents a novel dual-graph evaluation framework combined with knowledge base integration, specifically designed for cross-platform educational agent benchmarking with multimodal language models.
Findings
Constructed a dataset of 104 education-related tasks across platforms
Developed a dual-graph evaluation framework for fine-grained assessment
Enhanced agent system with a knowledge base for private-domain software
Abstract
With the rapid adoption of multimodal large language models (MLMs) in autonomous agents, cross-platform task execution in educational settings has garnered significant attention. However, existing benchmark frameworks still fall short in supporting cross-platform tasks in educational contexts, especially for school-specific software (such as XiaoYa Intelligent Assistant and HuaShi XiaZi), where agent efficiency often drops significantly because agents lack an understanding of the structure of this private-domain software. Additionally, current evaluation methods rely heavily on coarse-grained metrics such as goal orientation or trajectory matching, making it difficult to capture the detailed execution and efficiency of agents in complex tasks. To address these issues, we propose KGCE (Knowledge-Augmented Dual-Graph…
Taxonomy
Topics: Multimodal Machine Learning Applications · Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling
