NeuralNexus at BEA 2025 Shared Task: Retrieval-Augmented Prompting for Mistake Identification in AI Tutors
Numaan Naeem, Sarfraz Ahmad, Momina Ahsan, Hasan Iqbal

TL;DR
This paper introduces a retrieval-augmented few-shot prompting system built on GPT-4o for mistake identification in AI tutors, and compares it against three embedding-based classifier approaches.
Contribution
It presents a retrieval-augmented prompting approach for judging whether a tutor's response correctly identifies a student's mistake, which the authors report outperforms their baseline methods.
Findings
Retrieval-augmented prompting improves mistake identification accuracy.
Combining multiple models yields better performance than individual approaches.
The system provides interpretable, schema-guided predictions.
Abstract
This paper presents our system for Track 1: Mistake Identification in the BEA 2025 Shared Task on Pedagogical Ability Assessment of AI-powered Tutors. The task involves evaluating whether a tutor's response correctly identifies a mistake in a student's mathematical reasoning. We explore four approaches: (1) an ensemble of machine learning models over pooled token embeddings from multiple pretrained language models (LMs); (2) a frozen sentence-transformer using [CLS] embeddings with an MLP classifier; (3) a history-aware model with multi-head attention between token-level history and response embeddings; and (4) a retrieval-augmented few-shot prompting system with a large language model (LLM), namely GPT-4o. Our final system retrieves semantically similar examples, constructs structured prompts, and uses schema-guided output parsing to produce interpretable predictions. It outperforms all…
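The three steps named in the abstract (retrieve similar examples, build a structured prompt, parse the output against a schema) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the bag-of-words `embed` function stands in for a real semantic embedding model, the label set in `LABELS` and the JSON schema are assumed for the example, all function names are hypothetical, and no LLM is actually called.

```python
import json
import math
import re
from collections import Counter

# Assumed label set, for illustration only (not confirmed by the abstract).
LABELS = ("Yes", "To some extent", "No")


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real sentence embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, pool, k=2):
    """Return the k labelled examples whose responses are most similar to the query."""
    q = embed(query)
    ranked = sorted(pool, key=lambda ex: cosine(q, embed(ex["response"])), reverse=True)
    return ranked[:k]


def build_prompt(history, response, examples):
    """Assemble a few-shot prompt: instruction, retrieved examples, then the target."""
    lines = [
        "Decide whether the tutor response identifies the student's mistake.",
        'Answer as JSON: {"label": one of ' + ", ".join(LABELS) + ', "rationale": string}',
        "",
    ]
    for ex in examples:
        lines += [f"Tutor response: {ex['response']}", f"Label: {ex['label']}", ""]
    lines += [f"Conversation history: {history}", f"Tutor response: {response}", "Label:"]
    return "\n".join(lines)


def parse_prediction(raw):
    """Validate the model's raw JSON output against the assumed schema."""
    data = json.loads(raw)
    if not isinstance(data, dict) or data.get("label") not in LABELS:
        raise ValueError(f"output does not match schema: {raw!r}")
    return {"label": data["label"], "rationale": str(data.get("rationale", ""))}
```

In use, the string returned by `build_prompt` would be sent to the LLM and the reply passed to `parse_prediction`; rejecting any reply outside the schema is what keeps the predictions machine-checkable and easy to inspect.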
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling · Model Reduction and Neural Networks
Methods: Cosine Annealing · Layer Normalization · Linear Warmup With Cosine Annealing · Attention Dropout · Discriminative Fine-Tuning · Byte Pair Encoding · Softmax · Linear Layer · Dropout
