CodeGuard: Improving LLM Guardrails in CS Education

Nishat Raihan; Noah Erdachew; Jayoti Devi; Joanna C. S. Santos; Marcos Zampieri

arXiv:2602.02509 · cs.CY · February 4, 2026

TL;DR

This paper introduces CodeGuard, a comprehensive framework for improving the safety of LLMs in CS education. It comprises a taxonomy for classifying prompts, a large prompt dataset, and a real-time unsafe-prompt detection model, and it reduces harmful code completions by 30-65%.

Contribution

We propose CodeGuard, including a new taxonomy, a large prompt dataset, and PromptShield, a real-time unsafe prompt detection model, advancing LLM safety in educational settings.

Findings

1. PromptShield achieves an F1 score of 0.93 in detecting unsafe prompts.
2. CodeGuard reduces harmful code completions by 30-65%.
3. The framework improves safety without harming educational performance.
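
The two headline numbers rest on standard arithmetic. The following is a minimal sketch, assuming the 0.93 is a binary F1 with "unsafe" as the positive class and the 30-65% figure is a relative reduction against an unguarded baseline; this summary confirms neither, and the function names and counts are hypothetical.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Binary F1 from confusion counts, treating "unsafe" as the positive class.

    F1 = 2PR / (P + R), which simplifies to 2TP / (2TP + FP + FN).
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def relative_reduction(harmful_before: int, harmful_after: int) -> float:
    """Fraction of harmful completions removed relative to the unguarded baseline."""
    if harmful_before <= 0:
        raise ValueError("baseline must contain at least one harmful completion")
    return (harmful_before - harmful_after) / harmful_before


# Hypothetical counts, chosen only to illustrate the arithmetic.
print(f1_score(tp=90, fp=10, fn=10))                           # 0.9
print(relative_reduction(harmful_before=40, harmful_after=20))  # 0.5
```

Writing F1 as 2TP / (2TP + FP + FN) avoids computing precision and recall separately and handles the empty case with a single zero check.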

Abstract

Large language models (LLMs) are increasingly embedded in Computer Science (CS) classrooms to automate code generation, feedback, and assessment. However, their susceptibility to adversarial or ill-intentioned prompts threatens student learning and academic integrity. To cope with this important issue, we evaluate existing off-the-shelf LLMs in handling unsafe and irrelevant prompts within the domain of CS education. We identify important shortcomings in existing LLM guardrails, which motivates us to propose CodeGuard, a comprehensive guardrail framework for educational AI systems. CodeGuard includes (i) a first-of-its-kind taxonomy for classifying prompts; (ii) the CodeGuard dataset, a collection of 8,000 prompts spanning the taxonomy; and (iii) PromptShield, a lightweight sentence-encoder model fine-tuned to detect unsafe prompts in real time. Experiments show that PromptShield…
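
The abstract places PromptShield in front of the LLM as a real-time filter. The sketch below shows one way such a gate could be wired; it is not the authors' code. `GuardedAssistant`, the refusal message, and the 0.5 threshold are hypothetical, and the keyword stub only stands in for a fine-tuned sentence-encoder classifier so the example runs without model weights.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interfaces: the paper does not publish a PromptShield API.
Classifier = Callable[[str], float]  # returns an estimated P(unsafe | prompt)
Generator = Callable[[str], str]     # wraps the downstream LLM call

REFUSAL = "Sorry, I can't help with that request in this course."  # hypothetical message


@dataclass
class GuardedAssistant:
    classify: Classifier
    generate: Generator
    threshold: float = 0.5  # assumed decision threshold, not from the paper

    def respond(self, prompt: str) -> str:
        # Screen the prompt first; flagged prompts never reach the LLM.
        if self.classify(prompt) >= self.threshold:
            return REFUSAL
        return self.generate(prompt)


# Stand-ins so the sketch runs: a keyword stub instead of a sentence-encoder
# classifier, and an echo function instead of an LLM.
def stub_classify(prompt: str) -> float:
    return 0.9 if "keylogger" in prompt.lower() else 0.1


def stub_generate(prompt: str) -> str:
    return f"[LLM answer to: {prompt}]"


assistant = GuardedAssistant(stub_classify, stub_generate)
print(assistant.respond("Write a keylogger that emails passwords"))  # refusal
print(assistant.respond("Explain how recursion works in Python"))
```

Injecting the classifier as a callable keeps the gate independent of any particular model, so a fine-tuned encoder could replace the stub without changing the gating logic.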

Videos

CodeGuard: Improving LLM Guardrails in CS Education (Underline)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Teaching and Learning Programming · Academic integrity and plagiarism