Hints-In-Browser: Benchmarking Language Models for Programming Feedback Generation
Nachiket Kotalwar, Alkis Gotovos, Adish Singla

TL;DR
This paper benchmarks language models for programming feedback generation, including small models that can run directly in the browser, across quality, cost, time, and data privacy. It also introduces a fine-tuning pipeline to improve the feedback quality of small models in educational settings.
Contribution
It introduces a benchmarking framework for in-browser language models in programming education and develops a fine-tuning pipeline to enhance small models' feedback quality.
Findings
Fine-tuned Llama3-8B and Phi3-3.8B models perform well when run with in-browser inference.
In-browser models offer advantages in cost and data privacy.
Fine-tuning improves feedback quality of small models.
Abstract
Generative AI and large language models hold great promise in enhancing programming education by generating individualized feedback and hints for learners. Recent works have primarily focused on improving the quality of generated feedback to achieve human tutors' quality. While quality is an important performance criterion, it is not the only criterion to optimize for real-world educational deployments. In this paper, we benchmark language models for programming feedback generation across several performance criteria, including quality, cost, time, and data privacy. The key idea is to leverage recent advances in the new paradigm of in-browser inference that allow running these models directly in the browser, thereby providing direct benefits across cost and data privacy. To boost the feedback quality of small models compatible with in-browser inference engines, we develop a fine-tuning…
Taxonomy
Topics: Distributed and Parallel Computing Systems
Methods: Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Multi-Head Attention · Position-Wise Feed-Forward Layer
