Hints-In-Browser: Benchmarking Language Models for Programming Feedback Generation

Nachiket Kotalwar; Alkis Gotovos; Adish Singla

arXiv:2406.05053 · cs.LG · March 10, 2025

TL;DR

This paper benchmarks language models capable of in-browser inference for programming feedback generation, evaluating quality, cost, time, and data privacy, and introduces fine-tuning methods to improve the performance of small models in educational settings.

Contribution

It introduces a benchmarking framework for in-browser language models in programming education and develops a fine-tuning pipeline to enhance small models' feedback quality.

Findings

1. Fine-tuned Llama3-8B and Phi3-3.8B models perform well with in-browser inference.
2. In-browser models offer advantages in cost and data privacy.
3. Fine-tuning improves the feedback quality of small models.

Abstract

Generative AI and large language models hold great promise in enhancing programming education by generating individualized feedback and hints for learners. Recent works have primarily focused on improving the quality of generated feedback to achieve human tutors' quality. While quality is an important performance criterion, it is not the only criterion to optimize for real-world educational deployments. In this paper, we benchmark language models for programming feedback generation across several performance criteria, including quality, cost, time, and data privacy. The key idea is to leverage recent advances in the new paradigm of in-browser inference that allow running these models directly in the browser, thereby providing direct benefits across cost and data privacy. To boost the feedback quality of small models compatible with in-browser inference engines, we develop a fine-tuning…

Videos

Hints-In-Browser: Benchmarking Language Models for Programming Feedback Generation (SlidesLive)

Taxonomy

Topics: Distributed and Parallel Computing Systems

Methods: Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Multi-Head Attention · Position-Wise Feed-Forward Layer