PERSA: Reinforcement Learning for Professor-Style Personalized Feedback with LLMs

Ravi Ranjan; Utkarsh Grover; Xiaomin Lin; Agoritsa Polyzou

arXiv:2605.01123 · cs.AI · May 5, 2026

TL;DR

PERSA is a reinforcement learning pipeline that fine-tunes large language models to generate educational feedback matching a professor's style without losing core content accuracy.

Contribution

It introduces a style-constrained RLHF method that updates only specific transformer components for personalized, instructor-like feedback generation.
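To make "updates only specific transformer components" concrete, here is a minimal Python sketch of selective freezing. A predicate decides which named parameters stay trainable, everything else is frozen, and the function reports what fraction of the parameter count is actually updated. The parameter names, sizes, and the choice to train only the last block are hypothetical. The exact components PERSA updates are cut off in the abstract below and are not reproduced here.

```python
from typing import Callable, Dict, List, Tuple


def partition_parameters(
    param_sizes: Dict[str, int],
    is_trainable: Callable[[str], bool],
) -> Tuple[List[str], List[str], float]:
    """Split named parameters into trainable and frozen groups.

    Returns (trainable_names, frozen_names, trainable_fraction), where the
    fraction is computed over parameter counts, not over tensor names.
    """
    trainable = [name for name in param_sizes if is_trainable(name)]
    frozen = [name for name in param_sizes if not is_trainable(name)]
    total = sum(param_sizes.values())
    n_trainable = sum(param_sizes[name] for name in trainable)
    fraction = n_trainable / total if total else 0.0
    return trainable, frozen, fraction


# Hypothetical toy model: an embedding plus 4 blocks, each with attention
# and MLP weights. Sizes are made up for illustration.
toy_sizes = {"embed.weight": 200}
for i in range(4):
    toy_sizes[f"layers.{i}.attn.weight"] = 100
    toy_sizes[f"layers.{i}.mlp.weight"] = 100

# Hypothetical "style-bearing" selection: only the last block is trainable.
trainable, frozen, fraction = partition_parameters(
    toy_sizes, lambda name: name.startswith("layers.3.")
)
print(trainable)  # ['layers.3.attn.weight', 'layers.3.mlp.weight']
print(fraction)   # 0.2
```

In a PyTorch model, the same predicate could be applied by iterating over `model.named_parameters()` and setting each parameter's `requires_grad` flag, so the optimizer only ever sees the selected components.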

Findings

01. PERSA achieves a 96.2% style alignment score on APPS.

02. It maintains up to 100% correctness accuracy.

03. It outperforms baseline models in style transfer while preserving fidelity.

Abstract

Large language models (LLMs) can provide automated feedback in educational settings, but aligning an LLM's style with a specific instructor's tone while maintaining diagnostic correctness remains challenging. We ask: how can we update an LLM for automated feedback generation so that it aligns with a target instructor's style without sacrificing core knowledge? We study how Reinforcement Learning from Human Feedback (RLHF) can adapt a transformer-based LLM to generate programming feedback that matches a professor's grading voice. We introduce PERSA, an RLHF pipeline that combines supervised fine-tuning on professor demonstrations, reward modeling from pairwise preferences, and Proximal Policy Optimization (PPO), while deliberately constraining learning to style-bearing components. Motivated by analyses of transformer internals, PERSA applies parameter-efficient fine-tuning. It updates only the top…
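The reward-modeling and PPO stages named in the abstract can be sketched with two standard objectives. This is a minimal, dependency-free Python sketch that assumes a Bradley-Terry pairwise loss for the reward model and the usual clipped PPO surrogate for the policy update. The abstract does not state PERSA's exact losses, and `clip_eps=0.2` is an illustrative default, not a value from the paper.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected).

    Assumption: the reward model scores the professor-preferred feedback
    (r_chosen) and the rejected alternative (r_rejected); the loss falls as
    the preferred response is scored higher.
    """
    # -log(sigmoid(m)) == softplus(-m), which avoids overflow for large |m|
    return softplus(-(r_chosen - r_rejected))


def ppo_clipped_objective(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,  # illustrative default, not taken from the paper
) -> float:
    """Mean PPO clipped surrogate objective (to be maximized)."""
    if not (len(logp_new) == len(logp_old) == len(advantages)) or not advantages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


print(pairwise_reward_loss(0.0, 0.0))              # log(2) ≈ 0.693
print(ppo_clipped_objective([0.5], [0.0], [1.0]))  # ratio ≈ 1.65 is clipped to 1.2
```

The `min` in the PPO term means the policy gains nothing from pushing the probability ratio past `1 + clip_eps` when the advantage is positive (or below `1 - clip_eps` when it is negative), which keeps each update close to the previous policy.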
