CR4T: Rewrite-Based Guardrails for Adolescent LLM Safety
Heajun An, Qi Zhang, Vedanth Achanta, Jin-Hee Cho

TL;DR
This paper introduces CR4T, a rewrite-based framework that enhances adolescent LLM safety by transforming unsafe outputs into age-appropriate guidance, reducing refusals and improving conversational quality.
Contribution
CR4T offers a novel, model-agnostic approach that reconstructs unsafe responses into developmentally suitable guidance, addressing limitations of traditional refusal-based safety methods.
Findings
CR4T significantly reduces unsafe outputs and refusals.
Targeted rewriting preserves benign interactions and improves user experience.
Experimental results support the effectiveness of CR4T as a safeguarding framework for adolescent-facing LLMs.
Abstract
Large language models (LLMs) are increasingly embedded in adolescent digital environments, mediating information seeking, advice, and emotionally sensitive interactions. Yet existing safety mechanisms remain largely grounded in adult-centric norms and operationalize safety through refusal-oriented suppression. While such approaches may reduce immediate policy violations, they can also create conversational dead-ends, limit constructive guidance, and fail to address the developmental vulnerabilities inherent in adolescent-AI interactions. We argue that adolescent LLM safety should be framed not solely as a filtering problem, but as a socio-technical, developmentally aligned transformation problem. To operationalize this perspective, we propose Critique-and-Revise-for-Teenagers (CR4T), a model-agnostic safeguarding framework that selectively reconstructs unsafe or refusal-style outputs…
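To make the idea of selective rewriting concrete, here is a minimal Python sketch of a guardrail that leaves benign responses untouched and routes only flagged ones (unsafe or refusal-style) to a rewriter. It is an illustration under stated assumptions, not the authors' implementation: the names `flag_output`, `guard`, `GuardrailResult`, and `stub_rewriter`, as well as the keyword lists, are hypothetical stand-ins for whatever detectors and critique-and-revise model the paper actually uses.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical toy markers; a real system would use trained classifiers.
REFUSAL_MARKERS = ("i can't help", "i cannot help", "i'm unable to")
UNSAFE_MARKERS = ("skip meals to lose weight",)


@dataclass
class GuardrailResult:
    text: str        # final text shown to the user
    rewritten: bool  # True if the original response was replaced
    reason: str      # "ok", "refusal", or "unsafe"


def flag_output(response: str) -> str:
    """Toy detector (assumption): label a response by keyword matching."""
    lowered = response.lower()
    if any(marker in lowered for marker in REFUSAL_MARKERS):
        return "refusal"
    if any(marker in lowered for marker in UNSAFE_MARKERS):
        return "unsafe"
    return "ok"


def guard(
    response: str,
    rewrite: Callable[[str, str], str],
    flag: Callable[[str], str] = flag_output,
) -> GuardrailResult:
    """Rewrite only flagged responses; pass benign ones through unchanged."""
    reason = flag(response)
    if reason == "ok":
        return GuardrailResult(text=response, rewritten=False, reason=reason)
    return GuardrailResult(text=rewrite(response, reason), rewritten=True, reason=reason)


def stub_rewriter(response: str, reason: str) -> str:
    """Placeholder for a critique-and-revise model call (assumption)."""
    return f"[age-appropriate guidance replacing a {reason} response]"


benign = guard("Photosynthesis converts light into chemical energy.", stub_rewriter)
refusal = guard("I can't help with that.", stub_rewriter)
print(benign.rewritten, refusal.rewritten, refusal.reason)
```

Because `rewrite` and `flag` are passed in as plain callables, the wrapper does not depend on any particular underlying model, which mirrors the model-agnostic framing in the abstract.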
