Reflect: Transparent Principle-Guided Reasoning for Constitutional Alignment at Scale

Henry Bell; Caroline Zhang; Mohammed Mobasserul Haque; Dhaval Potdar; Samia Zaman; Brandon Fain

arXiv:2601.18730 · cs.CL · January 27, 2026

Open Access

TL;DR

Reflect is an inference-time framework that aligns large language models with complex principles through in-context reasoning and self-evaluation, improving safety and robustness without additional training.

Contribution

The paper introduces a plug-and-play, inference-only method for constitutional alignment that outperforms standard prompting and improves transparency and safety.

Findings

01 Significantly improves model conformance to diverse principles

02 Reduces rare violations, enhancing safety and robustness

03 Generates useful data for further fine-tuning

Abstract

The constitutional framework of alignment aims to align large language models (LLMs) with value-laden principles written in natural language (for example, to avoid using biased language). Prior work has focused on parameter fine-tuning techniques, such as reinforcement learning from human feedback (RLHF), to instill these principles. However, these approaches are computationally demanding, require careful engineering and tuning, and often depend on human annotation data that is difficult to obtain. We propose Reflect, an inference-time framework for constitutional alignment that requires no training or data, providing a plug-and-play approach for aligning an instruction-tuned model to a set of principles. Reflect operates entirely in-context, combining (i) a constitution-conditioned base response with post-generation (ii) self-evaluation, (iii)(a) self-critique, and (iii)(b)…
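The stages named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the prompt wording, the function names (`base_response`, `self_evaluate`, `self_critique`, `reflect_sketch`), the YES/NO verdict format, and the choice to run the critique only when self-evaluation fails are all assumptions made for illustration. The `generate` argument is a hypothetical callable standing in for any instruction-tuned model, and `stub_model` is a canned stand-in so the example runs without a model. Step (iii)(b) is cut off in the abstract as shown here, so it is deliberately left out.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical interface: any function that maps a prompt string to a completion.
Generate = Callable[[str], str]


def format_constitution(principles: List[str]) -> str:
    """Render the principles as a numbered list for inclusion in a prompt."""
    return "\n".join(f"{i}. {p}" for i, p in enumerate(principles, start=1))


def base_response(generate: Generate, principles: List[str], user_prompt: str) -> str:
    """Stage (i): answer the user with the constitution placed in context."""
    prompt = (
        "Follow these principles when responding:\n"
        f"{format_constitution(principles)}\n\n"
        f"User: {user_prompt}\nAssistant:"
    )
    return generate(prompt)


def self_evaluate(generate: Generate, principles: List[str], response: str) -> bool:
    """Stage (ii): ask the same model whether its response conforms (assumed YES/NO format)."""
    prompt = (
        f"Principles:\n{format_constitution(principles)}\n\n"
        f"Response:\n{response}\n\n"
        "Does the response conform to every principle? Answer YES or NO."
    )
    return generate(prompt).strip().upper().startswith("YES")


def self_critique(generate: Generate, principles: List[str], response: str) -> str:
    """Stage (iii)(a): ask the model to describe how the response falls short."""
    prompt = (
        f"Principles:\n{format_constitution(principles)}\n\n"
        f"Response:\n{response}\n\n"
        "Explain which principles the response violates and why."
    )
    return generate(prompt)


def reflect_sketch(generate: Generate, principles: List[str], user_prompt: str) -> Dict[str, object]:
    """Run stages (i), (ii), and (iii)(a); the critique is gated on a failed evaluation (an assumption)."""
    response = base_response(generate, principles, user_prompt)
    conforms = self_evaluate(generate, principles, response)
    critique: Optional[str] = None if conforms else self_critique(generate, principles, response)
    return {"response": response, "conforms": conforms, "critique": critique}


def stub_model(prompt: str) -> str:
    """Canned stand-in for a real model so the sketch runs offline."""
    if "Answer YES or NO" in prompt:
        return "NO"
    if "Explain which principles" in prompt:
        return "The response uses biased language."
    return "Draft answer."
```

Calling `reflect_sketch(stub_model, ["Avoid using biased language."], "Describe engineers.")` returns a dictionary holding the draft response, the evaluation verdict, and the critique text. Keeping each stage as a separate prompt makes the intermediate outputs inspectable, which is one plausible way to read the transparency claim in the title.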

Taxonomy

Topics: Ethics and Social Impacts of AI · Artificial Intelligence in Law · Topic Modeling