Reflect: Transparent Principle-Guided Reasoning for Constitutional Alignment at Scale
Henry Bell, Caroline Zhang, Mohammed Mobasserul Haque, Dhaval Potdar, Samia Zaman, Brandon Fain

TL;DR
Reflect is an inference-time framework that aligns large language models with complex principles through in-context reasoning and self-evaluation, improving safety and robustness without additional training.
Contribution
It introduces a plug-and-play, inference-only method for constitutional alignment that outperforms standard prompting and enhances transparency and safety.
Findings
Significantly improves model conformance to diverse principles
Reduces rare violations, enhancing safety and robustness
Generates useful data for further fine-tuning
Abstract
The constitutional framework of alignment aims to align large language models (LLMs) with value-laden principles written in natural language (for example, avoiding biased language). Prior work has focused on parameter fine-tuning techniques, such as reinforcement learning from human feedback (RLHF), to instill these principles. However, these approaches are computationally demanding, require careful engineering and tuning, and often depend on difficult-to-obtain human annotation data. We propose Reflect, an inference-time framework for constitutional alignment that requires no training or data, providing a plug-and-play approach for aligning an instruction-tuned model to a set of principles. Reflect operates entirely in-context, combining (i) a constitution-conditioned base response with post-generation (ii) self-evaluation, (iii)(a) self-critique, and (iii)(b)…
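The in-context steps named in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration, not the authors' implementation: the helper names (reflect_sketch, parse_score, stub_generate, ReflectTrace), the prompt wording, the 1-to-5 conformance scale, and the threshold of 4 that triggers a critique are all hypothetical, and `generate` stands in for any instruction-tuned model. Because the abstract is cut off at step (iii)(b), the sketch stops at self-critique and does not guess what that step does.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical: any function that maps a prompt string to a model completion.
Generate = Callable[[str], str]


@dataclass
class ReflectTrace:
    """Intermediate artifacts of one pass (hypothetical structure)."""
    base_response: str
    scores: dict[str, int] = field(default_factory=dict)
    critiques: dict[str, str] = field(default_factory=dict)


def parse_score(text: str, lo: int = 1, hi: int = 5) -> int:
    """Return the first standalone integer token, clamped to [lo, hi]; lo if none."""
    for token in text.split():
        digits = token.strip(".,:;()")
        if digits.isdecimal():
            return max(lo, min(hi, int(digits)))
    return lo  # Unparseable replies are treated as the lowest score (fail-safe).


def reflect_sketch(user_prompt: str, principles: list[str],
                   generate: Generate, threshold: int = 4) -> ReflectTrace:
    constitution = "\n".join(f"{i + 1}. {p}" for i, p in enumerate(principles))
    # (i) Base response conditioned on the full constitution.
    base = generate(f"Follow these principles:\n{constitution}\n\n"
                    f"User: {user_prompt}\nAssistant:")
    trace = ReflectTrace(base_response=base)
    for p in principles:
        # (ii) Self-evaluation: ask the same model to rate conformance per principle.
        reply = generate(f"Principle: {p}\nResponse: {base}\n"
                         "Rate conformance from 1 (violates) to 5 (fully conforms). "
                         "Answer with a number.")
        trace.scores[p] = parse_score(reply)
        # (iii)(a) Self-critique, only for principles scored below the threshold.
        if trace.scores[p] < threshold:
            trace.critiques[p] = generate(f"Principle: {p}\nResponse: {base}\n"
                                          "Explain how the response falls short.")
    return trace


# Usage with a deterministic stub in place of a real model.
def stub_generate(prompt: str) -> str:
    if prompt.startswith("Follow these principles"):
        return "Sure, here is an answer."
    if "Rate conformance" in prompt:
        return "2" if "biased" in prompt else "5"
    return "The response uses a loaded term."


trace = reflect_sketch("Describe the candidates.",
                       ["Avoid using biased language.", "Be concise."],
                       stub_generate)
print(trace.scores)     # {'Avoid using biased language.': 2, 'Be concise.': 5}
print(trace.critiques)  # {'Avoid using biased language.': 'The response uses a loaded term.'}
```

Keeping the base response, per-principle scores, and critiques together in one trace object is a design choice that makes each stage inspectable; what the framework then does with the critiques is left open here, since the abstract's description of that step is truncated.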
Taxonomy
Topics: Ethics and Social Impacts of AI · Artificial Intelligence in Law · Topic Modeling
