AURA: Affordance-Understanding and Risk-aware Alignment Technique for Large Language Models

Sayantan Adak; Pratyush Chatterjee; Somnath Banerjee; Rima Hazra; Somak Aditya; Animesh Mukherjee

arXiv:2508.06124 · cs.CL · August 11, 2025

TL;DR

AURA introduces a multi-layered framework with Process Reward Models to enhance safety and logical coherence in large language models, proactively guiding them away from harmful reasoning paths.

Contribution

The paper presents a multi-layered safety framework that uses Process Reward Models for fine-grained, proactive safety assessment and alignment of large language models.

Findings

1. Significantly improves the safety and logical coherence of LLM outputs
2. Outperforms existing safety and alignment methods
3. Demonstrates effectiveness across diverse reasoning tasks

Abstract

Present-day LLMs face the challenge of managing affordance-based safety risks: situations where outputs inadvertently facilitate harmful actions due to overlooked logical implications. Traditional safety solutions, such as scalar outcome-based reward models, parameter tuning, or heuristic decoding strategies, lack the granularity and proactive nature needed to reliably detect and intervene during subtle yet crucial reasoning steps. To address this gap, we introduce AURA, a multi-layered framework centered on Process Reward Models (PRMs) that provides step-level evaluations of logical coherence and safety awareness. The framework combines introspective self-critique, fine-grained PRM assessments, and adaptive safety-aware decoding to proactively guide models toward safer reasoning trajectories. Empirical evidence…
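To make the idea of step-level, safety-aware guidance concrete, here is a minimal Python sketch. It is not the authors' implementation. The names `toy_prm`, `select_step`, `guided_reasoning`, `UNSAFE_MARKERS`, the keyword-based scoring, and the `safety_floor` and `safety_weight` parameters are all hypothetical stand-ins for illustration. A real PRM would be a learned model, and the self-critique layer mentioned in the abstract is omitted here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class StepScore:
    coherence: float  # assumed to lie in [0, 1]
    safety: float     # assumed to lie in [0, 1]


# Hypothetical keyword list; a learned PRM would replace this heuristic.
UNSAFE_MARKERS = ("bypass", "weapon", "exploit")


def toy_prm(step: str) -> StepScore:
    """Toy stand-in for a Process Reward Model scoring one reasoning step."""
    text = step.lower()
    safety = 0.0 if any(m in text for m in UNSAFE_MARKERS) else 1.0
    coherence = 1.0 if text.strip().endswith(".") else 0.5
    return StepScore(coherence=coherence, safety=safety)


def select_step(
    candidates: List[str],
    prm: Callable[[str], StepScore] = toy_prm,
    safety_floor: float = 0.5,
    safety_weight: float = 0.5,
) -> Optional[str]:
    """Pick the best candidate step, discarding any below the safety floor."""
    best, best_score = None, float("-inf")
    for step in candidates:
        score = prm(step)
        if score.safety < safety_floor:
            continue  # reject unsafe steps before they enter the trajectory
        combined = (
            safety_weight * score.safety
            + (1.0 - safety_weight) * score.coherence
        )
        if combined > best_score:
            best, best_score = step, combined
    return best


def guided_reasoning(
    propose: Callable[[List[str]], List[str]],
    max_steps: int = 4,
    prm: Callable[[str], StepScore] = toy_prm,
) -> List[str]:
    """Build a trajectory step by step, stopping if no safe step exists."""
    trajectory: List[str] = []
    for _ in range(max_steps):
        candidates = propose(trajectory)
        if not candidates:
            break
        chosen = select_step(candidates, prm=prm)
        if chosen is None:
            trajectory.append("I cannot continue this line of reasoning safely.")
            break
        trajectory.append(chosen)
    return trajectory
```

The design point this sketch tries to capture is that each intermediate step is scored and filtered as it is produced, instead of assigning one scalar reward to the finished output. In practice `propose` would sample candidate next steps from the language model.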

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification