PRISM: Probability Reallocation with In-Span Masking for Knowledge-Sensitive Alignment
Chenning Xu, Mao Zheng, Mingyang Song

TL;DR
PRISM is a novel training framework that reduces hallucinations in language models by penalizing overconfident predictions at fact-critical positions using structured risk signals.
Contribution
It introduces a differentiable risk-gated probability reallocation method that enhances factual accuracy without compromising overall model performance.
Findings
PRISM improves factual accuracy on hallucination-sensitive benchmarks.
The auxiliary risk signals are most effective when used conservatively.
Knowledge masking and model-aware reallocation complement each other in balancing factual correction.
Abstract
Supervised fine-tuning (SFT) with token-level hard labels can amplify overconfident imitation of factually unsupported targets, causing hallucinations that propagate in multi-sentence generation. We study an augmented SFT setting in which training instances include coarse sentence-level factuality risk labels and inter-sentence dependency annotations, providing structured signals about where factual commitments are weakly supported. We propose PRISM, a differentiable risk-gated framework that modifies learning only at fact-critical positions. PRISM augments standard SFT with a lightweight, model-aware probability reallocation objective that penalizes high-confidence predictions on risky target tokens, with its scope controlled by span-level risk weights and model-aware gating. Experiments on hallucination-sensitive factual benchmarks and general evaluations show that PRISM…
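The abstract does not give the exact objective, so the following is only an illustrative sketch of what a risk-gated penalty on confident predictions could look like for a single token. Everything in it is an assumption made for illustration and not the authors' formulation: the function name `risk_gated_loss`, the sigmoid gate, the `-log(1 - p)` penalty term, and the hyperparameters `lam`, `tau`, and `temp`. The span-level risk weight is taken as a number in [0, 1] that would be derived from the sentence-level risk labels.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def risk_gated_loss(logits, target, risk_weight,
                    lam=0.5, tau=0.8, temp=0.05, eps=1e-8):
    """Hypothetical per-token loss: cross-entropy plus a gated confidence penalty.

    risk_weight: assumed span-level weight in [0, 1]; 0 means plain SFT.
    lam, tau, temp: illustrative hyperparameters, not taken from the paper.
    """
    p = softmax(logits)[target]          # model probability of the target token
    ce = -math.log(p + eps)              # standard SFT cross-entropy term
    # Model-aware gate (assumed form): close to 1 only when p exceeds tau.
    gate = sigmoid((p - tau) / temp)
    # Penalty grows as p approaches 1, pushing mass away from the target.
    penalty = -math.log(1.0 - p + eps)
    return ce + lam * risk_weight * gate * penalty


confident = [5.0, 0.0, 0.0]   # model is very sure of token 0
uncertain = [0.0, 0.0, 0.0]   # model is undecided

safe_loss = risk_gated_loss(confident, 0, risk_weight=0.0)
risky_loss = risk_gated_loss(confident, 0, risk_weight=1.0)
uncertain_risky_loss = risk_gated_loss(uncertain, 0, risk_weight=1.0)
```

In this sketch a token with `risk_weight=0.0` is trained with ordinary cross-entropy, while a risky token is penalized only when the model is already confident, which is one way to keep the modification confined to fact-critical positions.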
