Where Not to Learn: Prior-Aligned Training with Subset-based Attribution Constraints for Reliable Decision-Making

Ruoyu Chen; Shangquan Sun; Xiaoqing Guo; Sanyi Zhang; Kangwei Liu; Shiming Liu; Zhangcheng Wang; Qunli Zhang; Hua Zhang; Xiaochun Cao

arXiv:2602.07008 · cs.CV · May 20, 2026

TL;DR

This paper introduces a novel attribution-based training method that aligns models with human priors by penalizing reliance on input regions outside human-specified prior regions, improving both accuracy and the reasonability of the model's decisions.

Contribution

It uses a highly faithful subset-selection-based attribution approach during training to expose the model's decision evidence and enforce alignment with human priors, addressing the divergence between learned representations and human perception.
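Subset-selection attribution explains a prediction by searching for a small set of input regions that best preserves it. The sketch below is a minimal greedy version under stated assumptions: the input is already split into regions identified by integer IDs, and `score_fn` is a hypothetical stand-in for the model's confidence on an input that keeps only the chosen regions. This page does not give the paper's actual objective or search procedure, so treat `greedy_subset_attribution` as an illustration of the general idea, not the authors' algorithm.

```python
from typing import Callable, FrozenSet, List, Sequence


def greedy_subset_attribution(
    regions: Sequence[int],
    score_fn: Callable[[FrozenSet[int]], float],
    k: int,
) -> List[int]:
    """Return up to k region IDs in the order greedy selection adds them.

    score_fn is a hypothetical proxy for the model's target-class confidence
    when only the given regions are kept; each step adds the region with the
    largest marginal gain over the current subset.
    """
    chosen: FrozenSet[int] = frozenset()
    order: List[int] = []
    remaining = list(regions)
    for _ in range(min(k, len(remaining))):
        base = score_fn(chosen)
        # Marginal gain of each candidate region given the regions already chosen.
        best = max(remaining, key=lambda r: score_fn(chosen | {r}) - base)
        order.append(best)
        chosen = chosen | {best}
        remaining.remove(best)
    return order


if __name__ == "__main__":
    # Toy additive score: region 1 carries the most evidence, region 2 the least.
    weights = {0: 0.1, 1: 0.5, 2: 0.05, 3: 0.3}
    ranking = greedy_subset_attribution(
        list(weights), lambda s: sum(weights[r] for r in s), k=3
    )
    print(ranking)  # [1, 3, 0]
```

Greedy selection is a common choice for this kind of search because exhaustive search over subsets grows exponentially with the number of regions; when the score is monotone submodular, greedy selection under a cardinality limit is guaranteed to reach at least a (1 - 1/e) fraction of the optimum.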

Findings

1. Improved task accuracy across classification and generation tasks.
2. Enhanced decision reasonability and interpretability.
3. Consistent benefits observed in MLLM-based GUI agent models.

Abstract

Reliable models should not only predict correctly, but also justify decisions with acceptable evidence. Yet conventional supervised learning typically provides only class-level labels, allowing models to achieve high accuracy through shortcut correlations rather than the intended evidence. Human priors can help constrain such behavior, but aligning models to these priors remains challenging because learned representations often diverge from human perception. To address this challenge, we propose an attribution-based human prior alignment method. We encode human priors as input regions that the model is expected to rely on (e.g., bounding boxes), and leverage a highly faithful subset-selection-based attribution approach to expose the model's decision evidence during training. When the attribution region deviates substantially from the prior regions, we penalize reliance on off-prior…
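The deviation check and penalty described in the abstract can be sketched under several labeled assumptions: the prior is a bounding box on a patch grid, the attribution is the set of selected grid cells, deviation is the fraction of attributed cells outside the box, and the penalty is the model's confidence on the off-prior cells, applied only when that fraction exceeds a threshold `tau`. The abstract is truncated before it specifies the penalty, so `alignment_loss`, `tau`, `lam`, and `off_prior_confidence` are hypothetical names and this is not the paper's formulation.

```python
from typing import Iterable, Set, Tuple

Cell = Tuple[int, int]  # (row, col) of a patch on the input grid


def box_to_cells(box: Tuple[int, int, int, int]) -> Set[Cell]:
    """Grid cells covered by a prior box (row0, col0, row1, col1), end-exclusive."""
    r0, c0, r1, c1 = box
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


def off_prior_fraction(attributed: Iterable[Cell], prior: Set[Cell]) -> float:
    """Share of attributed cells lying outside the prior region (0.0 if none)."""
    cells = set(attributed)
    if not cells:
        return 0.0
    return len(cells - prior) / len(cells)


def alignment_loss(
    task_loss: float,
    attributed: Iterable[Cell],
    prior: Set[Cell],
    off_prior_confidence: float,
    tau: float = 0.5,
    lam: float = 1.0,
) -> float:
    """Hypothetical objective: task loss plus a penalty on off-prior reliance.

    off_prior_confidence is assumed to be the model's target-class confidence
    on an input that keeps only the attributed cells outside the prior.
    The penalty is applied only when the deviation exceeds tau.
    """
    deviation = off_prior_fraction(attributed, prior)
    penalty = off_prior_confidence if deviation > tau else 0.0
    return task_loss + lam * penalty


if __name__ == "__main__":
    prior = box_to_cells((0, 0, 2, 2))  # 2x2 box in the top-left corner
    shortcut = [(0, 0), (2, 3), (3, 2), (3, 3)]  # 3 of 4 cells off-prior
    aligned = [(0, 0), (0, 1), (1, 1), (3, 3)]  # 1 of 4 cells off-prior
    print(alignment_loss(0.25, shortcut, prior, off_prior_confidence=0.5))  # 0.75
    print(alignment_loss(0.25, aligned, prior, off_prior_confidence=0.5))  # 0.25
```

In an actual training loop, `off_prior_confidence` would be a differentiable tensor from a forward pass, so the penalty pushes the model away from off-prior evidence; plain floats are used here only to keep the sketch self-contained.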

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis