How Do Decoder-Only LLMs Perceive Users? Rethinking Attention Masking for User Representation Learning

Jiahao Yuan; Yike Xu; Jinyong Wen; Baokun Wang; Yang Chen; Xiaotong Lin; Wuliang Huang; Ziyi Gao; Xing Fu; Yu Cheng; Weiqiang Wang

arXiv:2602.10622 · cs.CL · February 12, 2026

TL;DR

This paper systematically studies how causal, hybrid, and bidirectional attention masks affect user representation learning in decoder-only LLMs, and proposes Gradient-Guided Soft Masking to improve training stability and embedding quality.

Contribution

The paper introduces Gradient-Guided Soft Masking, a gradient-based pre-warmup applied before a linear scheduler, to smooth the transition from causal to bidirectional attention in large-scale user modeling.

Findings

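01. Gradient-Guided Soft Masking improves training stability when transitioning from causal to bidirectional attention.

02. Bidirectional attention yields higher-quality user embeddings.

03. The method outperforms baseline masking strategies across 9 industrial user cognition benchmarks covering prediction, preference, and marketing sensitivity tasks.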

Abstract

Decoder-only large language models are increasingly used as behavioral encoders for user representation learning, yet the impact of attention masking on the quality of user embeddings remains underexplored. In this work, we conduct a systematic study of causal, hybrid, and bidirectional attention masks within a unified contrastive learning framework trained on large-scale real-world Alipay data that integrates long-horizon heterogeneous user behaviors. To improve training dynamics when transitioning from causal to bidirectional attention, we propose Gradient-Guided Soft Masking, a gradient-based pre-warmup applied before a linear scheduler that gradually opens future attention during optimization. Evaluated on 9 industrial user cognition benchmarks covering prediction, preference, and marketing sensitivity tasks, our approach consistently yields more stable training and higher-quality…
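The idea of a scheduler that "gradually opens future attention" can be illustrated with a small NumPy sketch. This is an illustration under stated assumptions, not the paper's formulation: the abstract does not give the exact mask parameterization, so the multiplicative weight `alpha` on future positions and the helper names `linear_alpha`, `soft_mask`, and `soft_masked_attention` are hypothetical, and the gradient-based pre-warmup is not modeled at all. With `alpha = 0` the mask is purely causal; with `alpha = 1` it is fully bidirectional.

```python
import numpy as np


def linear_alpha(step: int, total_steps: int) -> float:
    """Assumed linear schedule: 0.0 (causal) at step 0, 1.0 (bidirectional) at the end."""
    if total_steps <= 0:
        return 1.0
    return float(min(max(step / total_steps, 0.0), 1.0))


def soft_mask(seq_len: int, alpha: float) -> np.ndarray:
    """Multiplicative mask: 1 for current and past positions, alpha for future ones."""
    future = np.triu(np.ones((seq_len, seq_len)), k=1)  # 1 strictly above the diagonal
    return 1.0 - future * (1.0 - alpha)


def soft_masked_attention(scores: np.ndarray, alpha: float) -> np.ndarray:
    """Row-wise softmax over `scores`, with future positions down-weighted by alpha."""
    mask = soft_mask(scores.shape[-1], alpha)
    with np.errstate(divide="ignore"):
        bias = np.log(mask)  # 0 where visible, log(alpha) for future, -inf if alpha == 0
    logits = scores + bias
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=-1, keepdims=True)


if __name__ == "__main__":
    scores = np.zeros((4, 4))  # uniform scores make the mask's effect easy to read
    for step in (0, 500, 1000):
        alpha = linear_alpha(step, total_steps=1000)
        first_row = soft_masked_attention(scores, alpha)[0]
        print(f"step={step} alpha={alpha:.2f} first-token attention={first_row.round(3)}")
```

Adding `log(alpha)` to the logits is equivalent to multiplying the unnormalized attention weights by `alpha`, so the first token's attention mass on later tokens grows smoothly as the schedule advances, rather than switching from zero to full visibility in a single step.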

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Graph Neural Networks