TriAlignGR: Triangular Multitask Alignment with Multimodal Deep Interest Mining for Generative Recommendation

Yangchen Zeng; Hao Peng; Rongfeng Guo; Zhenyu Yu; Zhiyuan Hu; Jinze Wang

arXiv:2605.05249 · cs.IR · May 19, 2026

TL;DR

TriAlignGR is a multitask, multimodal framework for generative recommendation that integrates visual semantics and user interests into semantic IDs (SIDs), addressing the content degradation and semantic opacity of existing SID pipelines.

Contribution

It introduces a unified approach combining cross-modal semantic alignment, deep interest mining, and triangular multitask training to improve semantic ID quality and recommendation accuracy.

Findings

01. Addresses SID content degradation and semantic opacity.

02. Integrates visual content into SID construction.

03. Enables latent user interest extraction through multimodal reasoning.

Abstract

We introduce TriAlignGR, a unified multitask-multimodal framework for generative recommendation that establishes two-stage multimodal semantic propagation: (i) encoding visual semantics directly into SIDs via multimodal embeddings, and (ii) enabling the model to decode these semantics through visual description tasks. Existing Semantic ID (SID) pipelines suffer from two fundamental but underexplored problems: SID Content Degradation (SCD), where cascaded encoding and residual quantization discard critical multimodal and interest-level semantics; and SID Semantic Opacity (SSO), where models autoregressively generate SID sequences without truly comprehending their underlying meaning, leading to hallucination and poor generalization. Prior work addresses at most text-SID alignment, leaving visual semantics and latent user interests entirely unexploited. TriAlignGR…
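
To make the residual quantization step named in the abstract concrete, here is a minimal sketch. It is not the authors' implementation: the codebooks are fixed toy values (in practice they would be learned), the distance is assumed to be squared Euclidean, and the names `nearest_code` and `residual_quantize` are hypothetical. The leftover residual is the part of the embedding the SID does not carry, which is one simplified way to picture the content degradation the abstract describes.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_code(vec: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry closest to vec (squared L2)."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))


def residual_quantize(
    embedding: Vector, codebooks: Sequence[Sequence[Vector]]
) -> Tuple[List[int], List[float]]:
    """Quantize an embedding level by level.

    Each level picks the nearest code for the current residual and then
    subtracts that code, so later levels only refine what is left over.
    Returns the code indices (a toy semantic ID) and the final residual.
    """
    residual = list(embedding)
    sid: List[int] = []
    for codebook in codebooks:
        idx = nearest_code(residual, codebook)
        sid.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return sid, residual


# Toy, hand-picked codebooks (assumption: two levels, two codes each).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],      # level 1: coarse codes
    [[0.0, 0.0], [0.25, -0.25]],   # level 2: finer corrections
]

sid, residual = residual_quantize([1.2, 0.8], codebooks)
print("SID:", sid)                 # sid is [1, 1]
print("unencoded residual:", residual)
```

The residual is nonzero after the last level, so the item's SID only approximates its embedding; any signal that lives in that remainder is invisible to a model that sees SIDs alone.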
