Integrating Vision-Centric Text Understanding for Conversational Recommender Systems

Wei Yuan; Shutong Qiao; Tong Chen; Quoc Viet Hung Nguyen; Zi Huang; Hongzhi Yin

arXiv:2601.13505 · cs.IR · January 21, 2026

Open Access

TL;DR

This paper introduces STARCRS, a novel conversational recommender system that combines screen-reading and language model pathways for better understanding of complex multi-turn dialogues, leading to improved recommendations and responses.

Contribution

The paper proposes a dual-mode text understanding framework with a knowledge-anchored fusion method for enhanced preference inference in CRSs.

Findings

01. STARCRS outperforms existing models on benchmark datasets.

02. It improves both recommendation accuracy and response quality.

03. The fusion framework effectively combines visual and textual information.

Abstract

Conversational Recommender Systems (CRSs) have attracted growing attention for their ability to deliver personalized recommendations through natural language interactions. To more accurately infer user preferences from multi-turn conversations, recent works increasingly expand conversational context (e.g., by incorporating diverse entity information or retrieving related dialogues). While such context enrichment can assist preference modeling, it also introduces longer and more heterogeneous inputs, leading to practical issues such as input length constraints, text style inconsistency, and irrelevant textual noise, thereby raising the demand for stronger language understanding ability. In this paper, we propose STARCRS, a Screen-Text-AwaRe Conversational Recommender System that integrates two complementary text understanding modes: (1) a screen-reading pathway that encodes auxiliary…

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Recommender Systems and Techniques