Agent4POI: Agentic Context-Conditioned Affordance Reasoning for Multimodal Point-of-Interest Recommendation

Jinze Wang; Yangchen Zeng; Tiehua Zhang; Lu Zhang; Yuze Liu; Yongchao Liu; Xingjun Ma; Zhu Sun

arXiv:2605.15203 · cs.IR · May 18, 2026


TL;DR

Agent4POI is a novel POI recommendation framework that dynamically generates context-aware multimodal representations at inference time, enabling better reasoning about POI affordances based on situational context.

Contribution

The paper introduces an inference-time, context-conditioned multimodal representation method for POI recommendation, grounded in affordance theory and built on large language models.

Findings

01

Achieves a 23.2% relative gain over baselines across three benchmarks.

02

Outperforms content-based methods by up to 2.4x in cold-start scenarios.

03

Degrades by only 7.5% under context shift, outperforming the strongest baselines.

Abstract

We introduce Agent4POI, the first POI recommendation framework that generates context-conditioned multimodal representations at recommendation time, rather than relying on static POI embeddings pre-computed independently of context. Existing multimodal systems encode each POI once as a static embedding, a design that precludes reasoning about why the same cafe affords solo work on Monday but group celebration on Friday evening. We formally prove that no pre-computed encoder can satisfy context-sensitive ranking under standard bilinear scoring, motivating inference-time item-side representation. Agent4POI inverts this computation: given a situational context, a four-phase LLM agent generates dynamic, context-specific affordance queries (Phase 1) and executes a five-step cross-modal chain-of-thought over image, review, and metadata evidence (Phase 2). The resulting uncertainty-aware…
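The contrast between a static POI embedding and a context-conditioned one can be illustrated with a toy sketch. This is not the paper's method and does not reproduce its formal result: the affordance axes, the vectors, the context names, and the functions `rank_static` and `rank_contextual` are all hypothetical, and the lookup table merely stands in for whatever representation is generated at recommendation time. The sketch assumes a fixed user vector and a plain dot-product score.

```python
from typing import Dict, List, Tuple

Vector = List[float]

def dot(a: Vector, b: Vector) -> float:
    """Plain dot-product score between a user vector and an item vector."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical affordance axes: [solo_work, group_celebration].
# Static setting: each POI is encoded once, independently of context.
STATIC_EMB: Dict[str, Vector] = {
    "cafe": [0.5, 0.5],
    "library": [0.9, 0.0],
}

# Context-conditioned setting: the item-side vector depends on the context.
# This hand-written table is an assumption made purely for illustration.
CONTEXT_EMB: Dict[Tuple[str, str], Vector] = {
    ("cafe", "monday_morning"): [0.8, 0.1],
    ("cafe", "friday_evening"): [0.2, 0.9],
    ("library", "monday_morning"): [0.9, 0.0],
    ("library", "friday_evening"): [0.1, 0.0],
}

def rank_static(user: Vector, context: str) -> List[str]:
    # The context argument is accepted but cannot change the item vectors,
    # so with a fixed user vector the ranking is identical in every context.
    return sorted(STATIC_EMB, key=lambda p: dot(user, STATIC_EMB[p]), reverse=True)

def rank_contextual(user: Vector, context: str) -> List[str]:
    # The item vector is looked up per (POI, context), so the ranking can flip.
    return sorted(
        STATIC_EMB,
        key=lambda p: dot(user, CONTEXT_EMB[(p, context)]),
        reverse=True,
    )

user = [1.0, 0.5]  # hypothetical user preference vector
static_monday = rank_static(user, "monday_morning")
static_friday = rank_static(user, "friday_evening")
contextual_monday = rank_contextual(user, "monday_morning")
contextual_friday = rank_contextual(user, "friday_evening")
```

With these made-up numbers, the static ranking is the same on Monday morning and Friday evening, while the context-conditioned ranking places the cafe above the library only on Friday evening. That mirrors the abstract's cafe example in miniature, under the simplifying assumptions stated above.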
