Memory-Augmented Query Intent Understanding for Efficient Chat-based Image Retrieval

Xianke Chen; Daizong Liu; Yushuo Lou; Xin Tan; Xun Yang; Shuhui Wang; Xun Wang; Jianfeng Dong

arXiv:2605.17365 · cs.CV · May 19, 2026

TL;DR

This paper introduces MAQIU, a memory-augmented framework for chat-based image retrieval that efficiently updates user intent across dialogue rounds, improving accuracy and reducing computational costs.

Contribution

It proposes a novel memory-based user intent updating framework with a lightweight memorization module and visual guidance integration, enhancing chat-based image retrieval performance.

Findings

01. MAQIU achieves substantial performance improvements over baselines.

02. It reduces dialogue encoding FLOPs by 86.4%.

03. The framework maintains high computational efficiency.
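
For scale, the 86.4% figure in finding 02 can be restated by arithmetic alone. The page does not name the reference cost the reduction is measured against, so the ratio below is relative to whatever that reference is:

```latex
\[
1 - 0.864 = 0.136, \qquad \frac{1}{0.136} \approx 7.35
\]
```

In other words, dialogue encoding would use about 13.6% of the reference FLOPs, roughly a 7.35-fold reduction.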

Abstract

Different from traditional text-to-image retrieval tasks, chat-based image retrieval allows the human-interactive system to iteratively clarify and refine user intent through multi-round dialogue, thereby achieving more fine-grained retrieval results. The key challenge in this task lies in dynamically understanding and updating the user's query intent across dialogue rounds. Although existing works perform well on this new task, they handle the query history naively, either by directly concatenating all previous queries into a long textual sequence or by relying on large language models to reconstruct the current query from the history. Such strategies are computationally redundant and easily lead to inconsistent intent representations as the dialogue progresses. To alleviate these issues, this paper proposes a novel and efficient memory-based user intent updating…
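
As a rough illustration of why re-encoding a concatenated history is redundant, the sketch below counts how many tokens an encoder would process under two strategies: re-encoding the full concatenated history each round, versus encoding only the new query alongside a fixed-size memory. This is not the MAQIU implementation. The function names, the `memory_slots` parameter, the gated `update_memory` rule, and the use of token counts as a stand-in for FLOPs are all assumptions made for illustration.

```python
from typing import List


def concat_encoding_cost(round_lengths: List[int]) -> int:
    """Tokens encoded when every round re-encodes the whole concatenated history."""
    total = 0
    history = 0
    for n in round_lengths:
        history += n      # history grows by the new query's length
        total += history  # the full history is encoded again this round
    return total


def memory_encoding_cost(round_lengths: List[int], memory_slots: int) -> int:
    """Tokens encoded when each round sees only the new query plus a fixed-size memory.

    `memory_slots` is a hypothetical parameter, not a value taken from the paper.
    """
    return sum(n + memory_slots for n in round_lengths)


def update_memory(memory: List[float], query_vec: List[float], gate: float = 0.5) -> List[float]:
    """Toy gated update (an assumption): blend the old memory with the new query embedding."""
    return [(1 - gate) * m + gate * q for m, q in zip(memory, query_vec)]


if __name__ == "__main__":
    lengths = [12] * 10  # ten rounds of twelve tokens each (made-up numbers)
    print(concat_encoding_cost(lengths))
    print(memory_encoding_cost(lengths, memory_slots=4))
```

Under the concatenation strategy the count grows quadratically with the number of rounds, while the memory-based count grows linearly. The toy numbers are not meant to reproduce the 86.4% figure reported for MAQIU.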

Code & Models

Repositories

HuiGuanLab/MAQIU (GitHub)
