ReGraP-LLaVA: Reasoning enabled Graph-based Personalized Large Language and Vision Assistant

Yifan Xiang; Zhenxi Zhang; Bin Li; Yixuan Weng; Shoujun Zhou; Yangfan He; Keqin Li

arXiv:2505.03654 · cs.CV · May 20, 2025

TL;DR

The paper introduces a new dataset (ReGraP) and a model (ReGraP-LLaVA) for personalized multimodal reasoning. The model learns structured relations among personalized concepts and achieves state-of-the-art performance on diverse reasoning tasks.

Contribution

The paper presents ReGraP, a new dataset with structured reasoning pathways, and ReGraP-LLaVA, a model trained on this data to enhance personalized relational reasoning in multimodal tasks.

Findings

01

ReGraP-LLaVA outperforms existing models on the ReGraP benchmark.

02

The dataset enables training models to reason over relations among personalized concepts.

03

Soft and hard graph prompting improve how well the knowledge graphs are aligned within the model.
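As a rough illustration of the idea behind hard graph prompting, the sketch below serializes a small knowledge graph of personalized concepts into plain text and places it in a prompt ahead of a question. It is only an assumption-laden example: the `Triple` class, the `serialize_graph` and `build_hard_prompt` helpers, the placeholder concept names such as `<bo>`, and the prompt wording are all hypothetical and are not taken from the paper or its repository.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Triple:
    """One (head, relation, tail) edge of a personalized knowledge graph (hypothetical schema)."""
    head: str
    relation: str
    tail: str


def serialize_graph(triples: Iterable[Triple]) -> str:
    """Render each triple as one plain-text line."""
    return "\n".join(f"({t.head}, {t.relation}, {t.tail})" for t in triples)


def build_hard_prompt(triples: Iterable[Triple], question: str) -> str:
    """Prepend the serialized graph to a question (hypothetical prompt template)."""
    return (
        "Known relations among personalized concepts:\n"
        f"{serialize_graph(triples)}\n\n"
        f"Question: {question}\n"
        "Answer step by step using the relations above."
    )


# Made-up concepts and relations, used only for illustration.
example_triples = [
    Triple("<bo>", "is owned by", "<alex>"),
    Triple("<bo>", "likes to play with", "<red_ball>"),
]
example_prompt = build_hard_prompt(example_triples, "Who owns <bo>?")
print(example_prompt)
```

A soft variant would instead feed learned embeddings of the graph to the model rather than text; that is not shown here, since it depends on model internals this page does not describe.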

Abstract

Recent advances in personalized MLLMs enable effective capture of user-specific concepts, supporting both recognition of personalized concepts and contextual captioning. However, humans typically explore and reason over relations among objects and individuals, transcending surface-level information to achieve more personalized and contextual understanding. To this end, existing methods may face three main limitations: (1) Their training data lacks multi-object sets in which relations among objects are learnable. (2) Building on the limited training data, their models overlook the relations between different personalized concepts and fail to reason over them. (3) Their experiments mainly focus on a single personalized concept, where evaluations are limited to recognition and captioning tasks. To address the limitations, we present a new dataset named ReGraP, consisting of 120 sets of personalized…

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 6 · Confidence 4

Strengths

1. The motivation for this work is clear and compelling. The authors correctly identify that existing personalized MLLMs focus primarily on concept recognition and captioning, while neglecting the relational knowledge and reasoning capabilities that humans naturally employ when understanding personalized contexts.
2. The novelty of the approach is strong. To my knowledge, this is the first work to explicitly construct knowledge graphs for personalized concepts and use them to train MLLMs with r

Weaknesses

1. The evaluation setup and dataset descriptions are somewhat unclear throughout the paper. While the datasets are described in the main text (Section 5), the tables themselves do not clearly indicate which dataset is being evaluated. For instance, Table 2 does not specify that it evaluates on the ReGraP dataset, while Table 3 evaluates on Yo'LLaVA and MyVLM datasets with different tasks. The authors should add explicit dataset identifiers to table captions and within the tables themselves to im

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. It's a crucial problem to enable MLLMs to perform relational reasoning over multiple personalized concepts.
2. The proposed framework based on soft and/or hard graph prompting is well-designed to enhance the relational reasoning capabilities of MLLMs.
3. The paper develops a data generation pipeline for relational question answering synthesis, and also introduces a new dataset and benchmark named ReGraP, which are valuable resources for future research in this area.
4. The paper is well-writt

Weaknesses

1. The paper extends the idea of soft/hard prompting beyond previous works (e.g., Yo'LLaVA) by integrating reasoning over knowledge graphs. However, since prompting-based personalization has been explored before, the novelty mainly lies in using structured graph representations and CoT QA data, which could be better emphasized.
2. The paper lacks comparison with several related personalization methods such as RAP-LLaVA, UniCTokens and RePIC. Including or discussing these baselines would strength

Reviewer 03 · Rating 2 · Confidence 5

Strengths

This paper identifies an evaluation gap in relational reasoning for personalized MLLM-based understanding. To address this limitation, the authors incorporate both knowledge graphs (KGs) and chain-of-thought (CoT) reasoning into multi-object personalized MLLMs. They further propose a data generation pipeline to construct a new benchmark dataset, ReGraP, supporting the evaluation of such personalized relational reasoning abilities.

Weaknesses

1. **Limited scope and representativeness of the proposed dataset**. The diversity of concepts, relations, and scenarios covered in ReGraP remains narrow. Most scenes revolve around anime characters and personal items, resulting in a limited semantic scope. While such content may be common in personalization research, the benchmark lacks a clear definition or demonstration of “personalization.” In addition, the relational types are shallow. Most attribute or role associations, such as “who is th

Code & Models

Repositories

xyfyyds/regrap
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Graph Neural Networks · Topic Modeling

Methods: Sparse Evolutionary Training · Focus · ALIGN