The All-Seeing Project V2: Towards General Relation Comprehension of the Open World
Weiyun Wang, Yiming Ren, Haowen Luo, Tiantong Li, Chenxiang Yan, Zhe Chen, Wenhai Wang, Qingyun Li, Lewei Lu, Xizhou Zhu, Yu Qiao, Jifeng Dai

TL;DR
The paper introduces ASMv2, a model for comprehensive relation understanding in images, together with the AS-V2 dataset and the CRPE benchmark. The model improves object and relation recognition and reduces relation hallucination in multi-modal models.
Contribution
We propose ASMv2, a unified model for relation comprehension, and introduce the first high-quality ReC dataset and a new evaluation benchmark for relation understanding in MLLMs.
Findings
ASMv2 achieves 52.04% accuracy on the CRPE benchmark.
The AS-V2 dataset is aligned with the format of standard instruction tuning data, so it can be used directly for training.
ASMv2 significantly outperforms previous models such as LLaVA-1.5.
Abstract
We present the All-Seeing Project V2: a new model and dataset designed for understanding object relations in images. Specifically, we propose the All-Seeing Model V2 (ASMv2) that integrates the formulation of text generation, object localization, and relation comprehension into a relation conversation (ReC) task. Leveraging this unified task, our model excels not only in perceiving and recognizing all objects within the image but also in grasping the intricate relation graph between them, diminishing the relation hallucination often encountered by Multi-modal Large Language Models (MLLMs). To facilitate training and evaluation of MLLMs in relation understanding, we created the first high-quality ReC dataset (AS-V2), which is aligned with the format of standard instruction tuning data. In addition, we design a new benchmark, termed Circular-based Relation Probing Evaluation (CRPE) for…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Advanced Graph Neural Networks
