CoVis: A Collaborative Framework for Fine-grained Graphic Visual Understanding

Xiaoyu Deng; Zhengjian Kang; Xintao Li; Yongzhe Zhang; Tianmin Guo

arXiv:2411.18764·cs.CV·December 2, 2024

Open Access

TL;DR

CoVis is a collaborative framework that combines a segmentation network with a large language model (LLM) to support fine-grained visual understanding and generate detailed visual analytics, with the aim of making visual information communication more efficient.

Contribution

It introduces a novel dual-layer segmentation network coupled with LLM-based content generation for comprehensive visual understanding.
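To make the shape of this contribution concrete, the following is a minimal sketch of a two-stage (coarse, then fine) segmentation cascade whose output is turned into a text prompt for an LLM. It is not the authors' implementation: the names `Region`, `cascade_segment`, `build_prompt`, `toy_coarse`, and `toy_fine`, the box convention, and the prompt wording are all hypothetical, and the two segmenters are toy stand-ins rather than real networks.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical box convention: (x0, y0, x1, y1) in pixels.
Box = Tuple[int, int, int, int]
Detection = Tuple[str, Box]


@dataclass
class Region:
    """A labelled region; children hold the finer second-layer regions."""
    label: str
    box: Box
    children: List["Region"] = field(default_factory=list)


def cascade_segment(
    image,
    coarse_fn: Callable[[object], List[Detection]],
    fine_fn: Callable[[object, Box], List[Detection]],
) -> List[Region]:
    """Run the coarse segmenter, then the fine segmenter inside each coarse box."""
    regions = []
    for label, box in coarse_fn(image):
        region = Region(label, box)
        for sub_label, sub_box in fine_fn(image, box):
            region.children.append(Region(sub_label, sub_box))
        regions.append(region)
    return regions


def build_prompt(regions: List[Region]) -> str:
    """Flatten the region hierarchy into a text prompt (assumed wording)."""
    lines = ["Describe the image using these detected regions:"]
    for region in regions:
        lines.append(f"- {region.label} at {region.box}")
        for child in region.children:
            lines.append(f"  - {child.label} at {child.box}")
    return "\n".join(lines)


# Toy stand-ins for the two segmentation layers (assumptions, not real models).
def toy_coarse(image) -> List[Detection]:
    return [("poster", (0, 0, 100, 200))]


def toy_fine(image, box: Box) -> List[Detection]:
    x0, y0, x1, y1 = box
    mid = (y0 + y1) // 2  # split the coarse box into top and bottom halves
    return [
        ("title text", (x0, y0, x1, mid)),
        ("illustration", (x0, mid, x1, y1)),
    ]


regions = cascade_segment(None, toy_coarse, toy_fine)
prompt = build_prompt(regions)
print(prompt)
```

In this sketch the prompt would be passed to whichever LLM is used for content generation. Passing the segmenters in as plain callables keeps the cascade logic independent of any particular model.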

Findings

01. Outperforms existing methods in feature extraction.

02. Generates more detailed visual descriptions.

03. Validated by experiments with 32 human participants.

Abstract

Graphic visual content helps in promoting information communication and inspiration divergence. However, the interpretation of visual content currently relies mainly on humans' personal knowledge background, thereby affecting the quality and efficiency of information acquisition and understanding. To improve the quality and efficiency of visual information transmission and avoid the limitation of the observer due to the information cocoon, we propose CoVis, a collaborative framework for fine-grained visual understanding. By designing and implementing a cascaded dual-layer segmentation network coupled with a large-language-model (LLM) based content generator, the framework extracts as much knowledge as possible from an image. Then, it generates visual analytics for images, assisting observers in comprehending imagery from a more holistic perspective. Quantitative experiments and…

Taxonomy

Topics: Augmented Reality Applications · Human Motion and Animation · Handwritten Text Recognition Techniques

Methods: Visual Analytics