Detailed Object Description with Controllable Dimensions

Xinran Wang; Haiwen Zhang; Baoteng Li; Kongming Liang; Hao Sun; Zhongjiang He; Zhanyu Ma; Jun Guo

arXiv:2411.19106·cs.CV·January 9, 2025

TL;DR

This paper introduces Dimension Tailor, a training-free pipeline that refines object descriptions generated by multimodal large language models so that they focus on user-specified dimensions, with the aim of giving visually impaired users more relevant and better-controlled detail.

Contribution

The paper presents a novel, training-free method for refining object descriptions to emphasize user-specified dimensions, enhancing controllability and relevance in multimodal models.

Findings

01

Dimension Tailor improves description quality for user-specified details.

02

The pipeline enhances the performance of recent multimodal large language models.

03

It offers flexible inclusion or exclusion of object dimensions based on user needs.

Abstract

Object description plays an important role in helping visually impaired individuals understand and compare objects. Recent multimodal large language models (MLLMs) exhibit powerful perceptual abilities and show impressive potential for generating object-centric descriptions. However, the descriptions generated by such models often still contain much content that is irrelevant to the user's intent, or miss important details of some object dimensions. In some scenarios, users may need the details of only certain dimensions of an object. In this paper, we propose a training-free object description refinement pipeline, Dimension Tailor, designed to enhance user-specified details in object descriptions. This pipeline includes three steps: dimension extracting, erasing, and supplementing, which decompose the description into user-specified dimensions.…
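To make the three steps named in the abstract concrete, here is a minimal toy sketch of an extract, erase, supplement flow. It is not the authors' implementation: the keyword lexicon `LEXICON`, the function names, and the `query` callback (a stand-in for asking an MLLM about a missing dimension) are all assumptions introduced for illustration, and the real pipeline presumably relies on model calls rather than keyword matching.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical cue words per dimension (an assumption, not from the paper).
LEXICON = {
    "color": {"red", "blue", "green", "black", "white"},
    "material": {"wooden", "metal", "plastic", "leather"},
    "shape": {"round", "square", "rectangular"},
}


def extract_dimensions(description: str) -> Dict[str, List[str]]:
    """Step 1 (extracting): group cue words in the description by dimension."""
    found: Dict[str, List[str]] = {}
    words = description.lower().replace(",", " ").replace(".", " ").split()
    for word in words:
        for dim, cues in LEXICON.items():
            if word in cues:
                found.setdefault(dim, []).append(word)
    return found


def erase_dimensions(
    found: Dict[str, List[str]], wanted: Iterable[str]
) -> Dict[str, List[str]]:
    """Step 2 (erasing): drop every dimension the user did not ask for."""
    wanted_set = set(wanted)
    return {dim: vals for dim, vals in found.items() if dim in wanted_set}


def supplement_dimensions(
    kept: Dict[str, List[str]],
    wanted: Iterable[str],
    query: Callable[[str], Optional[str]],
) -> Dict[str, List[str]]:
    """Step 3 (supplementing): fill requested dimensions that are missing."""
    result = dict(kept)
    for dim in wanted:
        if dim not in result:
            answer = query(dim)  # hypothetical stand-in for an MLLM query
            if answer:
                result[dim] = [answer]
    return result


def tailor(
    description: str,
    wanted: Iterable[str],
    query: Callable[[str], Optional[str]],
) -> Dict[str, List[str]]:
    """Run the three toy steps in order."""
    wanted_list = list(wanted)
    found = extract_dimensions(description)
    kept = erase_dimensions(found, wanted_list)
    return supplement_dimensions(kept, wanted_list, query)


# Usage: the fake model only knows the material of the object.
fake_mllm = lambda dim: {"material": "leather"}.get(dim)
result = tailor("A round red chair next to a window.", ["color", "material"], fake_mllm)
print(result)
```

In this toy run the unrequested shape cue is erased, the color cue is kept, and the missing material is supplied by the callback. Keeping the three steps as separate functions mirrors the step names in the abstract and lets each stand-in be swapped for a model call.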

Code & Models

Repositories

xin-ran-w/controllableobjectdescription
PyTorch · Official

Taxonomy

Topics: Semantic Web and Ontologies · Constraint Satisfaction and Optimization · Image Processing and 3D Reconstruction