Detailed Object Description with Controllable Dimensions
Xinran Wang, Haiwen Zhang, Baoteng Li, Kongming Liang, Hao Sun, Zhongjiang He, Zhanyu Ma, Jun Guo

TL;DR
This paper introduces Dimension Tailor, a training-free pipeline that refines object descriptions by focusing on user-specified dimensions, improving relevance and detail control in multimodal large language models for visually impaired assistance.
Contribution
The paper presents a novel, training-free method for refining object descriptions to emphasize user-specified dimensions, enhancing controllability and relevance in multimodal models.
Findings
Dimension Tailor improves description quality for user-specified details.
The pipeline enhances the performance of recent multimodal large language models.
It offers flexible inclusion or exclusion of object dimensions based on user needs.
Abstract
Object description plays an important role in helping visually impaired individuals understand and compare the differences between objects. Recent multimodal large language models (MLLMs) exhibit powerful perceptual abilities and demonstrate impressive potential for generating object-centric descriptions. However, the descriptions generated by such models often still contain a lot of content that is not relevant to the user's intent, or miss important details about some object dimensions. In some scenarios, users may need only the details of certain dimensions of an object. In this paper, we propose a training-free object description refinement pipeline, Dimension Tailor, designed to enhance user-specified details in object descriptions. This pipeline includes three steps: dimension extracting, erasing, and supplementing, which decompose the description into user-specified dimensions.…
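The three steps named in the abstract (extracting, erasing, supplementing) can be illustrated with a toy sketch. This is not the authors' implementation: the keyword table, the function names, and the `describe` callback are all hypothetical stand-ins for what the paper presumably does with an MLLM, and the sketch operates on plain sentences only.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical keyword table; stands in for a model-based dimension extractor.
DIMENSION_KEYWORDS = {
    "color": ("red", "blue", "green", "black", "white"),
    "material": ("leather", "wooden", "metal", "plastic"),
    "shape": ("round", "square", "rectangular"),
}


def extract_dimensions(description: str) -> Dict[str, List[str]]:
    """Step 1 (extracting): group sentences by the dimension they mention."""
    found: Dict[str, List[str]] = {}
    for sentence in description.split("."):
        sentence = sentence.strip()
        if not sentence:
            continue
        words = sentence.lower().split()
        for dim, keywords in DIMENSION_KEYWORDS.items():
            if any(k in words for k in keywords):
                found.setdefault(dim, []).append(sentence)
    return found


def erase_dimensions(found: Dict[str, List[str]],
                     wanted: Sequence[str]) -> Dict[str, List[str]]:
    """Step 2 (erasing): drop dimensions the user did not ask for."""
    return {dim: s for dim, s in found.items() if dim in wanted}


def supplement_dimensions(kept: Dict[str, List[str]],
                          wanted: Sequence[str],
                          describe: Callable[[str], str]) -> Dict[str, List[str]]:
    """Step 3 (supplementing): fill in wanted dimensions that are missing."""
    result = dict(kept)
    for dim in wanted:
        if dim not in result:
            # `describe` is an assumed callback, e.g. a query to an MLLM.
            result[dim] = [describe(dim)]
    return result


def tailor(description: str, wanted: Sequence[str],
           describe: Callable[[str], str]) -> str:
    """Run the three steps and join the result in the user's dimension order."""
    found = extract_dimensions(description)
    kept = erase_dimensions(found, wanted)
    full = supplement_dimensions(kept, wanted, describe)
    return ". ".join(s for dim in wanted for s in full[dim]) + "."


if __name__ == "__main__":
    text = "The bag is red. It is made of leather. It sits on a round table."
    print(tailor(text, ["color", "size"], lambda dim: f"The {dim} is unknown"))
```

In this sketch the user asks for "color" and "size": the color sentence is kept, the material and shape sentences are erased, and the missing size dimension is filled by the placeholder callback.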
Taxonomy
Topics: Semantic Web and Ontologies · Constraint Satisfaction and Optimization · Image Processing and 3D Reconstruction
