Task Me Anything

Jieyu Zhang; Weikai Huang; Zixian Ma; Oscar Michel; Dong He; Tanmay Gupta; Wei-Chiu Ma; Ali Farhadi; Aniruddha Kembhavi; Ranjay Krishna

arXiv:2406.11775 · cs.CV · January 28, 2025

TL;DR

Task-Me-Anything is a flexible benchmark generation engine that creates tailored, large-scale multimodal evaluation datasets to assess the specific strengths and weaknesses of large multimodal language models across various tasks.

Contribution

It introduces a novel, extendable system for generating customized multimodal benchmarks, addressing the challenge of selecting appropriate evaluations for specific applications.
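To make programmatic task generation concrete, here is a minimal Python sketch. It is not the Task-Me-Anything implementation: the `Asset` and `TaskInstance` records, the two question templates (`counting_tasks`, `color_tasks`), the `COLORS` vocabulary, and the `GENERATORS` registry are all hypothetical stand-ins. The assumed design is that each annotated asset passes through per-task-type generators. Under that design, a tailored benchmark is simply the set of task types a user selects, and extending the engine means registering one more generator.

```python
import random
from collections import Counter
from dataclasses import dataclass

# Hypothetical attribute vocabulary; a real taxonomy would be far larger.
COLORS = ("red", "blue", "green", "yellow", "white", "black")

@dataclass(frozen=True)
class Asset:
    """Hypothetical annotated image: an id plus (category, color) object pairs."""
    image_id: str
    objects: tuple

@dataclass(frozen=True)
class TaskInstance:
    task_type: str
    image_id: str
    question: str
    options: tuple
    answer: str

def counting_tasks(asset, rng, num_options=4):
    """One multiple-choice counting question per object category in the asset."""
    counts = Counter(cat for cat, _ in asset.objects)
    tasks = []
    for cat, n in sorted(counts.items()):
        # Distractors are nearby integers other than the true count.
        distractors = [k for k in range(max(0, n - 3), n + 4) if k != n]
        options = [n] + rng.sample(distractors, num_options - 1)
        rng.shuffle(options)
        tasks.append(TaskInstance("counting", asset.image_id,
                                  f"How many {cat} objects are in the image?",
                                  tuple(str(o) for o in options), str(n)))
    return tasks

def color_tasks(asset, rng, num_options=4):
    """Ask for the color of each category that appears exactly once (unambiguous referent)."""
    counts = Counter(cat for cat, _ in asset.objects)
    tasks = []
    for cat, color in asset.objects:
        if counts[cat] != 1:
            continue
        distractors = [c for c in COLORS if c != color]
        options = [color] + rng.sample(distractors, num_options - 1)
        rng.shuffle(options)
        tasks.append(TaskInstance("color", asset.image_id,
                                  f"What color is the {cat}?", tuple(options), color))
    return tasks

# Hypothetical registry: supporting a new task type means adding one entry here.
GENERATORS = {"counting": counting_tasks, "color": color_tasks}

def build_benchmark(assets, task_types, seed=0):
    """Generate every instance of the user-selected task types over all assets."""
    rng = random.Random(seed)
    return [inst for asset in assets for t in task_types
            for inst in GENERATORS[t](asset, rng)]

# Toy usage: one image containing two dogs and a ball.
assets = [Asset("img1", (("dog", "black"), ("dog", "white"), ("ball", "red")))]
bench = build_benchmark(assets, ["counting", "color"])
for inst in bench:
    print(inst.question, inst.options, "->", inst.answer)
```

On the toy asset, the sketch yields three instances: counting questions for the ball and the dogs, plus one color question about the ball. The dogs get no color question because "the dog" would be ambiguous.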

Findings

01 Open-source MLMs excel in object and attribute recognition.
02 Models show weaknesses in spatial and temporal understanding.
03 Larger models generally perform better, with some exceptions.

Abstract

Benchmarks for large multimodal language models (MLMs) now assess many general capabilities at once rather than evaluating a single specific capability. As a result, developers who want to identify which models to use for their application are overwhelmed by the number of benchmarks and remain uncertain about which benchmark's results best reflect their specific use case. This paper introduces Task-Me-Anything, a benchmark generation engine that produces a benchmark tailored to a user's needs. Task-Me-Anything maintains an extendable taxonomy of visual assets and can programmatically generate a vast number of task instances. Additionally, it algorithmically answers user queries about MLM performance efficiently within a computational budget. It contains 113K images, 10K videos, 2K 3D object assets, over 365 object categories, 655…
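The abstract states that user queries about MLM performance are answered within a computational budget. The paper's own query-answering algorithms are not reproduced here. As a stand-in, the sketch below assumes a plain stratified random-sampling estimator. It spends a fixed budget of model calls evenly across task types, estimates per-type accuracy, and answers a hypothetical top-K query ("on which task types does this model do best?"). The `(task_type, question, answer)` tuple format, the `model` callable, and the function names are illustrative assumptions.

```python
import random
from collections import defaultdict

def estimate_accuracy(instances, model, budget, seed=0):
    """Estimate per-task-type accuracy using at most `budget` model calls.

    Stand-in technique (not the paper's algorithm): stratified random sampling
    with the budget split evenly across task types. `instances` are hypothetical
    (task_type, question, answer) tuples; `model` maps a question to an answer.
    """
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for inst in instances:
        by_type[inst[0]].append(inst)
    per_type = budget // len(by_type)
    if per_type == 0:
        raise ValueError("budget is smaller than the number of task types")
    acc, calls = {}, 0
    for task_type, group in sorted(by_type.items()):
        sample = rng.sample(group, min(per_type, len(group)))
        # One model call per sampled instance.
        hits = sum(model(q) == a for _, q, a in sample)
        acc[task_type] = hits / len(sample)
        calls += len(sample)
    return acc, calls

def top_k_tasks(instances, model, budget, k, seed=0):
    """Answer a top-K query: the k task types with the highest estimated accuracy."""
    acc, _ = estimate_accuracy(instances, model, budget, seed)
    return sorted(acc, key=acc.get, reverse=True)[:k]

# Toy usage: a fake model that answers color questions correctly and counting ones wrongly.
instances = ([("counting", f"count-{i}", str(i % 3)) for i in range(30)]
             + [("color", f"color-{i}", "red") for i in range(30)])

def toy_model(question):
    return "red" if question.startswith("color") else "wrong"

acc, calls = estimate_accuracy(instances, toy_model, budget=20)
print(acc, calls)
print(top_k_tasks(instances, toy_model, budget=20, k=1))
```

With a budget of 20 calls, the sketch samples 10 instances per task type and reports `color` as the top task for this toy model. An even split is the simplest allocation: it guarantees every task type is measured, but it does not spend extra calls where estimates are noisiest.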

Code & Models

Repositories

jieyuz2/taskmeanything (official)

Videos

Task Me Anything · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning

Methods: Focus