HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation

Shaina Raza; Aravind Narayanan; Vahid Reza Khazaie; Ashmal Vayani; Ahmed Y. Radwan; Mukund S. Chettiar; Amandeep Singh; Mubarak Shah; Deval Pandya

arXiv:2505.11454 · cs.CV · December 1, 2025

TL;DR

HumaniBench is a comprehensive evaluation framework that assesses large multimodal models on human-centered principles like fairness, ethics, and empathy across diverse real-world visual tasks, revealing persistent gaps and trade-offs.

Contribution

This work introduces HumaniBench, a unified, large-scale evaluation framework for systematically measuring human-centric alignment in multimodal models.

Findings

01. Proprietary models excel in ethics, reasoning, and empathy.

02. Open-source models perform better in visual grounding and resilience.

03. All models show gaps in fairness and multilingual inclusivity.

Abstract

Although recent large multimodal models (LMMs) demonstrate impressive progress on vision-language tasks, their alignment with human-centered (HC) principles, such as fairness, ethics, inclusivity, empathy, and robustness, remains poorly understood. We present HumaniBench, a unified evaluation framework designed to characterize HC alignment across realistic, socially grounded visual contexts. HumaniBench contains 32,000 expert-verified image-question pairs derived from real-world news imagery and spanning seven evaluation tasks: scene understanding, instance identity, multiple-choice visual question answering (VQA), multilinguality, visual grounding, empathetic captioning, and image resilience testing. Each task is mapped to one or more HC principles through a principled operationalization of metrics covering accuracy, harmful content detection, hallucination and faithfulness, coherence,…
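The abstract states that each task is mapped to one or more HC principles. A minimal sketch of how per-task scores could be rolled up into per-principle scores under such a mapping is shown below. The mapping, the snake_case task keys, the function name principle_scores, and the use of a plain mean are all assumptions made for illustration; they are not taken from the paper or its repository, and the paper's actual mapping and metrics may differ.

```python
from collections import defaultdict
from statistics import mean

# HYPOTHETICAL task -> principle mapping, for illustration only.
# Task and principle names come from the abstract; the pairing does not.
TASK_TO_PRINCIPLES = {
    "scene_understanding": ["fairness", "ethics"],
    "instance_identity": ["fairness"],
    "multiple_choice_vqa": ["ethics"],
    "multilinguality": ["inclusivity"],
    "visual_grounding": ["robustness"],
    "empathetic_captioning": ["empathy"],
    "image_resilience": ["robustness"],
}


def principle_scores(task_scores):
    """Average task-level scores into principle-level scores.

    task_scores maps a task name to a score in [0, 1]. A task that
    maps to several principles contributes its score to each of them.
    """
    buckets = defaultdict(list)
    for task, score in task_scores.items():
        for principle in TASK_TO_PRINCIPLES[task]:
            buckets[principle].append(score)
    return {principle: mean(values) for principle, values in buckets.items()}


if __name__ == "__main__":
    # Made-up scores, not results from the paper.
    example = {
        "scene_understanding": 1.0,
        "instance_identity": 0.5,
        "multilinguality": 0.25,
    }
    print(principle_scores(example))
```

Because a task may map to several principles, one task score can count toward more than one principle average; a weighted scheme would be an equally plausible design.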

Code & Models

Repositories

vectorinstitute/humanibench (PyTorch, official)

Models

vector-institute/Factuality-Alignment-Qwen2.5-14B
model · 4 downloads

Datasets

vector-institute/HumaniBench
dataset · 59 downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis