HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation
Shaina Raza, Aravind Narayanan, Vahid Reza Khazaie, Ashmal Vayani, Ahmed Y. Radwan, Mukund S. Chettiar, Amandeep Singh, Mubarak Shah, Deval Pandya

TL;DR
HumaniBench is a comprehensive evaluation framework that assesses large multimodal models on human-centered principles like fairness, ethics, and empathy across diverse real-world visual tasks, revealing persistent gaps and trade-offs.
Contribution
This work introduces HumaniBench, a unified, large-scale evaluation framework for systematically measuring human-centric alignment in multimodal models.
Findings
Proprietary models excel in ethics, reasoning, and empathy.
Open-source models perform better in visual grounding and resilience.
All models show gaps in fairness and multilingual inclusivity.
Abstract
Although recent large multimodal models (LMMs) demonstrate impressive progress on vision-language tasks, their alignment with human-centered (HC) principles, such as fairness, ethics, inclusivity, empathy, and robustness, remains poorly understood. We present HumaniBench, a unified evaluation framework designed to characterize HC alignment across realistic, socially grounded visual contexts. HumaniBench contains 32,000 expert-verified image-question pairs derived from real-world news imagery and spanning seven evaluation tasks: scene understanding, instance identity, multiple-choice visual question answering (VQA), multilinguality, visual grounding, empathetic captioning, and image resilience testing. Each task is mapped to one or more HC principles through a principled operationalization of metrics covering accuracy, harmful content detection, hallucination and faithfulness, coherence,…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis
