HERM: Benchmarking and Enhancing Multimodal LLMs for Human-Centric Understanding
Keliang Li, Zaifei Yang, Jiahe Zhao, Hongze Shen, Ruibing Hou, Hong Chang, Shiguang Shan, Xilin Chen

TL;DR
This paper introduces HERM-Bench, a benchmark for evaluating how well multimodal large language models understand human-centric scenarios, and HERM-100K, a dataset for improving that ability, leading to a new model, HERM-7B, that outperforms existing models.
Contribution
The paper presents a novel benchmark, HERM-Bench, and a comprehensive dataset, HERM-100K, to enhance training and evaluation of MLLMs for human-centric understanding, along with a new model HERM-7B.
Findings
HERM-7B outperforms existing MLLMs on human-centric tasks.
Existing MLLMs have limitations in understanding complex human-centric scenarios.
Specialized datasets improve MLLMs' human-centric understanding.
Abstract
The significant advancements in visual understanding and instruction following from Multimodal Large Language Models (MLLMs) have opened up more possibilities for broader applications in diverse and universal human-centric scenarios. However, existing image-text data may not support the precise modality alignment and integration of multi-grained information, which is crucial for human-centric visual understanding. In this paper, we introduce HERM-Bench, a benchmark for evaluating the human-centric understanding capabilities of MLLMs. Our work reveals the limitations of existing MLLMs in understanding complex human-centric scenarios. To address these challenges, we present HERM-100K, a comprehensive dataset with multi-level human-centric annotations, aimed at enhancing MLLMs' training. Furthermore, we develop HERM-7B, an MLLM that leverages enhanced training data from HERM-100K.…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
