ACT as Human: Multimodal Large Language Model Data Annotation with Critical Thinking

Lequan Lin; Dai Shi; Andi Han; Feng Chen; Qiuzheng Chen; Jiawen Li; Zhaoyang Li; Jiyuan Li; Zhenbang Sun; Junbin Gao

arXiv:2511.09833·cs.LG·March 23, 2026

TL;DR

The paper introduces the ACT data pipeline that uses multimodal large language models for annotation and error detection, significantly reducing human effort while maintaining high data quality across multiple domains.

Contribution

It presents a novel multimodal LLM-based annotation framework with critical thinking, applicable across NLP, CV, and multimodal tasks, and provides empirical and theoretical insights for efficient high-quality data labeling.

Findings

01. Reduces human annotation costs by up to 90%.

02. Maintains a performance gap of less than 2% compared with fully human annotation.

03. Provides guidelines for improving annotation quality and efficiency.

Abstract

Supervised learning relies on high-quality labeled data, but obtaining such data through human annotation is both expensive and time-consuming. Recent work explores using large language models (LLMs) for annotation, but LLM-generated labels still fall short of human-level quality. To address this problem, we propose the Annotation with Critical Thinking (ACT) data pipeline, where LLMs serve not only as annotators but also as judges to critically identify potential errors. Human effort is then directed towards reviewing only the most "suspicious" cases, significantly improving human annotation efficiency. Our major contributions are as follows: (1) ACT is applicable to a wide range of domains, including natural language processing (NLP), computer vision (CV), and multimodal understanding, by leveraging multimodal LLMs (MLLMs). (2) Through empirical studies, we derive 7 insights on…
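
The annotate, judge, and review flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `act_style_pipeline`, the `review_budget` parameter, the numeric suspicion score, and the toy annotator, judge, and reviewer are all hypothetical stand-ins chosen for illustration. In practice the annotator and judge would be calls to an MLLM, and how suspicion is actually scored is not specified in the text above.

```python
from typing import Callable, List, Sequence, Tuple


def act_style_pipeline(
    samples: Sequence[str],
    annotate: Callable[[str], str],
    judge: Callable[[str, str], float],
    human_review: Callable[[str, str], str],
    review_budget: float,
) -> Tuple[List[str], List[int]]:
    """Label every sample with a model, then route only the most
    suspicious labels to a human reviewer.

    All callables are hypothetical stand-ins:
      annotate(sample) -> label
      judge(sample, label) -> suspicion score (higher = more likely wrong)
      human_review(sample, label) -> corrected label
    review_budget is the fraction of samples a human may review, in [0, 1].
    """
    if not 0.0 <= review_budget <= 1.0:
        raise ValueError("review_budget must be in [0, 1]")

    # Step 1: machine annotation of every sample.
    labels = [annotate(s) for s in samples]
    # Step 2: the judge scores how suspicious each machine label looks.
    suspicion = [judge(s, y) for s, y in zip(samples, labels)]

    # Step 3: pick the most suspicious samples, up to the review budget
    # (rounded to the nearest whole number of samples).
    n_review = min(len(samples), round(review_budget * len(samples)))
    ranked = sorted(range(len(samples)), key=lambda i: -suspicion[i])
    to_review = sorted(ranked[:n_review])

    # Step 4: a human corrects only the selected labels.
    for i in to_review:
        labels[i] = human_review(samples[i], labels[i])
    return labels, to_review


# Toy stand-ins so the sketch runs without any model or human in the loop.
GOLD = {"great movie": "pos", "awful plot": "neg", "not bad": "pos", "fine": "pos"}


def toy_annotate(text: str) -> str:
    # Naive keyword rule; it mislabels "not bad" as negative.
    return "neg" if ("awful" in text or "bad" in text) else "pos"


def toy_judge(text: str, label: str) -> float:
    # Flags negation as suspicious; a real judge would be an MLLM call.
    return 0.9 if "not" in text else 0.1


def toy_human(text: str, label: str) -> str:
    # The "human" simply looks up the gold label.
    return GOLD[text]


final_labels, reviewed = act_style_pipeline(
    list(GOLD), toy_annotate, toy_judge, toy_human, review_budget=0.25
)
print(final_labels, reviewed)
```

With a budget of 25%, only one of the four toy samples reaches the human reviewer, and it is the one the toy judge flags. The sketch ranks by a single score for simplicity; any thresholding or budget rule could be substituted without changing the overall structure.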


Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)