Integration of cognitive tasks into artificial general intelligence test for large models

Youzhi Qu; Chen Wei; Penghui Du; Wenxin Che; Chi Zhang; Wanli Ouyang; Yatao Bian; Feiyang Xu; Bin Hu; Kai Du; Haiyan Wu; Jia Liu; Quanying Liu

arXiv:2402.02547·cs.AI·March 7, 2024·1 cites

Open Access

TL;DR

This paper proposes a comprehensive, cognitive science-inspired testing framework for large models to evaluate their multidimensional intelligence, aiming to improve assessment accuracy and guide targeted enhancements.

Contribution

It introduces a novel AGI testing framework encompassing multiple intelligence facets, integrating human-like cognitive tests into an immersive virtual environment.

Findings

01

A battery of cognitive tests from human intelligence assessments is adapted for large models.

02

The framework emphasizes increasing test complexity with model advancements.

03

Interpreting test results is crucial to avoid false positives and negatives.

Abstract

As large models evolve, performance evaluation is necessary to assess their capabilities and ensure safety before practical application. However, current model evaluations mainly rely on specific tasks and datasets, lacking a unified framework for assessing the multidimensional intelligence of large models. In this perspective, we advocate for a comprehensive framework of cognitive science-inspired artificial general intelligence (AGI) tests, aimed at fulfilling the testing needs of large models with enhanced capabilities. The cognitive science-inspired AGI tests encompass the full spectrum of intelligence facets, including crystallized intelligence, fluid intelligence, social intelligence, and embodied intelligence. To assess the multidimensional intelligence of large models, the AGI tests consist of a battery of well-designed cognitive tests adopted from human…

Taxonomy

Topics: AI-based Problem Solving and Planning