AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models

Jiale Cheng; Yida Lu; Xiaotao Gu; Pei Ke; Xiao Liu; Yuxiao Dong; Hongning Wang; Jie Tang; Minlie Huang

arXiv:2406.16714·cs.CL·December 11, 2024·1 cites

Open Access · 1 Repo

TL;DR

AutoDetect introduces a unified, automated framework using LLM-powered agents to systematically identify and address weaknesses in large language models, leading to improved performance and more targeted model enhancements.

Contribution

The paper presents AutoDetect, a novel framework that automates weakness detection in LLMs through collaborative agents, surpassing traditional benchmarking and manual inspection methods.

Findings

01 Achieves an identification success rate of over 30% when probing for flaws in models such as ChatGPT and Claude.

02 Guides targeted improvements: training on the detected weaknesses yields performance gains of over 10% on multiple benchmarks.

03 Outperforms untargeted data augmentation methods such as Self-Instruct.

Abstract

Although Large Language Models (LLMs) are becoming increasingly powerful, they still exhibit significant but subtle weaknesses, such as mistakes in instruction-following or coding tasks. As these unexpected errors could lead to severe consequences in practical deployments, it is crucial to investigate the limitations within LLMs systematically. Traditional benchmarking approaches cannot thoroughly pinpoint specific model deficiencies, while manual inspections are costly and not scalable. In this paper, we introduce a unified framework, AutoDetect, to automatically expose weaknesses in LLMs across various tasks. Inspired by the educational assessment process that measures students' learning outcomes, AutoDetect consists of three LLM-powered agents: Examiner, Questioner, and Assessor. The collaboration among these three agents is designed to realize comprehensive and in-depth weakness…
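The three-agent collaboration can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract names the agents but is truncated before describing their duties, so the roles here (Examiner proposes test topics, Questioner writes questions per topic, Assessor scores answers) are inferred from the educational-assessment analogy. The function `detect_weaknesses`, the `Finding` record, the `threshold` parameter, and all `toy_*` stand-ins are hypothetical names; a real setup would back each callable with an LLM.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    """One question on which the target model scored below the threshold."""
    topic: str
    question: str
    answer: str
    score: float


def detect_weaknesses(
    examiner: Callable[[str], List[str]],    # task -> topics to test (assumed role)
    questioner: Callable[[str], List[str]],  # topic -> questions (assumed role)
    assessor: Callable[[str, str], float],   # (question, answer) -> score in [0, 1] (assumed role)
    target_model: Callable[[str], str],      # the model under test
    task: str,
    threshold: float = 0.5,
) -> List[Finding]:
    """Collect every question whose answer scores below `threshold`."""
    findings: List[Finding] = []
    for topic in examiner(task):
        for question in questioner(topic):
            answer = target_model(question)
            score = assessor(question, answer)
            if score < threshold:
                findings.append(Finding(topic, question, answer, score))
    return findings


# Toy stand-ins so the sketch runs without any LLM (all hypothetical).
def toy_examiner(task: str) -> List[str]:
    return [f"{task}: format constraints", f"{task}: length constraints"]


def toy_questioner(topic: str) -> List[str]:
    return [f"Write exactly one sentence about cats ({topic})."]


def toy_target(question: str) -> str:
    # Deliberately violates the one-sentence instruction.
    return "Cats sleep a lot. They also purr."


def toy_assessor(question: str, answer: str) -> float:
    # Full score only if the answer contains exactly one period.
    return 1.0 if answer.count(".") == 1 else 0.0


weak = detect_weaknesses(
    toy_examiner, toy_questioner, toy_assessor, toy_target,
    task="instruction-following",
)
for finding in weak:
    print(finding.topic, finding.score)
```

With these toy agents, both generated questions expose the same flaw, so `weak` holds two findings. Passing the agents in as plain callables keeps the loop independent of any particular model API.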

Code & Models

Repositories

thu-coai/autodetect (Official)

Taxonomy

Topics: Topic Modeling

Methods: LLaMA