Professional Certification Benchmark Dataset: The First 500 Jobs For Large Language Models

David Noever; Matt Ciolino

arXiv:2305.05377·cs.AI·May 10, 2023·2 cites

TL;DR

This paper introduces a benchmark dataset to evaluate large language models' professional certification skills, highlighting their potential for vocational tasks and demonstrating significant performance improvements over time.

Contribution

The study creates the first comprehensive benchmark dataset for testing LLMs on professional certifications, revealing their capabilities across various vocational domains without fine-tuning.

Findings

01

GPT-3 passed 39% of certifications without fine-tuning.

02

Turbo-GPT3.5 scored 100% on the OSCP exam.

03

Models show potential in vocational and routine tasks.

Abstract

The research creates a professional certification survey to test large language models and evaluate their employable skills. It compares the performance of two AI models, GPT-3 and Turbo-GPT3.5, on a benchmark dataset of 1149 professional certifications, emphasizing vocational readiness rather than academic performance. GPT-3 achieved a passing score (>70% correct) in 39% of the professional certifications without fine-tuning or exam preparation. The models demonstrated qualifications in various computer-related fields, such as cloud and virtualization, business analytics, cybersecurity, network setup and repair, and data analytics. Turbo-GPT3.5 scored 100% on the valuable Offensive Security Certified Professional (OSCP) exam. The models also displayed competence in other professional domains, including nursing, licensed counseling, pharmacy, and teaching. Turbo-GPT3.5 passed the…
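The headline figure above combines two numbers: a per-exam threshold (more than 70% correct) and the share of exams that clear it. As a minimal sketch of that arithmetic, assuming per-exam accuracies are available as fractions between 0 and 1, the pass rate could be computed as follows. The function name `pass_rate`, the exam labels, and the scores are hypothetical and for illustration only; they are not data or code from the paper.

```python
from typing import Mapping

# Passing means strictly more than 70% correct, as stated in the abstract.
PASS_THRESHOLD = 0.70


def pass_rate(scores: Mapping[str, float], threshold: float = PASS_THRESHOLD) -> float:
    """Return the fraction of exams whose accuracy is strictly above the threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    passed = sum(1 for accuracy in scores.values() if accuracy > threshold)
    return passed / len(scores)


# Hypothetical per-exam accuracies, NOT results from the paper.
example_scores = {
    "exam_a": 0.82,
    "exam_b": 0.65,
    "exam_c": 0.70,  # exactly 70% does not pass under a strict ">" rule
    "exam_d": 0.91,
}

print(f"Pass rate: {pass_rate(example_scores):.0%}")
```

Under this reading, a reported 39% means that 39 of every 100 certification exams had more than 70% of questions answered correctly, not that 39% of all questions were answered correctly.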

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education

Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Linear Layer · Dropout · Byte Pair Encoding · Weight Decay