Professional Certification Benchmark Dataset: The First 500 Jobs For Large Language Models
David Noever, Matt Ciolino

TL;DR
This paper introduces a benchmark dataset for evaluating large language models on professional certification exams, highlighting their potential for vocational tasks and demonstrating significant performance improvements over time.
Contribution
The study creates the first comprehensive benchmark dataset for testing LLMs on professional certifications, revealing their capabilities across various vocational domains without fine-tuning.
Findings
GPT-3 passed 39% of certifications without fine-tuning.
Turbo-GPT3.5 scored 100% on the OSCP exam.
Models show potential in vocational and routine tasks.
Abstract
The research creates a professional certification survey to test large language models and evaluate their employable skills. It compares the performance of two AI models, GPT-3 and Turbo-GPT3.5, on a benchmark dataset of 1149 professional certifications, emphasizing vocational readiness rather than academic performance. GPT-3 achieved a passing score (>70% correct) in 39% of the professional certifications without fine-tuning or exam preparation. The models demonstrated qualifications in various computer-related fields, such as cloud and virtualization, business analytics, cybersecurity, network setup and repair, and data analytics. Turbo-GPT3.5 scored 100% on the valuable Offensive Security Certified Professional (OSCP) exam. The models also displayed competence in other professional domains, including nursing, licensed counseling, pharmacy, and teaching. Turbo-GPT3.5 passed the…
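The headline figure in the abstract combines two steps: each exam counts as passed when more than 70% of its answers are correct, and the pass rate is the share of exams that clear that bar. A minimal sketch of that arithmetic follows. The function names, the data layout, and the example scores are hypothetical and are not taken from the paper. The strict "greater than" comparison is an assumption based on the abstract's ">70% correct" wording.

```python
from typing import Dict, Tuple

# Passing bar as a whole-number percentage. Integer arithmetic avoids
# floating-point edge cases when a score sits exactly on the threshold.
PASS_THRESHOLD_PCT = 70


def passed(correct: int, total: int, threshold_pct: int = PASS_THRESHOLD_PCT) -> bool:
    """Return True when the share of correct answers strictly exceeds the threshold."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct * 100 > threshold_pct * total


def pass_rate(results: Dict[str, Tuple[int, int]]) -> float:
    """Fraction of exams passed; results maps exam name -> (correct, total)."""
    if not results:
        raise ValueError("results must not be empty")
    n_passed = sum(passed(correct, total) for correct, total in results.values())
    return n_passed / len(results)


# Hypothetical scores, invented for illustration only (not data from the paper).
example_results = {
    "exam_a": (18, 20),  # 90%: pass
    "exam_b": (14, 20),  # exactly 70%: not a pass under a strict ">" rule
    "exam_c": (9, 20),   # 45%: fail
    "exam_d": (15, 20),  # 75%: pass
}

if __name__ == "__main__":
    print(f"pass rate: {pass_rate(example_results):.0%}")
```

With these made-up scores, two of four exams clear the bar. Whether a score of exactly 70% counts as a pass depends on how the threshold is read, so the comparison operator is the one line to adjust if the paper's full text specifies otherwise.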
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education
Methods: Multi-Head Attention · Attention Is All You Need · Test · Cosine Annealing · Linear Layer · Dropout · Byte Pair Encoding · Weight Decay
