ResumeAtlas: Revisiting Resume Classification with Large-Scale Datasets and Large Language Models

Ahmed Heakl; Youssef Mohamed; Noran Mohamed; Aly Elsharkawy; Ahmed Zaky

arXiv:2406.18125·cs.CL·July 16, 2024

TL;DR

This paper introduces a large-scale resume dataset and leverages large language models like BERT to significantly improve resume classification accuracy, addressing previous challenges of small datasets and lack of standardization.

Contribution

It presents a comprehensive large-scale resume dataset and demonstrates the effectiveness of LLMs in improving classification accuracy over traditional methods.

Findings

01 Top-1 accuracy of 92% achieved
02 Top-5 accuracy of 97.5% achieved
03 Large dataset improves model robustness

Abstract

The increasing reliance on online recruitment platforms coupled with the adoption of AI technologies has highlighted the critical need for efficient resume classification methods. However, challenges such as small datasets, lack of standardized resume templates, and privacy concerns hinder the accuracy and effectiveness of existing classification models. In this work, we address these challenges by presenting a comprehensive approach to resume classification. We curated a large-scale dataset of 13,389 resumes from diverse sources and employed Large Language Models (LLMs) such as BERT and Gemma1.1 2B for classification. Our results demonstrate significant improvements over traditional machine learning approaches, with our best model achieving a top-1 accuracy of 92% and a top-5 accuracy of 97.5%. These findings underscore the importance of dataset quality and advanced model…
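
The abstract reports both top-1 and top-5 accuracy. As a minimal sketch of how these metrics are usually computed (not the authors' evaluation code), the Python below counts a prediction as correct when the true class is among the k highest-scoring classes. The function name `top_k_accuracy` and the toy scores and labels are assumptions made for illustration; they are not taken from the paper or its dataset.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 1,
) -> float:
    """Fraction of examples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one example is required")

    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score, truncated to k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Hypothetical toy scores for 4 examples over 3 classes (illustrative only).
toy_scores = [
    [0.1, 0.7, 0.2],  # true class 1 is ranked first
    [0.5, 0.3, 0.2],  # true class 1 is ranked second
    [0.2, 0.3, 0.5],  # true class 0 is ranked third
    [0.6, 0.1, 0.3],  # true class 0 is ranked first
]
toy_labels = [1, 1, 0, 0]

top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
print(f"top-1: {top1:.2f}, top-2: {top2:.2f}")
```

Top-k accuracy can never be lower than top-1 accuracy on the same predictions, which is consistent with the reported gap between 92% (top-1) and 97.5% (top-5).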

Code & Models

Repositories

noran-mohamed/Resume-Classification-Dataset (Official)

Models

Datasets

ahmedheakl/resume-atlas
dataset · 334 downloads

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: WordPiece · Residual Connection · Weight Decay · Softmax · Layer Normalization · Attention Dropout · Linear Warmup With Linear Decay · Dropout · Adam