Scraping and Clustering Techniques for the Characterization of Linkedin Profiles
Kais Dai, Celia González Nespereira, Ana Fernández Vilas, Rebeca P. Díaz Redondo

TL;DR
This paper presents a method for scraping and analyzing LinkedIn profiles using NLP techniques to classify educational backgrounds and cluster professional careers, providing insights into user demographics and career-education relationships.
Contribution
It introduces a novel scraping approach combined with NLP-based classification and clustering to analyze LinkedIn profiles at scale.
Findings
Identified patterns linking educational degrees to professional careers
Clustered profiles reveal distinct career trajectories
Scraping around 5 million public profiles demonstrates that the approach scales
Abstract
The socialization of the web has taken on a new dimension since the emergence of the Online Social Network (OSN) concept. Because every Internet user becomes a potential content creator, a large amount of data must be managed. This paper explores the most popular professional OSN: LinkedIn. A scraping technique was implemented to collect around 5 million public profiles. Applying natural language processing (NLP) techniques to classify the educational background and to cluster the professional background of the collected profiles allowed us to provide some insights about this OSN's users and to evaluate the relationships between educational degrees and professional careers.
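To make the two text-processing steps concrete, here is a minimal sketch of keyword-based degree classification and similarity-based grouping of job titles. It is not the authors' pipeline: the abstract does not say which NLP methods were used, so the keyword map `DEGREE_KEYWORDS`, the greedy cosine-similarity clustering, the `threshold` value, and all function names are assumptions chosen for illustration.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical keyword map; the abstract does not list the degree categories used.
DEGREE_KEYWORDS = {
    "doctorate": ("phd", "doctor", "doctorate"),
    "master": ("master", "msc", "mba"),
    "bachelor": ("bachelor", "bsc", "ba"),
}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def classify_degree(text):
    """Return the first degree label whose keywords appear in the text."""
    tokens = set(tokenize(text))
    for label, keywords in DEGREE_KEYWORDS.items():
        if tokens & set(keywords):
            return label
    return "other"


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_titles(titles, threshold=0.5):
    """Greedy single-pass clustering (an assumed stand-in technique).

    Each title joins the first cluster whose seed title is similar enough;
    otherwise it starts a new cluster.
    """
    clusters = []  # list of (seed_vector, member_titles)
    for title in titles:
        vec = Counter(tokenize(title))
        for seed, members in clusters:
            if cosine(vec, seed) >= threshold:
                members.append(title)
                break
        else:
            clusters.append((vec, [title]))
    return [members for _, members in clusters]


if __name__ == "__main__":
    print(classify_degree("PhD in Computer Science"))
    print(cluster_titles([
        "Software Engineer",
        "Senior Software Engineer",
        "Sales Manager",
        "Regional Sales Manager",
    ]))
```

A single-pass scheme like this is order-dependent and only suits a small illustration; at the scale of millions of profiles, a vectorized approach such as TF-IDF features with k-means would be a more typical choice.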
Taxonomy
Topics: Web visibility and informetrics
